Note: this page shows the Feature-Based Change Log for a release
These features were completed when this image was assembled
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
The observable functionality that the user now has as a result of receiving this feature. Complete during New status.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
https://docs.google.com/document/d/1m6OYdz696vg1v8591v0Ao0_r_iqgsWjjM2UjcR_tIrM/
As a developer, I want to be able to test my serverless function after it's been deployed.
Please add a spike to see if there are dependencies.
Developers can use the kn func invoke CLI to accomplish this. According to Naina, there is an API, but it's in Go.
As a user, I want to invoke a Serverless function from the developer console. This action should be available as a page and as a modal.
This will be similar to the web terminal proxy, except that no auth headers will be passed to the underlying service.
We need something similar to:
POST /proxy/in-cluster
{
  endpoint: string   # Or just service: string ?? tbd.
  headers: Record<string, string | string[]>
  body: string
  timeout: number
}
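For illustration, a hypothetical request payload for this endpoint, written as YAML for readability; the endpoint, header, and body values below are made up, and the final field names are still TBD as noted above:

endpoint: http://my-function.my-namespace.svc.cluster.local   # hypothetical in-cluster service URL
headers:
  Content-Type:
    - application/json
body: '{"message": "Hello, function"}'
timeout: 30000   # assumed to be milliseconds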
As a user, I want to invoke a Serverless function from the developer console. This action should be available as a page and as a modal.
This story depends on ODC-7273, ODC-7274, and ODC-7288. It should bring the backend proxy and the frontend together and finalize the work.
As a user, I want to invoke a Serverless function from the developer console. This action should be available as a page and as a modal.
This story is to evaluate a good UI for this and check this with our PM (Serena) and the Serverless team (Naina and Lance).
Information the form should show:
Cluster administrators need an in-product experience to discover and install new Red Hat offerings that can add high value to developer workflows.
Requirements | Notes | IS MVP |
---|---|---|
Discover new offerings in Home Dashboard | | Y |
Access details outlining value of offerings | | Y |
Access step-by-step guide to install offering | | N |
Allow developers to easily find and use newly installed offerings | | Y |
Support air-gapped clusters | | Y |
< What are we making, for who, and why/what problem are we solving?>
Discovering solutions that are not available for installation on cluster
No known dependencies
Background, and strategic fit
None
Quick Starts
Developers using Dev Console need to be made aware of the RH developer tooling available to them.
Provide awareness to developers using Dev Console of the RH developer tooling that is available to them, including:
Consider enhancing the +Add page and/or the Guided tour
Provide a Quick Start for installing the Cryostat Operator
To increase usage of our RH portfolio
Add the IDE extensions below to the create Serverless function form:
This issue is to handle the PR comment - https://github.com/openshift/console-operator/pull/770#pullrequestreview-1501727662 for the issue https://issues.redhat.com/browse/ODC-7292
Update Terminal step of the Guided Tour to indicate that odo CLI is accessible - https://developers.redhat.com/products/odo/overview
We are deprecating DeploymentConfig in favor of Deployment in OpenShift because Deployment is the recommended way to deploy applications. Deployment is a more flexible and powerful resource that allows you to control the deployment of your applications more precisely. DeploymentConfig is a legacy resource that is no longer necessary. We will continue to support DeploymentConfig for a period of time, but we encourage you to migrate to Deployment as soon as possible.
Here are some of the benefits of using Deployment over DeploymentConfig:
We hope that you will migrate to Deployment as soon as possible. If you have any questions, please contact us.
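As a rough migration sketch (names and image are placeholders, not taken from product documentation), a DeploymentConfig's pod template and replica count typically carry over to a Deployment unchanged:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: quay.io/example/my-app:latest   # placeholder image

DeploymentConfig-specific behaviour such as image change triggers or custom deployment strategies needs to be mapped separately; for example, image change triggers can often be replaced with the image.openshift.io/triggers annotation.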
Given the nature of this component (embedded into a shared api server and controller manager), this will likely require adding logic within those shared components to not enable specific bits of function when the build or DeploymentConfig capability is disabled, and watching the enabled capability set so that the components enable the functionality when necessary.
I would not expect us to split the components out of their existing location as part of this, though that is theoretically an option.
None
Make the list of enabled/disabled controllers in OAS reflect enabled/disabled capabilities.
Acceptance criteria:
QE:
At the moment, HyperShift relies on an older etcd operator (i.e., the CoreOS etcd operator). However, this operator is basic and does not support HA as required.
Introduce a reliable component to operate Etcd that:
For an initial MVP of service delivery adoption of Hypershift we need to enable support for manual cluster migration.
Additional information: https://docs.google.com/presentation/d/1JDfd34jvj_4VvVn1bNieSXRejbFqAs_g8G-5rBqTtxw/edit?usp=sharing
Following on from https://issues.redhat.com/browse/HOSTEDCP-444 we need to add the steps to enable migration of the Node/CAPI resources to enable workloads to continue running during controlplane migration.
This will be a manual process where controlplane downtime will occur.
This must satisfy the criteria for a successful migration:
We need to validate and document this manually for starters.
Eventually this should be automated in the upcoming e2e test.
We could even have a job running conformance tests over a migrated cluster
As an OpenShift on vSphere administrator, I want to specify static IP assignments to my VMs.
As an OpenShift on vSphere administrator, I want to completely avoid using a DHCP server for the VMs of my OpenShift cluster.
Customers want the convenience of IPI deployments for vSphere without having to use DHCP. As in bare metal, where METAL-1 added this capability, some of the reasons are the security implications of DHCP (customers report that, for example, depending on configuration it allows any device to get on the network). At the same time, IPI deployments only require our OpenShift installation software, while with UPI they would need automation software that, in secure environments, they would have to certify along with OpenShift.
Bare metal related work:
CoreOS Afterburn:
https://github.com/coreos/afterburn/blob/main/src/providers/vmware/amd64.rs#L28
https://github.com/openshift/installer/blob/master/upi/vsphere/vm/main.tf#L34
As an OpenShift on vSphere administrator, I want to specify static IP assignments to my VMs.
As an OpenShift on vSphere administrator, I want to completely avoid using a DHCP server for the VMs of my OpenShift cluster.
Customers want the convenience of IPI deployments for vSphere without having to use DHCP. As in bare metal, where METAL-1 added this capability, some of the reasons are the security implications of DHCP (customers report that, for example, depending on configuration it allows any device to get on the network). At the same time, IPI deployments only require our OpenShift installation software, while with UPI they would need automation software that, in secure environments, they would have to certify along with OpenShift.
Bare metal related work:
CoreOS Afterburn:
https://github.com/coreos/afterburn/blob/main/src/providers/vmware/amd64.rs#L28
https://github.com/openshift/installer/blob/master/upi/vsphere/vm/main.tf#L34
USER STORY:
As an OpenShift administrator, I want to apply an IP configuration so that I can adhere to my organizations security guidelines.
DESCRIPTION:
The vSphere machine controller needs to be modified to convert nmstate to `guestinfo.afterburn.initrd.network-kargs` upon cloning the template for a new machine. An example of this is here: https://github.com/openshift/machine-api-operator/pull/1079
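For illustration only (addresses, hostname, and NIC name are invented), the value written to that guestinfo property is a dracut-style network kernel argument string, along the lines of:

guestinfo.afterburn.initrd.network-kargs: "ip=192.168.10.20::192.168.10.1:255.255.255.0:control-plane-0:ens192:none nameserver=192.168.10.1"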
Required:
Nice to have:
ACCEPTANCE CRITERIA:
ENGINEERING DETAILS:
Authentication-operator ignores noproxy settings defined in the cluster-wide proxy.
Expected outcome: When noproxy is set, Authentication operator should initialize connections through ingress instead of the cluster-wide proxy.
Currently in OpenShift we do not support adding 3rd party agents and other software to cluster nodes. While rpm-ostree supports adding packages, we have no way today to do that in a sane, scalable way across machineconfigpools and clusters. Some customers may not be able to meet their IT policies due to this.
In addition to third party content, some customers may want to use the layering process as a point to inject configuration. The build process allows for simple copying of config files and the ability to run arbitrary scripts to set user config files (e.g. through an Ansible playbook). This should be a supported use case, except where it conflicts with OpenShift (for example, the MCO must continue to manage Cri-O and Kubelet configs).
As part of enabling OCP CoreOS Layering for third party components, we will need to allow for package installation to /opt. Many OEMs and ISVs install to /opt and it would be difficult for them to make the change only for RHCOS. Meanwhile changing their RHEL target to a different target would also be problematic as their customers are expecting these tools to install in a certain way. Not having to worry about this path will provide the best ecosystem partner and customer experience.
Add an e2e test in our CI to override the kernel
Possibly repurpose https://github.com/openshift/os/tree/master/tests/layering
Add support for custom security groups to be attached to control plane and compute nodes at installation time.
Allow the user to provide existing security groups to be attached to the control plane and compute node instances at installation time.
The user will be able to provide a list of existing security groups to the install config manifest that will be used as additional custom security groups to be attached to the control plane and compute node instances at installation time.
The installer won't be responsible for creating any custom security groups; these must be created by the user before the installation starts.
We do have users/customers with specific requirements on adding additional network rules to every instance created in AWS. For OpenShift these additional rules need to be added on day-2 manually as the Installer doesn't provide the ability to add custom security groups to be attached to any instance at install time.
MachineSets already support adding a list of existing custom security groups, so this could already be automated at install time by manually editing each MachineSet manifest before starting the installation; but even in that case the Installer doesn't allow the user to provide this information so that the list of security groups can be added to the MachineSet manifests.
Documentation will be required to explain how this information needs to be provided to the install config manifest as any other supported field.
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
additionalSecurityGroupIDs:
  description: AdditionalSecurityGroupIDs contains IDs of additional security groups for machines, where each ID is presented in the format sg-xxxx.
  items:
    type: string
  type: array
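Assuming the install-config exposes this field on the AWS machine-pool platform section (a hedged sketch; the security group IDs are placeholders and must reference groups the user created beforehand), the user input could look like:

compute:
  - name: worker
    platform:
      aws:
        additionalSecurityGroupIDs:
          - sg-0123456789abcdef0   # pre-existing, user-created
controlPlane:
  name: master
  platform:
    aws:
      additionalSecurityGroupIDs:
        - sg-0fedcba9876543210   # pre-existing, user-created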
This requires/does not require a design proposal.
Scaling of pods in OpenShift depends highly on the customer workload and hardware setup. Some workloads on certain hardware might not scale beyond 100 pods, and others might scale to 1000 pods.
As an OpenShift admin, I want to monitor metrics that indicate why I am not able to scale my pods. Think of a pressure gauge that tells the customer when it is green (can scale) and when it is red (cannot scale).
As an OpenShift support engineer, if a customer calls in with a complaint about pod scaling, I should be able to check some metrics and inform them why they are not able to scale.
Metrics, alerts, and a dashboard
Be able to integrate these metrics and alerts into a monitoring dashboard
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
We need to have an operator inject dashboard jsonnet. E.g. the etcd team injects their dashboard jsonnet using their operator, in the form of a config map.
We will need a similar approach for the node dashboard.
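A hedged sketch of the pattern (the ConfigMap name, namespace, label, and dashboard content below are assumptions for illustration): the operator renders the dashboard jsonnet to JSON and ships it in a ConfigMap that the console picks up.

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-node-resources      # hypothetical name
  namespace: openshift-config-managed         # assumed namespace for console dashboards
  labels:
    console.openshift.io/dashboard: "true"    # assumed label the console uses to discover dashboards
data:
  # The value is the rendered output of the dashboard jsonnet
  node-dashboard.json: |
    {"title": "Node Dashboard", "panels": []}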
Create a GCP cloud-specific spec.resourceTags entry in the Infrastructure CRD. This should create and update tags (or labels in GCP) on any OpenShift cloud resource that we create and manage. The behaviour should also tag existing resources that do not have the tags yet, and once the tags in the Infrastructure CRD are changed all the resources should be updated accordingly.
Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.
Due to the ongoing intree/out of tree split on the cloud and CSI providers, this should not apply to clusters with intree providers (!= "external").
Once confident we have all components updated, we should introduce an end2end test that makes sure we never create resources that are untagged.
Goals
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
List any affected packages or components.
This epic covers the work to apply user-defined labels to GCP resources created for an OpenShift cluster, available as Tech Preview.
The user should be able to define GCP labels to be applied to the resources created during cluster creation by the installer and by other operators which manage the specific resources. The user will be able to define the required tags/labels in install-config.yaml while preparing the user inputs for cluster creation. These will then be made available in the status sub-resource of the Infrastructure custom resource, which cannot be edited but is available for user reference and will be used by the in-cluster operators for labeling when the resources are created.
Updating/deleting of labels added during cluster creation or adding new labels as Day-2 operation is out of scope of this epic.
List any affected packages or components.
Reference - https://issues.redhat.com/browse/RFE-2017
The enhancement proposed for GCP labels support in OCP requires the install-config CRD to be updated to include gcp userLabels for the user to configure, which will be referred to by the installer to apply the list of labels on each resource it creates, and will also be made available in the Infrastructure CR created.
Below is the snippet of the change required in the CRD
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: installconfigs.install.openshift.io
spec:
  versions:
    - name: v1
      schema:
        openAPIV3Schema:
          properties:
            platform:
              properties:
                gcp:
                  properties:
                    userLabels:
                      additionalProperties:
                        type: string
                      description: UserLabels additional keys and values that the installer will add as labels to all resources that it creates. Resources created by the cluster itself may not include these labels.
                      type: object
This change is required for testing the changes of the feature, and should ideally get merged first.
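For illustration, with the schema above the user input in install-config.yaml would be a simple string map (keys and values below are placeholders):

platform:
  gcp:
    userLabels:
      cost-center: "1234"
      environment: dev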
Acceptance Criteria
The enhancement proposed for GCP labels and tags support in OCP requires making use of the latest APIs made available in the terraform provider for Google, and the provider dependency must be updated to use the same.
Acceptance Criteria
The enhancement proposed for GCP tags support in OCP requires cluster-image-registry-operator to add the gcp userTags, available in the status sub-resource of the Infrastructure CR, to the gcp storage resource it creates.
cluster-image-registry-operator uses the method createStorageAccount() to create the storage resource, which should be updated to add tags after resource creation.
Acceptance Criteria
cluster-config-operator makes the Infrastructure CRD available for the installer. It is included in its container image from the openshift/api package, and that package needs to be updated to have the latest CRD.
The installer creates the below list of gcp resources during the create-cluster phase, and these resources should have the user-defined labels and the default OCP label kubernetes-io-cluster-<cluster_id>:owned applied.
Resources List
Resource | Terraform API |
---|---|
VM Instance | google_compute_instance |
Image | google_compute_image |
Address | google_compute_address(beta) |
ForwardingRule | google_compute_forwarding_rule(beta) |
Zones | google_dns_managed_zone |
Storage Bucket | google_storage_bucket |
Acceptance Criteria:
The enhancement proposed for GCP labels support in OCP requires cluster-image-registry-operator to add the gcp userLabels, available in the status sub-resource of the Infrastructure CR, to the gcp storage resource it creates.
cluster-image-registry-operator uses the method createStorageAccount() to create the storage resource, which should be updated to add labels.
Acceptance Criteria
The enhancement proposed for GCP labels support in OCP requires machine-api-provider-gcp to add the gcp userLabels, available in the status sub-resource of the Infrastructure CR, to the gcp virtual machine resources and the sub-resources it creates.
Acceptance Criteria
The installer generates the Infrastructure CR in the manifests-creation step of the cluster creation process, based on the user-provided input recorded in install-config.yaml. While generating the Infrastructure CR, platformStatus.gcp.resourceLabels should be updated with the user-provided labels (installconfig.platform.gcp.userLabels).
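A hedged sketch of the resulting Infrastructure status fragment (the exact field layout may differ; label keys and values are placeholders, and resourceLabels is shown here as a list of key/value pairs):

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  platformStatus:
    type: GCP
    gcp:
      resourceLabels:
        - key: cost-center
          value: "1234"
        - key: environment
          value: dev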
Acceptance Criteria
Much like core OpenShift operators, a standardized flow exists for OLM-managed operators to interact with the cluster in a specific way to leverage AWS STS authorization when using AWS APIs, as opposed to insecure static, long-lived credentials. OLM-managed operators can implement integration with the CloudCredentialOperator in a well-defined way to support this flow.
Enable customers to easily leverage OpenShift's capabilities around AWS STS with layered products, for an increased security posture. Enable OLM-managed operators to implement support for this in a well-defined pattern.
See Operators & STS slide deck.
The CloudCredentialOperator already provides a powerful API for OpenShift's core cluster operators to request credentials and acquire them via short-lived tokens. This capability should be expanded to OLM-managed operators, specifically to Red Hat layered products that interact with AWS APIs. The process today ranges from cumbersome to non-existent depending on the operator in question, and is seen as an adoption blocker of OpenShift on AWS.
This is particularly important for ROSA customers. Customers are expected to be asked to pre-create the required IAM roles outside of OpenShift, which is deemed acceptable.
This Section: High-Level description of the Market Problem ie: Executive Summary
This Section: Articulates and defines the value proposition from a users point of view
This Section: Effect is the expected outcome within the market. There are two dimensions of outcomes; growth or retention. This represents part of the “why” statement for a feature.
As an engineer, I want the capability to implement CI test cases that run at different intervals (daily, weekly) so as to ensure that downstream operators that depend on certain capabilities are not negatively impacted when systems that CCO interacts with change behavior.
Acceptance Criteria:
Create a stubbed out e2e test path in CCO and matching e2e calling code in release such that there exists a path to tests that verify working in an AWS STS workflow.
oc-mirror is a GA product as of OpenShift 4.11.
The goal of this feature is to address any future customer requests for new features or capabilities in oc-mirror.
In the 4.12 release, a new feature was introduced to oc-mirror allowing it to use OCI FBC catalogs as a starting point for mirroring operators.
As an oc-mirror user, I would like the OCI FBC feature to be stable
so that I can use it in a production ready environment
and to make the new feature and all existing features of oc-mirror seamless
This feature is ring-fenced in the oc-mirror repository; it uses the following flags to achieve this, so as not to cause any breaking changes in the current oc-mirror functionality.
The OCI FBC (file-based catalog) format has been delivered for Tech Preview in 4.12
Tech Enablement slides can be found here https://docs.google.com/presentation/d/1jossypQureBHGUyD-dezHM4JQoTWPYwiVCM3NlANxn0/edit#slide=id.g175a240206d_0_7
Design doc is in https://docs.google.com/document/d/1-TESqErOjxxWVPCbhQUfnT3XezG2898fEREuhGena5Q/edit#heading=h.r57m6kfc2cwt (also contains latest design discussions around the stories of this epic)
Link to previous working epic https://issues.redhat.com/browse/CFE-538
Contacts for the OCI FBC feature
The OpenShift Assisted Installer is a user-friendly OpenShift installation solution for the various platforms, but focused on bare metal. This very useful functionality should be made available for the IBM zSystem platform.
Use of the OpenShift Assisted Installer to install OpenShift on an IBM zSystem
Using the OpenShift Assisted Installer to install OpenShift on an IBM zSystem
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
As a multi-arch development engineer, I would like to ensure that the Assisted Installer workflow is fully functional and supported for z/VM deployments.
Acceptance Criteria
Description of the problem:
Using FCP (multipath) devices for a zVM node
parmline:
rd.neednet=1 console=ttysclp0 coreos.live.rootfs_url=http://172.23.236.156:8080/assisted-installer/rootfs.img ip=10.14.6.8::10.14.6.1:255.255.255.0:master-0:encbdd0:none nameserver=10.14.6.1 ip=[fd00::8]::[fd00::1]:64::encbdd0:none nameserver=[fd00::1] zfcp.allow_lun_scan=0 rd.znet=qeth,0.0.bdd0,0.0.bdd1,0.0.bdd2,layer2=1 rd.zfcp=0.0.8007,0x500507630400d1e3,0x4000401e00000000 rd.zfcp=0.0.8107,0x50050763040851e3,0x4000401e00000000 random.trust_cpu=on rd.luks.options=discard ignition.firstboot ignition.platform.id=metal console=tty1 console=ttyS1,115200n8
shows a disk limitation error in the UI.
<see attached image>
How reproducible:
Attach two FCP devices to a zVM node. Create a cluster and boot the zVM node into the discovery service. The Host discovery panel shows an error for the discovered host.
Steps to reproduce:
1. Attach two FCP devices to the zVM.
2. Create a new cluster using the AI UI and configure the discovery image.
3. Boot the zVM node.
4. Wait until the node shows up on the Host discovery panel.
5. FCP devices are not recognized as a valid option.
Actual results:
FCP devices can't be used as an installable disk
Expected results:
FCP device can be used for installation (multipath must be activated after installation:
https://docs.openshift.com/container-platform/4.13/post_installation_configuration/ibmz-post-install.html#enabling-multipathing-fcp-luns_post-install-configure-additional-devices-ibmz)
Discovered a regression on staging where the default is set to minimal ISO, preventing installation of OCP 4.13 for the s390x architecture.
See the following older bugs, which I guess address the same issue
Description of the problem:
DASD devices are not recognized correctly if attached to and used for a zVM node.
<see attached screenshot>
How reproducible:
Attach two DASD devices to a zVM node. Create a cluster and boot the zVM node into the discovery service. The Host discovery panel shows an error for the discovered host.
Steps to reproduce:
1. Attach two DASD devices to the zVM.
2. Create a new cluster using the AI UI and configure the discovery image.
3. Boot the zVM node.
4. Wait until the node shows up on the Host discovery panel.
5. DASD devices are not recognized as a valid option.
Actual results:
DASD devices can't be used as an installable disk
Expected results:
DASD devices can be used for installation. The user can choose which device AI will install to.
As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers
Customers want to use their own load balancers, and IPI comes with built-in LBs based on keepalived and haproxy.
vsphere has done the work already via https://issues.redhat.com/browse/SPLAT-409
As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers
Customers want to use their own load balancers, and IPI comes with built-in LBs based on keepalived and haproxy.
vsphere has done the work already via https://issues.redhat.com/browse/SPLAT-409
Notes: https://github.com/EmilienM/ansible-role-routed-lb is an example of an LB that will be used for CI and can be used by QE and customers.
As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers
Customers want to use their own load balancers, and IPI comes with built-in LBs based on keepalived and haproxy.
vsphere has done the work already via https://issues.redhat.com/browse/SPLAT-409
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
<!--
Please make sure to fill all story details here with enough information so
that it can be properly sized and is immediately actionable. Our Definition
of Ready for user stories is detailed in the link below:
https://docs.google.com/document/d/1Ps9hWl6ymuLOAhX_-usLmZIP4pQ8PWO15tMksh0Lb_A/
As much as possible, make sure this story represents a small chunk of work
that could be delivered within a sprint. If not, consider the possibility
of splitting it or turning it into an epic with smaller related stories.
Before submitting it, please make sure to remove all comments like this one.
-->
USER STORY:
<!--
One sentence describing this story from an end-user perspective.
-->
As a [type of user], I want [an action] so that [a benefit/a value].
DESCRIPTION:
<!--
Provide as many details as possible, so that any team member can pick it up
and start to work on it immediately without having to reach out to you.
-->
Required:
...
Nice to have:
...
ACCEPTANCE CRITERIA:
<!--
Describe the goals that need to be achieved so that this story can be
considered complete. Note this will also help QE to write their acceptance
tests.
-->
ENGINEERING DETAILS:
<!--
Any additional information that might be useful for engineers: related
repositories or pull requests, related email threads, GitHub issues or
other online discussions, how to set up any required accounts and/or
environments if applicable, and so on.
-->
Testing is one of the main pillars of production-grade software. It helps validate and flag issues early on, before the code is shipped into productive landscapes. Code changes, no matter how small they are, might lead to bugs and outages. The best way to validate against bugs is to write proper tests, and to run those tests we need a foundation for a test infrastructure. Finally, to close the circle, automation of these tests and their corresponding builds helps reduce errors and saves a lot of time.
Note: Sync with the Developer productivity teams might be required to understand infra requirements especially for our first HyperShift infrastructure backend, AWS.
Context:
This is a placeholder epic to capture all the e2e scenarios that we want to test in CI in the long term. Anything which is a TODO here should at minimum be validated by QE as it is developed.
DoD:
Every supported scenario is e2e CI tested.
Scenarios:
DoD:
Refactor the E2E tests following the new pattern with 1 HostedCluster and targeted NodePools:
Goal
Productize agent-installer-utils container from https://github.com/openshift/agent-installer-utils
Feature Description
In order to ship the network reconfiguration it would be useful to move the agent-tui to its own image instead of sharing the agent-installer-node-agent one.
Goal
Productize agent-installer-utils container from https://github.com/openshift/agent-installer-utils
Feature Description
In order to ship the network reconfiguration it would be useful to move the agent-tui to its own image instead of sharing the agent-installer-node-agent one.
Currently the `agent create image` command takes care of extracting the agent-tui binary (and required libs) from the `assisted-installer-agent` image (shipped in the release as `agent-installer-node-agent`).
Once the agent-tui is available from the `agent-installer-utils` image instead, it will be necessary to update the installer code accordingly (see https://github.com/openshift/installer/blob/56e85bee78490c18aaf33994e073cbc16181f66d/pkg/asset/agent/image/agentimage.go#L81)
agent-tui is currently built and shipped using the assisted-installer-agent repo. Since it will be moved into its own repository (agent-installer-utils), it's necessary to clean up the previous code.
Allow users to interactively adjust the network configuration for a host after booting the agent ISO.
Configure network after host boots
The user has Static IPs, VLANs, and/or bonds to configure, but has no idea of the device names of the NICs. They don't enter any network config in agent-config.yaml. Instead they configure each host's network via the text console after it boots into the image.
Currently the agent-tui always displays the additional checks (nslookup/ping/HTTP GET), even when the primary check (pull image) passes. This may cause some confusion for the user, due to the fact that the additional checks do not prevent the agent-tui from completing successfully; they are just informative, to allow better troubleshooting of the issue (so they are not needed in the positive case).
The additional checks should then be shown only when the primary check fails for any reason.
When the UI is active in the console, event messages that are generated will distort the interface and make it difficult for the user to view the configuration and select options. An example is shown in the attached screenshot.
When the agent-tui is shown during the initial host boot, if the pull release image check fails then an additional checks box is shown along with a details text view.
The content of the details view gets continuously updated with the details of the failed check, but the user cannot move the focus over the details box (using the arrow/tab keys), and thus cannot scroll its content (using the up/down arrow keys).
The openshift-install agent create image command will need to fetch the agent-tui executable so that it can be embedded within the agent ISO. For this reason the agent-tui must be available in the release payload, so that it can be retrieved even when the command is invoked in a disconnected environment.
Full support of North-South (cluster egress-ingress) IPsec that shares an encryption back-end with the current East-West implementation, allows for IPsec offload to capable SmartNICs, can be enabled and disabled at runtime, and allows for FIPS compliance (including install-time configuration and disabling of runtime configuration).
This is a clone of issue OCPBUGS-17380. The following is the description of the original issue:
—
Description of problem:
Enable IPSec pre/post install on OVN IC cluster $ oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}' network.operator.openshift.io/cluster patched ovn-ipsec containers complaining: ovs-monitor-ipsec | ERR | Failed to import certificate into NSS. b'certutil: unable to open "/etc/openvswitch/keys/ipsec-cacert.pem" for reading (-5950, 2).\n' $ oc rsh ovn-ipsec-d7rx9 Defaulted container "ovn-ipsec" out of: ovn-ipsec, ovn-keys (init) sh-5.1# certutil -L -d /var/lib/ipsec/nss Certificate Nickname Trust Attributes SSL,S/MIME,JAR/XPIovs_certkey_db961f9a-7de4-4f1d-a2fb-a8306d4079c5 u,u,u sh-5.1# cat /var/log/openvswitch/libreswan.log Aug 4 15:12:46.808394: Initializing NSS using read-write database "sql:/var/lib/ipsec/nss" Aug 4 15:12:46.837350: FIPS Mode: NO Aug 4 15:12:46.837370: NSS crypto library initialized Aug 4 15:12:46.837387: FIPS mode disabled for pluto daemon Aug 4 15:12:46.837390: FIPS HMAC integrity support [disabled] Aug 4 15:12:46.837541: libcap-ng support [enabled] Aug 4 15:12:46.837550: Linux audit support [enabled] Aug 4 15:12:46.837576: Linux audit activated Aug 4 15:12:46.837580: Starting Pluto (Libreswan Version 4.9 IKEv2 IKEv1 XFRM XFRMI esp-hw-offload FORK PTHREAD_SETSCHEDPRIO GCC_EXCEPTIONS NSS (IPsec profile) (NSS-KDF) DNSSEC SYSTEMD_WATCHDOG LABELED_IPSEC (SELINUX) SECCOMP LIBCAP_NG LINUX_AUDIT AUTH_PAM NETWORKMANAGER CURL(non-NSS) LDAP(non-NSS)) pid:147 Aug 4 15:12:46.837583: core dump dir: /run/pluto Aug 4 15:12:46.837585: secrets file: /etc/ipsec.secrets Aug 4 15:12:46.837587: leak-detective enabled Aug 4 15:12:46.837589: NSS crypto [enabled] Aug 4 15:12:46.837591: XAUTH PAM support [enabled] Aug 4 15:12:46.837604: initializing libevent in pthreads mode: headers: 2.1.12-stable (2010c00); library: 2.1.12-stable (2010c00) Aug 4 15:12:46.837664: NAT-Traversal support [enabled] Aug 4 15:12:46.837803: Encryption algorithms: Aug 4 15:12:46.837814: AES_CCM_16 {256,192,*128} IKEv1: ESP IKEv2: ESP FIPS aes_ccm, aes_ccm_c Aug 4 15:12:46.837820: AES_CCM_12 {256,192,*128} IKEv1: ESP IKEv2: ESP FIPS aes_ccm_b Aug 4 15:12:46.837826: AES_CCM_8 {256,192,*128} IKEv1: ESP IKEv2: ESP FIPS aes_ccm_a Aug 4 15:12:46.837831: 3DES_CBC [*192] IKEv1: IKE ESP IKEv2: IKE ESP FIPS NSS(CBC) 3des Aug 4 15:12:46.837837: CAMELLIA_CTR {256,192,*128} IKEv1: ESP IKEv2: ESP Aug 4 15:12:46.837843: CAMELLIA_CBC {256,192,*128} IKEv1: IKE ESP IKEv2: IKE ESP NSS(CBC) camellia Aug 4 15:12:46.837849: AES_GCM_16 {256,192,*128} IKEv1: ESP IKEv2: IKE ESP FIPS NSS(GCM) aes_gcm, aes_gcm_c Aug 4 15:12:46.837855: AES_GCM_12 {256,192,*128} IKEv1: ESP IKEv2: IKE ESP FIPS NSS(GCM) aes_gcm_b Aug 4 15:12:46.837861: AES_GCM_8 {256,192,*128} IKEv1: ESP IKEv2: IKE ESP FIPS NSS(GCM) aes_gcm_a Aug 4 15:12:46.837867: AES_CTR {256,192,*128} IKEv1: IKE ESP IKEv2: IKE ESP FIPS NSS(CTR) aesctr Aug 4 15:12:46.837872: AES_CBC {256,192,*128} IKEv1: IKE ESP IKEv2: IKE ESP FIPS NSS(CBC) aes Aug 4 15:12:46.837878: NULL_AUTH_AES_GMAC {256,192,*128} IKEv1: ESP IKEv2: ESP FIPS aes_gmac Aug 4 15:12:46.837883: NULL [] IKEv1: ESP IKEv2: ESP Aug 4 15:12:46.837889: CHACHA20_POLY1305 [*256] IKEv1: IKEv2: IKE ESP NSS(AEAD) chacha20poly1305 Aug 4 15:12:46.837892: Hash algorithms: Aug 4 15:12:46.837896: MD5 IKEv1: IKE IKEv2: NSS Aug 4 15:12:46.837901: SHA1 IKEv1: IKE IKEv2: IKE FIPS NSS sha Aug 4 15:12:46.837906: SHA2_256 IKEv1: IKE IKEv2: IKE FIPS NSS sha2, sha256 Aug 4 15:12:46.837910: SHA2_384 IKEv1: IKE IKEv2: IKE FIPS NSS sha384 Aug 4 15:12:46.837915: SHA2_512 
IKEv1: IKE IKEv2: IKE FIPS NSS sha512 Aug 4 15:12:46.837919: IDENTITY IKEv1: IKEv2: FIPS Aug 4 15:12:46.837922: PRF algorithms: Aug 4 15:12:46.837927: HMAC_MD5 IKEv1: IKE IKEv2: IKE native(HMAC) md5 Aug 4 15:12:46.837931: HMAC_SHA1 IKEv1: IKE IKEv2: IKE FIPS NSS sha, sha1 Aug 4 15:12:46.837936: HMAC_SHA2_256 IKEv1: IKE IKEv2: IKE FIPS NSS sha2, sha256, sha2_256 Aug 4 15:12:46.837950: HMAC_SHA2_384 IKEv1: IKE IKEv2: IKE FIPS NSS sha384, sha2_384 Aug 4 15:12:46.837955: HMAC_SHA2_512 IKEv1: IKE IKEv2: IKE FIPS NSS sha512, sha2_512 Aug 4 15:12:46.837959: AES_XCBC IKEv1: IKEv2: IKE native(XCBC) aes128_xcbc Aug 4 15:12:46.837962: Integrity algorithms: Aug 4 15:12:46.837966: HMAC_MD5_96 IKEv1: IKE ESP AH IKEv2: IKE ESP AH native(HMAC) md5, hmac_md5 Aug 4 15:12:46.837984: HMAC_SHA1_96 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS sha, sha1, sha1_96, hmac_sha1 Aug 4 15:12:46.837995: HMAC_SHA2_512_256 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS sha512, sha2_512, sha2_512_256, hmac_sha2_512 Aug 4 15:12:46.837999: HMAC_SHA2_384_192 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS sha384, sha2_384, sha2_384_192, hmac_sha2_384 Aug 4 15:12:46.838005: HMAC_SHA2_256_128 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS sha2, sha256, sha2_256, sha2_256_128, hmac_sha2_256 Aug 4 15:12:46.838008: HMAC_SHA2_256_TRUNCBUG IKEv1: ESP AH IKEv2: AH Aug 4 15:12:46.838014: AES_XCBC_96 IKEv1: ESP AH IKEv2: IKE ESP AH native(XCBC) aes_xcbc, aes128_xcbc, aes128_xcbc_96 Aug 4 15:12:46.838018: AES_CMAC_96 IKEv1: ESP AH IKEv2: ESP AH FIPS aes_cmac Aug 4 15:12:46.838023: NONE IKEv1: ESP IKEv2: IKE ESP FIPS null Aug 4 15:12:46.838026: DH algorithms: Aug 4 15:12:46.838031: NONE IKEv1: IKEv2: IKE ESP AH FIPS NSS(MODP) null, dh0 Aug 4 15:12:46.838035: MODP1536 IKEv1: IKE ESP AH IKEv2: IKE ESP AH NSS(MODP) dh5 Aug 4 15:12:46.838039: MODP2048 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS(MODP) dh14 Aug 4 15:12:46.838044: MODP3072 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS(MODP) dh15 Aug 4 15:12:46.838048: MODP4096 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS(MODP) dh16 Aug 4 15:12:46.838053: MODP6144 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS(MODP) dh17 Aug 4 15:12:46.838057: MODP8192 IKEv1: IKE ESP AH IKEv2: IKE ESP AH FIPS NSS(MODP) dh18 Aug 4 15:12:46.838061: DH19 IKEv1: IKE IKEv2: IKE ESP AH FIPS NSS(ECP) ecp_256, ecp256 Aug 4 15:12:46.838066: DH20 IKEv1: IKE IKEv2: IKE ESP AH FIPS NSS(ECP) ecp_384, ecp384 Aug 4 15:12:46.838070: DH21 IKEv1: IKE IKEv2: IKE ESP AH FIPS NSS(ECP) ecp_521, ecp521 Aug 4 15:12:46.838074: DH31 IKEv1: IKE IKEv2: IKE ESP AH NSS(ECP) curve25519 Aug 4 15:12:46.838077: IPCOMP algorithms: Aug 4 15:12:46.838081: DEFLATE IKEv1: ESP AH IKEv2: ESP AH FIPS Aug 4 15:12:46.838085: LZS IKEv1: IKEv2: ESP AH FIPS Aug 4 15:12:46.838089: LZJH IKEv1: IKEv2: ESP AH FIPS Aug 4 15:12:46.838093: testing CAMELLIA_CBC: Aug 4 15:12:46.838096: Camellia: 16 bytes with 128-bit key Aug 4 15:12:46.838162: Camellia: 16 bytes with 128-bit key Aug 4 15:12:46.838201: Camellia: 16 bytes with 256-bit key Aug 4 15:12:46.838243: Camellia: 16 bytes with 256-bit key Aug 4 15:12:46.838280: testing AES_GCM_16: Aug 4 15:12:46.838284: empty string Aug 4 15:12:46.838319: one block Aug 4 15:12:46.838352: two blocks Aug 4 15:12:46.838385: two blocks with associated data Aug 4 15:12:46.838424: testing AES_CTR: Aug 4 15:12:46.838428: Encrypting 16 octets using AES-CTR with 128-bit key Aug 4 15:12:46.838464: Encrypting 32 octets using AES-CTR with 128-bit key Aug 4 15:12:46.838502: Encrypting 36 octets using AES-CTR with 128-bit key Aug 4 15:12:46.838541: 
Encrypting 16 octets using AES-CTR with 192-bit key Aug 4 15:12:46.838576: Encrypting 32 octets using AES-CTR with 192-bit key Aug 4 15:12:46.838613: Encrypting 36 octets using AES-CTR with 192-bit key Aug 4 15:12:46.838651: Encrypting 16 octets using AES-CTR with 256-bit key Aug 4 15:12:46.838687: Encrypting 32 octets using AES-CTR with 256-bit key Aug 4 15:12:46.838724: Encrypting 36 octets using AES-CTR with 256-bit key Aug 4 15:12:46.838763: testing AES_CBC: Aug 4 15:12:46.838766: Encrypting 16 bytes (1 block) using AES-CBC with 128-bit key Aug 4 15:12:46.838801: Encrypting 32 bytes (2 blocks) using AES-CBC with 128-bit key Aug 4 15:12:46.838841: Encrypting 48 bytes (3 blocks) using AES-CBC with 128-bit key Aug 4 15:12:46.838881: Encrypting 64 bytes (4 blocks) using AES-CBC with 128-bit key Aug 4 15:12:46.838928: testing AES_XCBC: Aug 4 15:12:46.838932: RFC 3566 Test Case 1: AES-XCBC-MAC-96 with 0-byte input Aug 4 15:12:46.839126: RFC 3566 Test Case 2: AES-XCBC-MAC-96 with 3-byte input Aug 4 15:12:46.839291: RFC 3566 Test Case 3: AES-XCBC-MAC-96 with 16-byte input Aug 4 15:12:46.839444: RFC 3566 Test Case 4: AES-XCBC-MAC-96 with 20-byte input Aug 4 15:12:46.839600: RFC 3566 Test Case 5: AES-XCBC-MAC-96 with 32-byte input Aug 4 15:12:46.839756: RFC 3566 Test Case 6: AES-XCBC-MAC-96 with 34-byte input Aug 4 15:12:46.839937: RFC 3566 Test Case 7: AES-XCBC-MAC-96 with 1000-byte input Aug 4 15:12:46.840373: RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 16) Aug 4 15:12:46.840529: RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 10) Aug 4 15:12:46.840698: RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 18) Aug 4 15:12:46.840990: testing HMAC_MD5: Aug 4 15:12:46.840997: RFC 2104: MD5_HMAC test 1 Aug 4 15:12:46.841200: RFC 2104: MD5_HMAC test 2 Aug 4 15:12:46.841390: RFC 2104: MD5_HMAC test 3 Aug 4 15:12:46.841582: testing HMAC_SHA1: Aug 4 15:12:46.841585: CAVP: IKEv2 key derivation with HMAC-SHA1 Aug 4 15:12:46.842055: 8 CPU cores online Aug 4 15:12:46.842062: starting up 7 helper threads Aug 4 15:12:46.842128: started thread for helper 0 Aug 4 15:12:46.842174: helper(1) seccomp security disabled for crypto helper 1 Aug 4 15:12:46.842188: started thread for helper 1 Aug 4 15:12:46.842219: helper(2) seccomp security disabled for crypto helper 2 Aug 4 15:12:46.842236: started thread for helper 2 Aug 4 15:12:46.842258: helper(3) seccomp security disabled for crypto helper 3 Aug 4 15:12:46.842269: started thread for helper 3 Aug 4 15:12:46.842296: helper(4) seccomp security disabled for crypto helper 4 Aug 4 15:12:46.842311: started thread for helper 4 Aug 4 15:12:46.842323: helper(5) seccomp security disabled for crypto helper 5 Aug 4 15:12:46.842346: started thread for helper 5 Aug 4 15:12:46.842369: helper(6) seccomp security disabled for crypto helper 6 Aug 4 15:12:46.842376: started thread for helper 6 Aug 4 15:12:46.842390: using Linux xfrm kernel support code on #1 SMP PREEMPT_DYNAMIC Thu Jul 20 09:11:28 EDT 2023 Aug 4 15:12:46.842393: helper(7) seccomp security disabled for crypto helper 7 Aug 4 15:12:46.842707: selinux support is NOT enabled. 
Aug 4 15:12:46.842728: systemd watchdog not enabled - not sending watchdog keepalives
Aug 4 15:12:46.843813: seccomp security disabled
Aug 4 15:12:46.848083: listening for IKE messages
Aug 4 15:12:46.848252: Kernel supports NIC esp-hw-offload
Aug 4 15:12:46.848534: adding UDP interface ovn-k8s-mp0 10.129.0.2:500
Aug 4 15:12:46.848624: adding UDP interface ovn-k8s-mp0 10.129.0.2:4500
Aug 4 15:12:46.848654: adding UDP interface br-ex 169.254.169.2:500
Aug 4 15:12:46.848681: adding UDP interface br-ex 169.254.169.2:4500
Aug 4 15:12:46.848713: adding UDP interface br-ex 10.0.0.8:500
Aug 4 15:12:46.848740: adding UDP interface br-ex 10.0.0.8:4500
Aug 4 15:12:46.848767: adding UDP interface lo 127.0.0.1:500
Aug 4 15:12:46.848793: adding UDP interface lo 127.0.0.1:4500
Aug 4 15:12:46.848824: adding UDP interface lo [::1]:500
Aug 4 15:12:46.848853: adding UDP interface lo [::1]:4500
Aug 4 15:12:46.851160: loading secrets from "/etc/ipsec.secrets"
Aug 4 15:12:46.851214: no secrets filename matched "/etc/ipsec.d/*.secrets"
Aug 4 15:12:47.053369: loading secrets from "/etc/ipsec.secrets"
sh-4.4# tcpdump -i any esp
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
^C
0 packets captured
sh-5.1# ovn-nbctl --no-leader-only get nb_global . ipsec
false
Version-Release number of selected component (if applicable):
openshift/cluster-network-operator#1874
How reproducible:
Always
Steps to Reproduce:
1. Install an OVN cluster and enable IPsec at runtime.
Actual results:
no esp packets seen across the nodes
Expected results:
esp traffic should be seen across the nodes
Additional info:
oc-mirror is a GA product as of OpenShift 4.11.
The goal of this feature is to address any future customer requests for new features or capabilities in oc-mirror.
Overview
This epic is a simple tracker epic for the proposed work and analysis for 4.14 delivery
As an oc-mirror user, I would like mirrored operator catalogs to have valid caches that reflect the contents of the catalog (configs folder), based on the filtering done in the ImageSetConfig for that catalog
so that the catalog image starts efficiently in a cluster.
Tasks:
opm serve /configs --cache-dir /tmp/cache --cache-only
Acceptance criteria:
Description of problem:
The customer was able to limit the nested repository path with "oc adm catalog mirror" by using the argument "--max-components", but there is no alternative solution with the "oc-mirror" binary, while we are suggesting to use the "oc-mirror" binary for mirroring.
For example, mirroring will work if we mirror like below:
oc mirror --config=./imageset-config.yaml docker://registry.gitlab.com/xxx/yyy
Mirroring will fail with 401 unauthorized if we add one more nested path, like below:
oc mirror --config=./imageset-config.yaml docker://registry.gitlab.com/xxx/yyy/zzz
Version-Release number of selected component (if applicable):
How reproducible:
We can reproduce the issue by using a registry that does not support deep nested repository paths
Steps to Reproduce:
1. Create an imageset to mirror any operator:
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: ./oc-mirror-metadata
mirror:
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.12
      packages:
        - name: local-storage-operator
          channels:
            - name: stable
2. Do the mirroring to a registry that does not support a deep nested repository path. Here it is GitLab, which does not support nesting beyond 3 levels deep:
oc mirror --config=./imageset-config.yaml docker://registry.gitlab.com/xxx/yyy/zzz
This mirroring will fail with a 401 unauthorized error.
3. If we try to mirror the same imageset removing one path level, it will work without any issues, like below:
oc mirror --config=./imageset-config.yaml docker://registry.gitlab.com/xxx/yyy
Actual results:
Expected results:
Need an alternative to the "--max-components" option to limit the nested path depth in "oc-mirror"
Additional info:
Achieve feature parity for recently introduced functionality for all modes of operation
Currently there are gaps in functionality within oc mirror that we would like addressed.
1. Support oci: references within mirror.operators[].catalog in an ImageSetConfiguration when running in all modes of operation with the full functionality provided by oc mirror.
Currently oci: references such as the following are allowed only in limited circumstances:
mirror:
  operators:
    - catalog: oci:///tmp/oci/ocp11840
    - catalog: icr.io/cpopen/ibm-operator-catalog
Currently supported scenarios
In this mode of operation the images are fetched from the oci: reference rather than being pulled from a source docker image repository. These catalogs are processed through similar (yet different) mechanisms compared to docker image references. The end result in this scenario is that the catalog is potentially modified and images (i.e. catalog, bundle, related images, etc.) are pushed to their final docker image repository. This provides the full capabilities offered by oc mirror (e.g. catalog "filtering", image pruning, metadata manipulation to keep track of what has been mirrored, etc.)
Desired scenarios
In the following scenarios we would like oci: references to be processed in a similar way to how docker references are handled (as close as possible anyway given the different APIs involved). Ultimately we want oci: catalog references to provide the full set of functionality currently available for catalogs provided as a docker image reference. In other words we want full feature parity (e.g. catalog "filtering", image pruning, metadata manipulation to keep track of what has been mirrored, etc.)
In this mode of operation the images are fetched from the oci: reference rather than being pulled from a docker image repository. These catalogs are processed through similar yet different mechanisms compared to docker image references. The end result of this scenario is that all mappings and catalogs are packaged into tar archives (i.e. the "imageset").
In this mode of operation the tar archives (i.e. the "imageset") are processed via the "publish mechanism" which means unpacking the tar archives, processing the metadata, pruning images, rebuilding catalogs, and pushing images to their destination. In theory if the mirror-to-disk scenario is handled properly, then this mode should "just work".
Below the line is the original RFE requesting the OCI feature; it is only provided for reference.
Overview
Design, code, and implementation of the mirrorToDisk functionality
... so that I can use that along with the OCI FBC feature
Goal:
As a cluster administrator, I want OpenShift to include a recent HAProxy version, so that I have the latest available performance and security fixes.
Description:
We should strive to follow upstream HAProxy releases by bumping the HAProxy version that we ship in OpenShift with every 4.y release, so that OpenShift benefits from upstream performance and security fixes, and so that we avoid large version-number jumps when an urgent fix necessitates bumping to the latest HAProxy release. This bump should happen as early as possible in the OpenShift release cycle, so as to maximize soak time.
For OpenShift 4.13, this means bumping to 2.6.
As a cluster administrator,
I want OpenShift to include a recent HAProxy version,
so that I have the latest available performance and security fixes.
We should strive to follow upstream HAProxy releases by bumping the HAProxy version that we ship in OpenShift with every 4.y release, so that OpenShift benefits from upstream performance and security fixes, and so that we avoid large version-number jumps when an urgent fix necessitates bumping to the latest HAProxy release. This bump should happen as early as possible in the OpenShift release cycle, so as to maximize soak time.
For OpenShift 4.14, this means bumping to 2.6.
Bump the HAProxy version in dist-git so that OCP 4.13 ships HAProxy 2.6.13, with this patch added on top: https://git.haproxy.org/?p=haproxy-2.6.git;a=commit;h=2b0aafdc92f691bc4b987300c9001a7cc3fb8d08. The patch fixes the segfault that was being tracked as OCPBUGS-13232.
This patch is in HAProxy 2.6.14, so we can stop carrying the patch once we bump to HAProxy 2.6.14 or newer in a subsequent OCP release.
Tang-enforced, network-bound disk encryption has been available in OpenShift for some time, but all intended Tang-endpoints contributing unique key material to the process must be reachable during RHEL CoreOS provisioning in order to complete deployment.
If a user wants to require 3 of 6 Tang servers be reachable, then all 6 must be reachable during the provisioning process. This might not be possible due to maintenance, an outage, or simply network policy during deployment.
Enabling offline provisioning for first boot will help all of these scenarios.
The user can now provision a cluster with some or none of the Tang servers being reachable on first boot. Second boot, of course, will be subject to the Tang requirements being configured.
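For context, a hedged Butane-style sketch of the kind of configuration in question, requiring any 3 of 6 Tang servers at unlock time; the URLs and thumbprints are placeholders, and this snippet only illustrates the threshold semantics discussed above:

variant: openshift
version: 4.14.0
metadata:
  name: worker-tang-luks        # placeholder name
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  luks:
    threshold: 3                # any 3 of the listed Tang servers must respond
    tang:
      - url: http://tang1.example.com:7500
        thumbprint: REPLACE_WITH_THUMBPRINT
      - url: http://tang2.example.com:7500
        thumbprint: REPLACE_WITH_THUMBPRINT
      - url: http://tang3.example.com:7500
        thumbprint: REPLACE_WITH_THUMBPRINT
      # ...plus the remaining Tang servers, for 6 total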
Done when:
This requires the messy/complex work of grepping through for prior references to Ignition and updating golang types that reference other versions.
Assumption that existing tests are sufficient to catch discrepancies.
Goal
Allow the OpenShift installer to point to an existing OVA image stored in vSphere, replacing the current method that uploads the OVA template every time an OpenShift cluster is installed.
Why is this important?
This is an improvement that makes the installation more efficient by not having to upload an OVA from wherever openshift-install is running every time a cluster is installed, saving time and bandwidth. For example, if an administrator is installing over a VPN, then the OVA is uploaded through it to the target environment every time an OpenShift cluster is installed. Having a centralised OVA ready to use to install new clusters, without uploading it from where the installer is run, makes the administration process more efficient.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
This work will require updates to the core OpenShift API repository to add the new platform type, and then a distribution of this change to all components that use the platform type information. For components that partners might replace, per-component action will need to be taken, with the project team's guidance, to ensure that the component properly handles the "External" platform. These changes will look slightly different for each component.
To integrate these changes more easily into OpenShift, it is possible to take a multi-phase approach which could be spread over a release boundary (eg phase 1 is done in 4.X, phase 2 is done in 4.X+1).
OCPBU-5: Phase 1
OCPBU-510: Phase 2
OCPBU-329: Phase.Next
Phase 1
Phase 2
Phase 3
As a Red Hat Partner installing OpenShift using the External platform type, I would like to install my own Cloud Controller Manager (CCM). Having a field in the Infrastructure configuration object to signal that I will install my own CCM, and that Kubernetes should be configured to expect an external CCM, will allow me to run my own CCM on new OpenShift deployments.
This work has been defined in the External platform enhancement, and had previously been part of openshift/api. The CCM API pieces were removed for the 4.13 release of OpenShift to ensure that we did not ship unused portions of the API.
In addition to the API changes, library-go will need an update to the IsCloudProviderExternal function to detect if the External platform is selected and if the CCM should be enabled for external mode.
We will also need to check the ObserveCloudVolumePlugin function to ensure that it is not affected by the external changes and that it continues to use the external volume plugin.
After updating openshift/library-go, it will need to be re-vendored into the MCO, KCMO, and CCCMO (although this is not as critical as the other 2).
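For reference, a hedged sketch of the Infrastructure object shape being discussed; the field names and their spec/status placement are assumptions based on the External platform enhancement, and the platform name is a placeholder:

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
spec:
  platformSpec:
    type: External
    external:
      platformName: partner-cloud           # set at install time, informational
status:
  platformStatus:
    type: External
    external:
      cloudControllerManager:
        state: External                     # signals that an externally managed CCM will be installed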
As a user I want to use the openshift installer to create clusters of platform type External so that I can use openshift more effectively on a partner provider platform.
To fully support the External platform type for partners and users, it will be useful to be able to have the installer understand when it sees the external platform type in the install-config.yaml, and then to properly populate the resulting infrastructure config object with the external platform type and platform name.
As defined in https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go#L241 , the external platform type allows the user to specify a name for the platform. This card is about updating the installer so that a user can provide both the external type and a platform name that will be expressed in the infrastructure manifest.
Aside from this information, the installer should continue with a normal platform "None" installation.
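A hedged sketch of the corresponding install-config.yaml stanza (the cluster name, domain, and platform name are placeholders):

apiVersion: v1
baseDomain: example.com
metadata:
  name: partner-cluster
platform:
  external:
    platformName: partner-cloud   # carried into the generated Infrastructure manifest
pullSecret: '...'
sshKey: '...'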
In the initial delivery of CoreOS Layering, it is required that administrators provide their own build environment to customize RHCOS images. That could be a traditional RHEL environment or potentially an enterprising administrator with some knowledge of OCP Builds could set theirs up on-cluster.
The primary virtue of an on-cluster build path is to continue using the cluster to manage the cluster. No external dependency, batteries-included.
In the context of the Machine Config Operator (MCO) in Red Hat OpenShift, on-cluster builds refer to the process of building an OS image directly on the OpenShift cluster, rather than building them outside the cluster (such as on a local machine or continuous integration (CI) pipeline) and then making a configuration change so that the cluster uses them. By doing this, we enable cluster administrators to have more control over the contents and configuration of their clusters’ OS image through a familiar interface (MachineConfigs and in the future, Dockerfiles).
This is the "consumption" side of the security – rpm-ostree needs to be able to retrieve images from the internal registry seamlessly.
This will involve setting up (or using some existing) pull secrets, and then getting them to the proper location on disk so that rpm-ostree can use them to pull images.
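As a rough sketch only (the path and delivery mechanism are assumptions, not the final design), the pull secret could be landed where rpm-ostree can read registry credentials with something like:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-ostree-pull-secret          # hypothetical name
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: /etc/ostree/auth.json           # assumed location rpm-ostree checks for registry auth
        mode: 384                             # 0600 in octal
        contents:
          source: data:text/plain;charset=utf-8;base64,<base64-encoded dockerconfigjson>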
At the layering sync meeting on Thursday, August 10th, it was decided that for this to be considered ready for Dev / Tech Preview, cluster admins need a way to inject custom Dockerfiles into their on-cluster builds.
(Commentary: It was also decided 4 months ago that this was not an MVP requirement in https://docs.google.com/document/d/1QSsq0mCgOSUoKZ2TpCWjzrQpKfMUL9thUFBMaPxYSLY/edit#heading=h.jqagm7kwv0lg. And quite frankly, this requirement should have been known at that point in time as opposed to the week before tech preview.)
The first phase of the layering effort involved creating a BuildController, whose job is to start and manage builds using the OpenShift Build API. We can use the work done to create the BuildController as the basis for our MVP. However, what we need from BuildController right now is less than BuildController currently provides. With that in mind, we need to remove certain parts of BuildController to create a more streamlined and simpler implementation ideal for an MVP.
Done when a version of BuildController is landed which does the following things:
The second phase of the layering effort involved creating a BuildController, whose job is to start and manage builds of OS images. While it should be able to perform those functions on its own, getting the built OS image onto each of the cluster nodes involves modifying other parts of the MCO to be layering-aware. To that end, there are three pieces involved, some of which will require modification:
Right now, the render controller listens for incoming MachineConfig changes. It generates the rendered config, which is composed of all of the MachineConfigs for a given MachineConfigPool. Once rendered, the Render Controller updates the MachineConfigPool to point to the new config. As far as I'm aware at the moment, this portion of the MCO will likely not need any modification.
The Node Controller listens for MachineConfigPool config changes. Whenever it identifies that a change has occurred, it applies the machineconfiguration.openshift.io/desiredConfig annotation to all the nodes in the targeted MachineConfigPool which causes the Machine Config Daemon (MCD) to apply the new configs. With this new layering mechanism, we'll need to add the additional annotation of machineconfiguration.openshift.io/desiredOSimage which will contain the fully-qualified pullspec for the new OS image (referenced by the image SHA256 sum). To be clear, we will not be replacing the desiredConfig annotation with the desiredOSimage annotation; both will still be used. This will allow Config Drift Monitor to continue to function the way it does with no modification required.
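For example, a node in a layered pool could end up carrying annotations along these lines (the rendered config name and pullspec below are purely illustrative):

apiVersion: v1
kind: Node
metadata:
  name: worker-0
  annotations:
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-abc123   # illustrative rendered config name
    machineconfiguration.openshift.io/desiredOSimage: image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image@sha256:0000000000000000000000000000000000000000000000000000000000000000   # illustrative digested pullspec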
Right now, the MCD listens to Node objects for changes to the machineconfiguration.openshift.io/desiredConfig annotation. With the new desiredOSimage annotation being present, the MCD will need to skip the parts of the update loop which write files and systemd units to disk. Instead, it will skip directly to the rpm-ostree application phase (after making sure the correct pull secrets are in place, etc.).
Done When:
To speed development for on-cluster builds and avoid a lot of complex code paths, the decision was made to put all functionality related to building OS images and managing internal registries into a separate binary within the MCO.
Eventually, this binary will be responsible for running the productionized BuildController and know how to respond to Machine OS Builder API objects. However, until the productionized BuildController and opt-in portions are ready, the first pass of this binary will be much simpler: For now, it can connect to the API server and print a "Hello World".
Done When:
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Fully automated installation creating subnets in AWS Local Zones when the zone names are added to the edge compute pool on install-config.yaml.
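A minimal sketch of the edge compute pool in install-config.yaml; the zone name below is just an example Local Zone and the replica count is illustrative:

compute:
- name: edge
  replicas: 1
  platform:
    aws:
      zones:
      - us-east-1-nyc-1a   # example AWS Local Zone name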
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This is part of the overall multi-release Composable OpenShift effort (OCPPLAN-9638), which is being delivered in multiple phases:
Phase 1 (OpenShift 4.11): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators
Phase 2 (OpenShift 4.12): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators
Phase 3 (OpenShift 4.13): OCPBU-117
Phase 4 (OpenShift 4.14): OCPSTRAT-36 (formerly OCPBU-236)
Phase 5 (OpenShift 4.15): OCPSTRAT-421 (formerly OCPBU-519)
Phase 6 (OpenShift 4.16): OCPSTRAT-731
Questions to be addressed:
Per https://github.com/openshift/enhancements/pull/922 we need `oc adm release new` to parse the resource manifests for `capability` annotations and generate a yaml file that lists the valid capability names, to embed in the release image.
This file can be used by the installer to error or warn when the install config lists capabilities for enable/disable that are not valid capability names.
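For reference, a hedged sketch of what the capability annotation looks like on a payload manifest (the Deployment shown is just an example carrier); the exact format of the generated capability-list file is defined by the enhancement and is not reproduced here:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-baremetal-operator
  namespace: openshift-machine-api
  annotations:
    capability.openshift.io/name: baremetal   # oc adm release new collects these names into the release image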
Note: Moved a couple of cards from OTA-554 to this epic, as these cards are lower priority for the 4.13 release and we could not mark them done.
oc adm release extract --included ..., or some such, that only works when no release pullspec is given: oc connects to the cluster to ask after the current release image (as it does today when you leave off a pullspec), but also collects FeatureGates, the cluster profile, and all that sort of stuff, so it can write only the manifests it expects the CVO to be attempting to reconcile.
This would be narrowly useful for ccoctl (see CCO-178 and CCO-186), because with this extract option, ccoctl wouldn't need to try to reproduce "which of these CredentialsRequests manifests does the cluster actually want filled?" locally.
It also seems like it would be useful for anyone trying to get a better feel for what the CVO is up to in their cluster, for the same reason that it reduces distracting manifests that don't apply.
The downside is that if we screw up the inclusion logic, we could have oc diverging from the CVO, and end up increasing confusion instead of decreasing confusion. If we move the inclusion logic to library-go, that reduces the risk a bit, but there's always the possibility that users are using an oc that is older or newer than the cluster's CVO. Some way to have oc warn when the option is used but the version differs from the current CVO version would be useful, but possibly complicated to implement, unless we take shortcuts like assuming that the currently running CVO has a version matched to the ClusterVersion's status.desired target.
Definition of done (more details in the OTA-692 spike comment):
here is a sketch of code which W. Trevor King suggested
While working on OTA-559, my oc#1237 broke JSON output, and needed a follow-up fix. To avoid destabilizing folks who consume the dev-tip oc, we should grow CI presubmits to exercise critical oc adm release ... pathways, to avoid that kind of accidental breakage.
So it's easier to make adjustments without having to copy/paste code between branches.
It is already possible to run a cluster with no instantiated image registry, but the image registry operator itself always runs. This is an unnecessary use of resources for clusters that don't need/want a registry. Making it possible to disable this will reduce the resource footprint as well as bug risks for clusters that don't need it, such as SNO and OKE.
To enable the MCO to replace the node-ca, the registry operator needs to provide its own CAs in isolation.
Currently, the registry provides its own CAs via the "image-registry-certificates" configmap. This configmap is a merge of the service ca, storage ca, and additionalTrustedCA (from images.config.openshift.io/cluster).
Because the MCO already has access to additionalTrustedCA, the new secret does not need to contain it.
ACCEPTANCE CRITERIA
TBD
Update ETCD datastore encryption to use AES-GCM instead of AES-CBC
2. What is the nature and description of the request?
The current ETCD datastore encryption solution uses the aes-cbc cipher. This cipher is now considered "weak" and is susceptible to padding oracle attacks. Upstream recommends using the AES-GCM cipher. AES-GCM will require automation to rotate secrets for every 200k writes.
The cipher used is hard coded.
3. Why is this needed? (List the business requirements here).
Security conscious customers will not accept the presence and use of weak ciphers in an OpenShift cluster. Continuing to use the AES-CBC cipher will create friction in sales and, for existing customers, may result in OpenShift being blocked from being deployed in production.
4. List any affected packages or components.
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
The Kube APIserver is used to set the encryption of data stored in etcd. See https://docs.openshift.com/container-platform/4.11/security/encrypting-etcd.html
Today with OpenShift 4.11 or earlier, only aescbc is allowed as the encryption field type.
RFE-3095 is asking that aesgcm (which is an updated and more recent type) be supported. Furthermore, RFE-3338 is asking for more customizability, which brings us to how we have implemented cipher customization with tlsSecurityProfile. See https://docs.openshift.com/container-platform/4.11/security/tls-security-profiles.html
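A minimal sketch of what opting into the new cipher looks like via the cluster APIServer config, assuming aesgcm is accepted as an encryption type as described in this feature:

apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  encryption:
    type: aesgcm   # today only aescbc (and identity) are accepted here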
Why is this important? (mandatory)
AES-CBC is considered as a weak cipher
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
The new aesgcm encryption provider was added in 4.13 as techpreview, but as part of https://issues.redhat.com/browse/API-1509, the feature needs to be GA in OCP 4.13.
AES-GCM encryption was enabled in cluster-openshift-apiserver-operator and cluster-openshift-authentication-operator, but not in the cluster-kube-apiserver-operator. When trying to enable aesgcm encryption in the apiserver config, the kas-operator will produce an error saying that the aesgcm provider is not supported.
The new aesgcm encryption provider was added in 4.13 as techpreview, but as part of https://issues.redhat.com/browse/API-1509, the feature needs to be GA in OCP 4.13.
The new aesgcm encryption provider was added in 4.13 as techpreview, but as part of https://issues.redhat.com/browse/API-1509, the feature needs to be GA in OCP 4.13.
The new aesgcm encryption provider was added in 4.13 as techpreview, but as part of https://issues.redhat.com/browse/API-1509, the feature needs to be GA in OCP 4.13.
Support platform type External to allow installing with the agent-based installer on OCI, with a focus on https://www.oracle.com/cloud/cloud-at-customer/dedicated-region/faq/ for disconnected, on-prem deployments.
OCPSTRAT-510 OpenShift on Oracle Cloud Infrastructure (OCI) with VMs
Support platform type External to allow installing with the agent-based installer on OCI, with a focus on https://www.oracle.com/cloud/cloud-at-customer/dedicated-region/faq/ for disconnected, on-prem deployments.
As a user, I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
As a user of the agent-based installer, I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
As a user of the agent-based installer, I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Support OpenShift installation in AWS Shared VPC [1] scenario where AWS infrastructure resources (at least the Private Hosted Zone) belong to an account separate from the cluster installation target account.
As a user, I need to use a Shared VPC [1] when installing OpenShift on AWS into an existing VPC. This will at least require the use of a preexisting Route 53 hosted zone, because as a "participant" user of the shared VPC I am not allowed to automatically create Route 53 private zones.
The Installer is able to successfully deploy OpenShift on AWS with a Shared VPC [1], and the cluster is able to successfully pass osde2e testing. This will include at least the scenario where the private hosted zone belongs to a different account (Account A) than the cluster resources (Account B).
[1] https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
I want
so that I can
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Enhancement PR: https://github.com/openshift/enhancements/pull/1397
API PR: https://github.com/openshift/api/pull/1460
Ingress Operator PR: https://github.com/openshift/cluster-ingress-operator/pull/928
Feature Goal: Support OpenShift installation in AWS Shared VPC scenario where AWS infrastructure resources (at least the Private Hosted Zone) belong to an account separate from the cluster installation target account.
The ingress operator is responsible for creating DNS records in AWS Route53 for cluster ingress. Prior to the implementation of this epic, the ingress operator doesn't have the capability to add DNS records into an existing Route 53 hosted zone in the shared VPC.
As described in the WIP PR https://github.com/openshift/cluster-ingress-operator/pull/928, the ingress operator will consume a new API field that contains the IAM Role ARN for configuring DNS records in the private hosted zone. If this field is present, then the ingress operator will use this account to create all private hosted zone records. The API fields will be described in the Enhancement PR.
The ingress operator code will accomplish this by defining a new provider implementation that wraps two other DNS providers, using one of them to publish records to the public zone and the other to publish records to the private zone.
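As a rough illustration only, the cluster DNS config could carry the Route 53 role along these lines; the exact field name and location are defined in the enhancement and API PRs above, so treat the names here as placeholders:

apiVersion: config.openshift.io/v1
kind: DNS
metadata:
  name: cluster
spec:
  platform:
    type: AWS
    aws:
      privateZoneIAMRole: arn:aws:iam::123456789012:role/shared-vpc-private-zone-role   # placeholder ARN and field name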
See NE-1299
See NE-1299
Develop the implementation for supporting AWS Shared VPC pre-existing Route53 as it is described in the enhancement: https://github.com/openshift/enhancements/pull/1397
During oc login with a token, pasting the token on the command line with the oc login --token command is insecure. The token is logged in bash history, and appears in ps output when run at precisely the time the oc login command runs. Moreover, the token gets logged and is searchable by any sysadmin.
Customers/Users would like either the "--web" option, or a command that prompts for a token. There should be no way to pass a secret on the command line with the --token option.
For environments where no web browser is available, a "--ask-token" option should be provided that prompts for a token instead of passing it on the command line.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Epic Goal*
During oc login with a token, pasting the token on the command line with the oc login --token command is insecure. The token is logged in bash history, and appears in ps output when run at precisely the time the oc login command runs. Moreover, the token gets logged and is searchable by any sysadmin.
Customers/Users would like either the "--web" option, or a command that prompts for a token. There should be no way to pass a secret on the command line with the --token option.
For environments where no web browser is available, a "--ask-token" option should be provided that prompts for a token instead of passing it on the command line.
Why is this important? (mandatory)
Pasting the token on command line with oc login --token command is insecure
Scenarios (mandatory)
Customers/Users would like the "--web" option. There should be no way to pass a secret on the command line with the --token option.
For environments where no web browser is available, a "--ask-token" option should be provided that prompts for a token instead of passing it on the command line.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
In order to secure token usage during oc login, we need to add the capability to oc to login using the OAuth2 Authorization Code Grant Flow through a browser. This will be possible by providing a command line option to oc:
oc login --web
Add e2e tests in the OSIN library for redirect URI validation without ports on non-loopback links.
In order for the OAuth2 Authorization Code Grant Flow to work in oc browser login, we need a new OAuthClient that can obtain tokens through PKCE (https://datatracker.ietf.org/doc/html/rfc7636), as the existing clients do not have this capability. The new client will be called openshift-cli-client and will have the loopback addresses as valid Redirect URIs.
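A hedged sketch of what the new client object could look like; the exact redirect URIs and grant method are assumptions based on the loopback requirement described here:

apiVersion: oauth.openshift.io/v1
kind: OAuthClient
metadata:
  name: openshift-cli-client
grantMethod: auto
redirectURIs:
- http://127.0.0.1/callback   # loopback; any port is accepted per the OSIN change described below
- http://[::1]/callback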
In order for the OAuth2 Authorization Code Grant Flow to work in oc browser login, the OSIN server must ignore any port used in the Redirect URIs of the flow when the URIs are the loopback addresses. This has already been added to OSIN; we need to update the oauth-server to use the latest version of OSIN in order to make use of this capability.
Review the OVN Interconnect proposal, figure out the work that needs to be done in ovn-kubernetes to be able to move to this new OVN architecture.
OVN IC will be the model used in Hypershift.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
For interconnect upgrades, i.e. when moving from OCP 4.13 to OCP 4.14 where IC is enabled, we do a two-phase rollout of the ovnkube-master and ovnkube-node pods in the openshift-ovn-kubernetes namespace. This is to ensure we have minimum disruption, since major architectural components are being brought from the control plane down to the data plane.
Since it is a two-phase rollout with each phase taking approximately 10 minutes, we effectively double the time it takes for the OVNK component to upgrade, which means increasing the timeout thresholds on AWS.
See https://redhat-internal.slack.com/archives/C050MC61LVA/p1689768779938889 for some more details.
See sample runs:
I have noticed this happening once on GCP:
This has not happened on Azure, which has a 95-minute allowance. So this card tracks the work to increase the timers on AWS/GCP. This was brought up in the TRT team sync that happened yesterday (July 19th, 2023), and Scott Dodson has agreed to approve this under the condition that we bring the values back down to the current ones in release 4.15.
SDN team is confident the time will drop back to normal for future upgrades going from 4.14 -> 4.15 and so on. This will be tracked via https://issues.redhat.com/browse/OTA-999
Work with the https://issues.redhat.com/browse/SDN-3654 card to get data from the scale team as needed and continue to improve the numbers.
In the non-IC world we have a centralised DB, so running a trace is easy. In the IC world we would need the local DBs from every node to run a full pod-to-pod trace; otherwise we can only run half traces with the DB from one side.
Goal of this card:
Users want to create EFA instance MachineSets in the same AWS placement group to get the best network performance within that placement group.
The Scope of this Epic is only to support placement groups. Customers will create them.
The customer ask is that placement groups don't need to be created by the OpenShift Container Platform
OpenShift Container Platform only needs to be able to consume them and assign machines out of a machineset to a specific Placement Group.
Users want to create EFA instance MachineSets in the same AWS placement group to get the best network performance within that placement group.
Note: This Epic was previously connected to https://issues.redhat.com/browse/OCPPLAN-8106 and has been updated to OCPBU-327.
Scope
The Scope of this Epic is only to support placement groups. Customers will create them.
The customer ask is that placement groups don't need to be created by the OpenShift Container Platform
OpenShift Container Platform only needs to be able to consume them and assign machines out of a machineset to a specific Placement Group.
In CAPI, the AWS provider supports the user supplying the name of a pre-existing placement group, which will then be used to create the instances.
https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/4273
We need to add the same field to our API and then pass the information through in the same way, to allow users to leverage placement groups.
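A minimal sketch of how the field could surface in a Machine API providerSpec; the field name mirrors the CAPA field and should be treated as an assumption until the API change merges:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: efa-workers                     # hypothetical name (selector and template labels omitted for brevity)
  namespace: openshift-machine-api
spec:
  replicas: 2
  template:
    spec:
      providerSpec:
        value:
          apiVersion: machine.openshift.io/v1beta1
          kind: AWSMachineProviderConfig
          instanceType: c5n.18xlarge        # an EFA-capable instance type
          placementGroupName: my-efa-group  # pre-existing placement group created by the customer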
The console operator should build up a set of the cluster nodes' OS types and supply it to the console, so that the console renders only operators that can be installed on the cluster.
This will be needed when we support different OS types on the cluster.
We need to scan through the compute nodes and build a set of supported OSes from them. Each node on the cluster has a label for its operating system, e.g. kubernetes.io/os=linux.
AC:
1. Proposed title of this feature request
Add a scroll bar for the resource list in the Uninstall Operator pop-up window
2. What is the nature and description of the request?
To make it easy for users to check the list of all resources
3. Why does the customer need this? (List the business requirements here)
For customers, one operator may have multiple resources; a scroll bar would make it easy for them to check them all in the Uninstall Operator pop-up window.
4. List any affected packages or components.
The console operator should build up a set of the cluster nodes' OS types and supply it to the console, so that the console renders only operators that can be installed on the cluster.
This will be needed when we support different OS types on the cluster.
We need to scan through the compute nodes and build a set of supported OSes from them. Each node on the cluster has a label for its operating system, e.g. kubernetes.io/os=linux.
AC:
Goal: OperatorHub/OLM users get a more intuitive UX around discovering and selecting Operator versions to install.
Problem statement: Today it's not possible to install an older version of an Operator unless the user knows the exact CSV semantic version. This is not, however, exposed through any API. `packageserver` as of today only shows the latest version per channel.
Why is this important: There are many reasons why a user would choose not to install the latest version, whether it's lack of testing or known problems. It should be easy for a user to discover what versions of an Operator OLM has in its catalogs and update graphs, and to expose this information in a consumable way to the user.
Acceptance Criteria:
Out of scope:
Related info
UX designs: http://openshift.github.io/openshift-origin-design/designs/administrator/olm/select-install-operator-version/
linked OLM jira: https://issues.redhat.com/browse/OPRUN-1399
where you can see the downstream PR: https://github.com/openshift/operator-framework-olm/pull/437/files
specifically: https://github.com/awgreene/operator-framework-olm/blob/f430b2fdea8bedd177550c95ec[…]r/pkg/package-server/apis/operators/v1/packagemanifest_types.go i.e., you can get a list of available versions in PackageChannel stanza from the packagemanifest API
You can reach out to OLM lead Alex Greene for any question regarding this too, thanks
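A hedged sketch of the kind of data the packagemanifest API exposes after the change referenced above; the entries field name follows the downstream PR as best understood, and the values are illustrative:

apiVersion: packages.operators.coreos.com/v1
kind: PackageManifest
metadata:
  name: etcd
status:
  channels:
  - name: stable
    currentCSV: etcdoperator.v0.9.4
    entries:                        # per-channel version list surfaced by this work
    - name: etcdoperator.v0.9.4
      version: 0.9.4
    - name: etcdoperator.v0.9.2
      version: 0.9.2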
Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of managing the fleet as well as deep-diving into a single cluster.
Why do customers want this?
Why do we want this?
Phase 2 Goal: Productization of the united Console
We need a way to show metrics for workloads running on spoke clusters. This depends on ACM-876, which lets the console discover the monitoring endpoints.
Open Issues:
We will depend on ACM to create a route on each spoke cluster for the prometheus tenancy service, which is required for metrics for normal users.
Openshift console backend should proxy managed cluster monitoring requests through the MCE cluster proxy addon to prometheus services on the managed cluster. This depends on https://issues.redhat.com/browse/ACM-1188
Initiative: Improve etcd disaster recovery experience (part1)
The current etcd backup and recovery process is described in our docs https://docs.openshift.com/container-platform/4.12/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html
The current process leaves up to the cluster-admin to figure out a way to do consistent backups following the documented procedure.
This feature is part of a progressive delivery to improve the cluster-admin experience for backup and restore of etcd clusters to a healthy state.
Given that we have a controller that processes one-time etcd backup requests via the "operator.openshift.io/v1alpha1 EtcdBackup" CR, we need another controller that processes the "config.openshift.io/v1alpha1 Backup" CR so we can have periodic backups according to the schedule in the CR spec.
See https://github.com/openshift/api/pull/1482 for the APIs
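A hedged sketch of the periodic Backup CR this controller would reconcile; the field names are taken from the API PR above as best understood and should be treated as provisional:

apiVersion: config.openshift.io/v1alpha1
kind: Backup
metadata:
  name: default
spec:
  etcd:
    schedule: "0 */6 * * *"          # cron schedule for periodic backups
    timeZone: UTC
    retentionPolicy:
      retentionType: RetentionNumber
      retentionNumber:
        maxNumberOfBackups: 5
    pvcName: etcd-backup-pvc         # PVC where backups are saved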
The workflow for this controller should roughly be:
Along with this controller we would also need to provide the workload or Go command for the pod that is created periodically by the CronJob. This command, e.g. "create-etcdbackup-cr", effectively creates a new `operator.openshift.io/v1alpha1 EtcdBackup` CR via the following workflow:
Lastly to fulfill the retention policy (None, number of backups saved, or total size of backups), we can employ the following workflow:
Lastly to fulfill the retention policy (None, number of backups saved, or total size of backups), we can employ the following workflow:
See the parent story for more context.
As the first part to this story we need a controller with the following workflow:
Since we also want to preserve a history of successful and failed backup attempts for the periodic config, the CronJob should utilize cronjob history limits to preserve successful and failed jobs.
https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#jobs-history-limits
To begin with we can set this to a reasonable default of 5 successful and 10 failed jobs.
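A minimal sketch of how the CronJob could carry those history limits; the name, image, and command are placeholders for whatever workload ultimately creates the EtcdBackup CR:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: periodic-etcd-backup           # hypothetical name
  namespace: openshift-etcd
spec:
  schedule: "0 */6 * * *"
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 10
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: create-etcdbackup-cr
            image: <cluster-etcd-operator image>                            # placeholder
            command: ["cluster-etcd-operator", "create-etcdbackup-cr"]      # placeholder command, per the story above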
For testing the automated backups feature we will require an e2e test that validates the backups by ensuring the restore procedure works for a quorum loss disaster recovery scenario.
See the following doc for more background:
https://docs.google.com/document/d/1NkdOwo53mkNBCktV5tkUnbM4vi7bG4fO5rwMR0wGSw8/edit?usp=sharing
This story targets milestones 2, 3 and 4 of the restore test, to ensure that the test has the ability to perform a backup and then restore from that backup in a disaster recovery scenario.
While the automated backups API is still in progress, the test will rely on the existing backup script to trigger a backup. Later on when we have a functional backup API behind a feature gate, the test can switch over to using that API to trigger backups.
We're starting with a basic crash-looping member restore first. The quorum loss scenario will be done in ETCD-423.
We should add some basic backup e2e tests into our operator:
The e2e workflow should be TechPreview enabled already.
For testing the automated backups feature we will require an e2e test that validates the backups by ensuring the restore procedure works for a quorum loss disaster recovery scenario.
See the following doc for more background:
https://docs.google.com/document/d/1NkdOwo53mkNBCktV5tkUnbM4vi7bG4fO5rwMR0wGSw8/edit?usp=sharing
This story targets the first milestone of the restore test: ensuring we have a platform-agnostic way to SSH into all masters in a test cluster so that we can perform the necessary backup, restore and validation workflows.
The suggested approach is to create a static pod that can do those SSH checks and actions from within the cluster, but other alternatives can also be explored as part of this story.
To fulfill one time backup requests there needs to be a new controller that reconciles an EtcdBackup CustomResource (CR) object and executes and saves a one time backup of the etcd cluster.
Similar to the upgradebackupcontroller the controller would be triggered to create a backup pod/job which would save the backup to the PersistentVolume specified by the spec of the EtcdBackup CR object.
The controller would also need to honor the retention policy specified by the EtcdBackup spec and update the status accordingly.
See the following enhancement and API PRs for more details and potential updates to the API and workflow for the one time backup:
https://github.com/openshift/enhancements/pull/1370
https://github.com/openshift/api/pull/1482
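A hedged sketch of the one-time EtcdBackup CR described above; the shape follows the enhancement and API PRs as best understood, so treat the field names as provisional:

apiVersion: operator.openshift.io/v1alpha1
kind: EtcdBackup
metadata:
  name: backup-example
spec:
  pvcName: etcd-backup-pvc   # PersistentVolumeClaim that receives the one-time backup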
< Who benefits from this feature, and how? What is the difference between today's current state and a world with this feature? >
Requirements | Notes | IS MVP |
< What are we making, for who, and why/what problem are we solving?>
<Defines what is not included in this story>
< Link or at least explain any known dependencies. >
Background, and strategic fit
< What does the person writing code, testing, documenting need to know? >
< Are there assumptions being made regarding prerequisites and dependencies?>
< Are there assumptions about hardware, software or people resources?>
< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>
< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >
< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact?>
< Are there assumptions being made regarding prerequisites and dependencies?>
< Are there assumptions about hardware, software or people resources?>
< If the feature is ordered with other work, state the impact of this feature on the other work>
<links>
Currently the pipeline builder in the dev console directly queries Tekton Hub APIs for searching tasks. As the upstream community and Red Hat are moving to Artifact Hub, we need to query the Artifact Hub API for searching tasks.
Hitting the Artifacthub.io search endpoint sometimes fails due to a CORS error, and the version API endpoint always fails due to a CORS error. So we need a proxy to hit the Artifact Hub endpoint to get the data.
Search endpoint: https://artifacthub.io/docs/api/#/Packages/searchPackages
Version endpoint: https://artifacthub.io/docs/api/#/Packages/getTektonTaskVersionDetails
eg: https://artifacthub.io/api/v1/packages/tekton-task/tekton-catalog-tasks/git-clone/0.9.0
Feature Overview (aka. Goal Summary):
This feature will allow an x86 control plane to operate with compute nodes of type Arm in a HyperShift environment.
Goals (aka. expected user outcomes):
Enable an x86 control plane to operate with an Arm data-plane in a HyperShift environment.
Requirements (aka. Acceptance Criteria):
Customer Considerations:
Customers who require a mix of x86 control plane and Arm data-plane for their HyperShift environment will benefit from this feature.
Documentation Considerations:
Interoperability Considerations:
This feature should not impact other OpenShift layered products and versions in the portfolio.
As a user, I would like to deploy a hypershift cluster to an x86 managed cluster with an arm nodepool.
Starting point:
https://github.com/openshift/hypershift/blob/main/hypershift-operator/controllers/nodepool/nodepool_controller.go
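A hedged sketch of an Arm NodePool attached to an x86 hosted cluster; the arch field, release image, and AWS instance type below are assumptions based on the multi-arch NodePool work, not a confirmed API:

apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: arm-nodepool                  # hypothetical name
  namespace: clusters
spec:
  clusterName: example-hosted-cluster
  replicas: 2
  arch: arm64                         # data-plane architecture differs from the x86 control plane
  release:
    image: quay.io/openshift-release-dev/ocp-release:<multi-arch payload>   # placeholder
  platform:
    type: AWS
    aws:
      instanceType: m6g.large         # Graviton (arm64) instance type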
AC:
Goal
Numerous partners are asking for ways to pre-image servers in some central location before shipping them to an edge site where they can be configured as an OpenShift cluster: OpenShift-based Appliance.
A number of these cases are a good fit for a solution based on writing an image equivalent to the agent ISO, but without the cluster configuration, to disk at the central location and then configuring and running the installation when the servers reach their final location. (Notably, some others are not a good fit, and will require OpenShift to be fully installed, using the Agent-based installer or another, at the central location.)
While each partner will require a different image, usually incorporating some of their own software to drive the process as well, some basic building blocks of the image pipeline will be widely shared across partners.
Extended documentation
Building Blocks for Agent-based Installer Partner Solutions
Interactive Workflow work (OCPBU-132)
This work must "avoid conflict with the requirements for any future interactive workflow (see Interactive Agent Installer), and build towards it where the requirements coincide. This includes a graphical user interface (future assisted installer consistency).
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Add a new installer subcommand, openshift-install agent create config-image.
This should create a small ISO (i.e. not a CoreOS boot image) containing just the configuration files from the automation flow:
The contents of the disk could be in any format, but should be optimised to make it simple for the service in AGENT-562 to read.
Implement a systemd service in the unconfigured agent ISO (AGENT-558) that watches for disks to be mounted, then searches them for agent installer configuration. If such configuration is found, then copy it to the relevant places in the running system.
The rendezvousIP must be copied last, as the presence of this is what will trigger the services to start (AGENT-556).
To the extent possible, the service should be agnostic as to the method by which the config disk was mounted (e.g. virtual media, USB stick, floppy disk, &c.). It may be possible to get systemd to trigger on volume mount, avoiding the need to poll anything.
The configuration drive must contain:
it may optionally contain:
The ClusterImageSet manifest must match the one already present in the image for the config to be accepted.
Support pd-balanced disk types for GCP deployments
OpenShift installer and Machine API should support creation and management of computing resources with disk type "pd-balanced"
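A minimal sketch of requesting pd-balanced disks in the install-config.yaml machine pools; the disk sizes shown are just examples:

controlPlane:
  name: master
  platform:
    gcp:
      osDisk:
        diskType: pd-balanced
        diskSizeGB: 128
compute:
- name: worker
  platform:
    gcp:
      osDisk:
        diskType: pd-balanced
        diskSizeGB: 128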
Why does the customer need this?
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Many enterprises have strict security policies where all software must be pulled from a trusted or private source. For these scenarios, the RHCOS image used to bootstrap the cluster usually comes from shared public locations that some companies don't accept as a trusted source.
Questions to be addressed:
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Description of problem:
ARO needs to copy RHCOS image blobs to their own Azure Marketplace offering since, as a first-party Azure service, they must not request anything from outside of Azure and must consume RHCOS VM images from a trusted source (marketplace). To meet the requirements, the ARO team currently does the following as part of the release process:
1. Mirror container images from quay.io to Azure Container Registry to avoid leaving Azure boundaries.
2. Copy the VM image from the blob in someone else's Azure subscription into the blob on the subscription the ARO team manages, and then publish a VM image on Azure Marketplace (publisher: azureopenshift, offer: aro4. See az vm image list --publisher azureopenshift --all). We do not bill for these images.
The usage of Marketplace images in the installer was already implemented as part of CORS-1823. This single line [1] needs to be refactored to enable ARO from the installer code perspective: on ARO we don't need to set type to AzureImageTypeMarketplaceWithPlan. However, in OCPPLAN-7556 and the related CORS-1823 it was mentioned that using Marketplace images is out of scope for nodes other than compute. For ARO we need to be able to use marketplace images for all nodes.
[1] https://github.com/openshift/installer/blob/f912534f12491721e3874e2bf64f7fa8d44aa7f5/pkg/asset/machines/azure/machines.go#L107
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Set the RHCOS image from Azure Marketplace in the install-config.
2. Deploy a cluster.
Actual results:
Only compute nodes use the Marketplace image.
Expected results:
All nodes created by the Installer use RHCOS image coming from Azure Marketplace.
Additional info:
A user is able to specify a custom location in the Installer manifest for the RHCOS image to be used for bootstrap and cluster Nodes. This is similar to the approach we already support for AWS with the compute.platform.aws.amiID option.
https://issues.redhat.com/browse/CORS-1103
Some background on the Licenses field:
https://github.com/openshift/installer/pull/3808#issuecomment-663153787
https://github.com/openshift/installer/pull/4696
So we do not want to allow licenses to be specified when pre-built images are specified (current behaviour); it's up to customers to create a custom image with licenses embedded and supply that to the Installer. Since we don't need to specify licenses for RHCOS images anymore, the Licenses field is useless and should be deprecated.
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
As a user, I want to be able to:
so that I can achieve
A user is able to specify a custom location in the Installer manifest for the RHCOS image to be used for bootstrap and cluster Nodes. This is similar to the approach we already support for AWS with the compute.platform.aws.amiID option.
Epic Goal*
Kubernetes upstream has chosen to allow users to opt-out from CSI volume migration in Kubernetes 1.26 (1.27 PR, 1.26 backport). It is still GA there, but allows opt-out due to non-trivial risk with late CSI driver availability.
We want a similar capability in OCP - a cluster admin should be able to opt-in to CSI migration on vSphere in 4.13. Once they opt-in, they can't opt-out (at least in this epic).
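A hedged sketch of the opt-in, assuming it is exposed through the Storage operator config as a vSphere-specific field:

apiVersion: operator.openshift.io/v1
kind: Storage
metadata:
  name: cluster
spec:
  vsphereStorageDriver: CSIWithMigrationDriver   # opt in to CSI migration; LegacyDeprecatedInTreeDriver keeps the in-tree driver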
Why is this important? (mandatory)
See an internal OCP doc if / how we should allow a similar opt-in/opt-out in OCP.
Scenarios (mandatory)
Upgrade
New install
EUS to EUS (4.12 -> 4.14)
Contributing Teams(and contacts) (mandatory)
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
When CSIMigrationvSphere is disabled, cluster-storage-operator must re-create the in-tree StorageClass.
The vmware-vsphere-csi-driver-operator's StorageClass must not be marked as the default in that case (IMO we already have code for that).
This also means we need to fix the Disable SC e2e test to ignore StorageClasses for the in-tree driver. Otherwise we will reintroduce OCPBUGS-7623.
More details at ARO managed identity scope and impact.
This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature.. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
As a cluster admin, I want the CCM and Node manager to utilize credentials generated by CCO so that the permissions granted to the identity can be scoped with least privilege on clusters utilizing Azure AD Workload Identity.
The Cloud Controller Manager Operator creates a CredentialsRequest as part of CVO manifests which describes credentials that should be created for the CCM and Node manager to utilize. CCM and the Node Manager do not use the credentials created as a product of the CredentialsRequest in existing "passthrough" based Azure clusters or within Azure AD Workload Identity based Azure clusters. CCM and the Node Manager instead use a system-assigned identity which is attached to the Azure cluster VMs.
The system-assigned identity attached to the VMs is granted the "Contributor" role within the cluster's Azure resource group. In order to use the system-assigned identity, a pod must have sufficient privilege to use the host network to contact the Azure instance metadata service (IMDS).
For Azure AD Workload Identity based clusters, administrators must process the CredentialsRequests extracted from the release image which includes the CredentialsRequest from CCCMO manifests. This CredentialsRequest processing results in the creation of a user-assigned managed identity which is not utilized by the cluster. Additionally, the permissions granted to the identity are currently scoped broadly to grant the "Contributor" role within the cluster's Azure resource group. If the CCM and Node Manager were to utilize the identity then we could scope the permissions granted to the identity to be more granular. It may be confusing to administrators to need to create this unused user-assigned managed identity with broad permissions access.
Update Azure Credentials Request manifest of the Cluster Ingress Operator to use new API field for requesting permissions to enable OCPBU-8?
Update Azure Credentials Request manifest of the Cluster Storage Operator to use new API field for requesting permissions
As a [user|developer|<other>] I want [some goal] so that [some reason]
<Describes high level purpose and goal for this story. Answers the questions: Who is impacted, what is it and why do we need it?>
<Describes the context or background related to this story>
As a [user|developer|<other>] I want [some goal] so that [some reason]
<Describes high level purpose and goal for this story. Answers the questions: Who is impacted, what is it and why do we need it?>
<Describes the context or background related to this story>
Add actuator code to satisfy permissions specified in 'Permissions' API field. The implementation should create a new custom role with specified permissions and assign it to the generated user-assigned managed identity along with the predefined roles enumerated in CredReq.RoleBindings. The role we create for the CredentialsRequest should be discoverable so that it can be idempotently updated on re-invocation of ccoctl.
Questions to answer based on lessons learned from custom roles in GCP, assuming that we will create one custom role per identity,
Add a new field (DataPermissions) to the Azure Credentials Request CR, and plumb it into the custom role assigned to the generated user-assigned managed identity's data actions.
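A hedged sketch of an Azure CredentialsRequest using the fields described in these stories; the permission strings, names, and the permissions/dataPermissions field names are illustrative, not the final API:

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: openshift-example-component        # hypothetical
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: example-cloud-credentials
    namespace: openshift-example
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AzureProviderSpec
    roleBindings:
    - role: Contributor                    # predefined role, as today
    permissions:                           # new field: control-plane actions for the generated custom role
    - Microsoft.Network/privateDnsZones/read
    dataPermissions:                       # new field described above: data actions for the custom role
    - Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read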
Update Azure Credentials Request manifest of the Cluster Image Registry Operator to use new API field for requesting permissions
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
ACCEPTANCE CRITERIA
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
ACCEPTANCE CRITERIA
OPEN QUESTIONS
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
ACCEPTANCE CRITERIA
As a cluster admin I want to be able to:
so that I can
Description of criteria:
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
Create a config secret in the openshift-cloud-credential-operator namespace which contains the AZURE_TENANT_ID to be used for configuring the Azure AD pod identity webhook deployment.
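A minimal sketch of such a config secret; the secret and key names are assumptions for illustration:

apiVersion: v1
kind: Secret
metadata:
  name: azure-credentials              # hypothetical name
  namespace: openshift-cloud-credential-operator
stringData:
  azure_tenant_id: <tenant-id>         # consumed when configuring the pod identity webhook deployment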
These docs should cover:
See existing documentation for:
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operators owners/teams will be expected to review merge requests and complete appropriate QE effort for an openshift release.
RHEL CoreOS should be updated to RHEL 9.2 sources to take advantage of newer features, hardware support, and performance improvements.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Questions to be addressed:
Part of setting CPU load balancing on RHEL 9 involves disabling sched_load_balance on cgroups that contain a cpuset that should be exclusive. The PAO may be required to be responsible for this piece
This is the Epic to track the work to add RHCOS 9 in OCP 4.13 and to make OCP use it by default.
CURRENT STATUS: Landed in 4.14 and 4.13
Testing with layering
Another option given an existing e.g. 4.12 cluster is to use layering. First, get a digested pull spec for the current build:
$ skopeo inspect --format "{{.Name}}@{{.Digest}}" -n docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev:4.13-9.2
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b4cc3995d5fc11e3b22140d8f2f91f78834e86a210325cbf0525a62725f8e099
Create a MachineConfig that looks like this:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-override
spec:
  osImageURL: <digested pull spec>
If you want to also override the control plane, create a similar one for the master role.
We don't yet have auto-generated release images. However, if you want one, you can ask cluster bot to e.g. "launch https://github.com/openshift/machine-config-operator/pull/3485" with options you want (e.g. "azure" etc.) or just "build https://github.com/openshift/machine-config-operator/pull/3485" to get a release image.
STATUS: Code is merged for 4.13 and is believed to largely solve the problem.
Description of problem:
Upgrades from OpenShift 4.12 to 4.13 will also upgrade the underlying RHCOS from 8.6 to 9.2. As part of that, the names of the network interfaces may change. For example, `eno1` may be renamed to `eno1np0`. If a host is using NetworkManager configuration files that rely on those names, then the host will fail to connect to the network when it boots after the upgrade. For example, if the host had static IP addresses assigned, it will instead boot using IP addresses assigned via DHCP.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always.
Steps to Reproduce:
1. Select hardware (or VMs) that will have different network interface names in RHCOS 8 and RHCOS 9, for example `eno1` in RHCOS 8 and `eno1np0` in RHCOS 9.
2. Install a 4.12 cluster with static network configuration using the `interface-name` field of NetworkManager interface configuration files to match the configuration to the network interface.
3. Upgrade the cluster to 4.13.
Actual results:
The NetworkManager configuration files are ignored because they no longer match the NIC names. Instead the NICs get new IP addresses from DHCP.
Expected results:
The NetworkManager configuration files are updated as part of the upgrade to use the new NIC names.
Additional info:
Note this is a hypothetical scenario. We have detected this potential problem in a slightly different scenario where we install a 4.13 cluster with the assisted installer. During the discovery phase we use RHCOS 8 and generate the NetworkManager configuration files. Then we reboot into RHCOS 9, and the configuration files are ignored due to the change in the NIC names. See MGMT-13970 for more details.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
This work will require updates to the core OpenShift API repository to add the new platform type, and then a distribution of this change to all components that use the platform type information. For components that partners might replace, per-component action will need to be taken, with the project team's guidance, to ensure that the component properly handles the "External" platform. These changes will look slightly different for each component.
To integrate these changes more easily into OpenShift, it is possible to take a multi-phase approach which could be spread over a release boundary (eg phase 1 is done in 4.X, phase 2 is done in 4.X+1).
Phase 1
Phase 2
Phase 3
As described in the external platform enhancement, the cluster-cloud-controller-manager-operator should be modified to react to the external platform type in the same manner as platform none.
As described in the external platform enhancement, the machine-api-operator should be modified to react to the external platform type in the same manner as platform none.
Create an Azure cloud-specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags (or labels in Azure) on any OpenShift cloud resource that we create and manage. It should also tag existing resources that do not yet have the tags, and once the tags in the infrastructure CRD are changed, all the resources should be updated accordingly.
Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.
Due to the ongoing in-tree/out-of-tree split on the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").
Once we are confident we have all components updated, we should introduce an end-to-end test that makes sure we never create resources that are untagged.
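A sketch of the intended shape on the infrastructure CR, following the description above; the exact placement of the new resourceTags entry under the Azure platform spec is illustrative, not confirmed by this epic:

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
spec:
  platformSpec:
    type: Azure
    azure:
      resourceTags:                # illustrative placement of the new spec.resourceTags entry
        - key: environment
          value: production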
Goals
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
List any affected packages or components.
Remove code references marking Azure Tags as TechPreview in the list below.
Create a warning-severity alert to notify the admin that packet loss is occurring due to failed OVS vswitchd lookups. This may occur if vswitchd is CPU constrained and there is also a high volume of lookups.
Use the metric ovs_vswitchd_netlink_overflow, which shows netlink messages dropped by the vswitchd daemon due to buffer overflow in userspace.
For the kernel equivalent, use the metric ovs_vswitchd_dp_flows_lookup_lost. Both metrics usually have the same value but may differ if vswitchd restarts.
Both of these metrics should be aggregated into a single alert that fires if either value has increased recently.
DoD: QE test case, code merged to CNO, metrics document updated ( https://docs.google.com/document/d/1lItYV0tTt5-ivX77izb1KuzN9S8-7YgO9ndlhATaVUg/edit )
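A rough sketch of what such a PrometheusRule could look like; the alert name, namespace, window, and threshold are placeholders, not the final rule that will land in CNO:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ovs-flow-lookup-loss          # illustrative name
  namespace: openshift-ovn-kubernetes  # illustrative namespace
spec:
  groups:
    - name: ovs.rules
      rules:
        - alert: OVSLookupPacketLoss   # placeholder alert name
          expr: |
            increase(ovs_vswitchd_netlink_overflow[5m])
              + increase(ovs_vswitchd_dp_flows_lookup_lost[5m]) > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: Packet loss due to failed ovs-vswitchd flow lookups.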
< Who benefits from this feature, and how? What is the difference between today’s current state and a world with this feature? >
Requirements | Notes | IS MVP |
< What are we making, for who, and why/what problem are we solving?>
<Defines what is not included in this story>
< Link or at least explain any known dependencies. >
Background, and strategic fit
< What does the person writing code, testing, documenting need to know? >
< Are there assumptions being made regarding prerequisites and dependencies?>
< Are there assumptions about hardware, software or people resources?>
< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>
< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >
< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact?>
< Are there assumptions being made regarding prerequisites and dependencies?>
< Are there assumptions about hardware, software or people resources?>
< If the feature is ordered with other work, state the impact of this feature on the other work>
<links>
There's no way in the UI for the cluster admin to
Expose the ability for cluster admins to provide customization for all web terminal users through the UI which is available in wtoctl
This is the follow-up story for PR https://github.com/openshift/console/pull/12718. A couple of tests, which depend on YAML, were added as manual tests. Proper automated tests need to be added for those.
Refer to PR https://github.com/openshift/console/pull/12718 for more details.
Update the help texts on the Initialize Terminal page as below:
1. "This Project will be used to initialize your command line terminal" to "Project used to initialize your command line terminal"
2. "Set timeout for the terminal." to "Pod timeout for your command line terminal"
3. "Set custom image for the terminal." to "Custom image used for your command line terminal"
Update the help texts on the Initialize Terminal page as below:
1. "This Project will be used to initialize your command line terminal" to "Project used to initialize your command line terminal"
2. "Set timeout for the terminal." to "Pod timeout for your command line terminal"
3. "Set custom image for the terminal." to "Custom image used for your command line terminal"
Allow cluster admin to provide default image and/or timeout period for all cluster users
Default Timeout - the WEB_TERMINAL_IDLE_TIMEOUT environment variable's value in the web-terminal-exec DevWorkspaceTemplate
Default Image - the .spec.components[].container.image field in the web-terminal-tooling DevWorkspaceTemplate
(a sketch of where these fields live follows this list)
5. Once the user changes this and saves, the same resources above need to be updated (refer to the comment in epic https://issues.redhat.com/browse/ODC-7119 for more details)
6. If the user has read access to the DevWorkspaceTemplate, the Save button should not be enabled; if the user does not have read access to the DevWorkspaceTemplate, the Web Terminal tab does not need to be shown on the Configuration page
7. Add e2e tests
Timeout and Image component should be similar to web terminal components (attached in ticket).
refer comment in epic https://issues.redhat.com/browse/ODC-7119 for more details
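To make the mapping concrete, here is a minimal sketch of the two DevWorkspaceTemplate fragments referenced above, assuming the workspace.devfile.io/v1alpha2 API; the names, image, and timeout value are illustrative, and only the env var name and field path come from this story:

apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspaceTemplate
metadata:
  name: web-terminal-exec
spec:
  components:
    - name: web-terminal-exec
      container:
        env:
          - name: WEB_TERMINAL_IDLE_TIMEOUT   # default timeout read/written by the console
            value: "15m"                      # illustrative value
---
apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspaceTemplate
metadata:
  name: web-terminal-tooling
spec:
  components:
    - name: web-terminal-tooling
      container:
        image: registry.example.com/web-terminal-tooling:latest   # default image (.spec.components[].container.image); illustrative pull spec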
HyperShift is being consumed by multiple providers. As a result, the need for documentation increases, especially around infrastructure/hardware/resource requirements, networking, etc.
Before the GA of Hosted Control Planes, we need to know/document:
The above questions are answered for all platforms we support, i.e., we need to answer for
Add support for NAT Gateways in Azure when deploying OpenShift on this cloud to manage the outbound network traffic, and make this the default option for new deployments.
When deploying OpenShift on Azure, the Installer will configure NAT Gateways as the default method to handle outbound network traffic, so we can prevent the existing SNAT port exhaustion issues related to the outboundType configured by default.
The installer will use the NAT Gateway object from Azure to manage the outbound traffic from OpenShift.
The installer will create a NAT Gateway object per AZ in Azure so the solution is HA.
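A hypothetical install-config excerpt to illustrate the intent; the exact outboundType value name the Installer will accept is an assumption here, not confirmed by this feature text:

apiVersion: v1
baseDomain: example.com
metadata:
  name: my-cluster
platform:
  azure:
    region: eastus
    outboundType: NatGateway   # assumed value name; would replace the current load-balancer-based outbound default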
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
Using NAT Gateway for egress traffic is the recommended approach from Microsoft
This is also a common ask from various enterprise customers, as they are hitting SNAT port exhaustion issues with the current solution OpenShift uses for outbound traffic management in Azure.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
As an administrator, I want to be able to:
so that I can achieve
Description of criteria:
This requires/does not require a design proposal.
This requires/does not require a feature gate.
You can use the oc-mirror OpenShift CLI (oc) plugin to mirror all required OpenShift Container Platform content and other images to your mirror registry by using a single tool. It provides the following features:
This feature tracks bringing the oc-mirror plugin to IBM Power and IBM zSystems architectures.
Bring the oc mirror plugin to IBM Power and IBM zSystem architectures
oc mirror plugin on IBM Power and IBM zSystems should behave exactly like it does on x86 platforms.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
If this Epic is an RFE, please complete the following questions to the best of your ability:
Q1: Proposed title of this RFE
Q2: What is the nature and description of the RFE?
The oc-mirror plugin will be the tool used for mirroring.
Q3: Why does the customer need this? (List the business requirements here)
Install a disconnected cluster without having x86 nodes available to manage the disconnected installation.
Q4: List any affected packages or components
Quay on the platform needs to be available for saving the images.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
Why is this important? (mandatory)
What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc? Why is work a priority?
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). Trying no-feature-freeze in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
This includes ibm-vpc-node-label-updater!
(Using separate cards for each driver because these updates can be more complicated)
Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.
This includes (but is not limited to):
Operators:
EOL, do not upgrade:
Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories
Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.
This includes the update of VolumeSnapshot CRDs in cluster-csi-snapshot-controller-operator assets and the client API in go.mod, i.e. copy all snapshot CRDs from upstream to the operator assets + go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.
Please review the following PR: https://github.com/openshift/machine-config-operator/pull/3598
The PR has been automatically opened by ART (#aos-art) team automation and indicates that the image(s) being used downstream for production builds are not consistent with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to be reopened automatically.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository
The Agent-based Installer requires the generated ISO to be booted manually on the target nodes. Support for PXE booting will allow customers to automate their installations via their DHCP/PXE infrastructure.
This feature allows generating installation ISOs ready to add to a customer-provided DHCP/PXE infrastructure.
As an OpenShift installation admin I want to PXE-boot the image generated by the openshift-install agent subcommand
We have customers requesting this booting mechanism to make it easier to automate the booting of the nodes without having to actively place the generated image in a bootable device for each host.
As an OpenShift installation admin I want to PXE-boot the image generated by the openshift-install agent subcommand
We have customers requesting this booting mechanism to make it easier to automate the booting of the nodes without having to actively place the generated image in a bootable device for each host.
As a user of the Agent-based Installer(ABI), I want to be able to perform the customizations via agent-tui in case of PXE booting so that I can modify network settings.
Implementation details:
Create a new baseImage asset that gets inherited by agentImage and agentpxefiles. The baseImage prepares the initrd along with the necessary ignition and the network tui which is now read by agentImage and agentpxefiles.
ARM kernels are compressed with gzip, but most versions of ipxe cannot handle this (it's not clear what happens with raw pxe). See https://github.com/coreos/fedora-coreos-tracker/issues/1019 for more info.
If the platform is aarch64 we'll need to decompress the kernel like we do in https://github.com/openshift/machine-os-images/commit/1ed36d657fa3db55fc649761275c1f89cd7e8abe
The new command {{agent create pxe-files}} reads pxe-base-url from the agent-config.yaml. The field will be optional in the YAML file. If the URL is provided, then the command will generate an iPXE script specific to the given URL.
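For illustration, an agent-config.yaml excerpt with the optional field as named in this story; the apiVersion, other fields, and the final field name in the implementation are assumptions and may differ:

apiVersion: v1alpha1
kind: AgentConfig
metadata:
  name: my-cluster
rendezvousIP: 192.168.111.80
pxe-base-url: http://pxe-server.example.com/artifacts   # optional; if set, the generated iPXE script points at this URL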
Currently, we have the kernel parameters in the iPXE script statically defined from what Assisted Service generates. If the default parameters were to change in RHCOS that would be problematic. Thus, it would be much better if we were to extract them from the ISO.
The kernel parameters in the ISO are defined in EFI/redhat/grub.cfg (UEFI) and /isolinux/isolinux.cfg (legacy boot)
Support deploying multi-node clusters using platform none.
As of Jan 2023 we have almost 5,000 clusters reported using platform none installed on-prem (metal, vmware or other hypervisors with no platform integration) out of a total of about 12,000 reported clusters installed on-prem.
Platform none is desired by users to be able to install clusters across different host platforms (e.g. mixing virtual and physical) where Kubernetes platform integration isn't a requirement.
A goal of the Agent-Based Installer is to help users who currently can only deploy their topologies with UPI to be able to use the agent-based installer and get a simpler user experience while keeping all their flexibility.
Currently there are validation checks for platform None in OptionalInstallConfig that limit the None platform to 1 control plane replica, 0 compute replicas, and the NetworkType to OVNKubernetes.
These validations should be removed so that the None platform can be installed on clusters of any configuration.
Acceptance Criteria:
Add support to the Installer to make the S3 bucket deletion process during cluster bootstrap on AWS optional.
Allow the user to opt-out for deleting the S3 bucket created during the cluster bootstrap on AWS.
The user will be able to opt-out from deleting the S3 bucket created during the cluster bootstrap on AWS via the install-config manifest so the Installer will not try to delete this resource when destroying the bootstrap instance and the S3 bucket.
The current behavior will remain the default while deploying OpenShift on AWS, and both the bootstrap instance and the S3 bucket will be removed unless the user has opted out of this via the install-config manifest.
Some ROSA customers have SCP policies that prevent the deletion of any S3 bucket preventing ROSA adoption for these customers.
Documentation will be required for this feature to explain how to prevent the Installer from removing the S3 bucket, as well as an explanation of the security concerns when doing this, since the Installer will leave sensitive data used to bootstrap the cluster in the S3 bucket.
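A hypothetical install-config excerpt showing where such an opt-out could surface; the field name below is purely illustrative and not defined by this feature text:

apiVersion: v1
baseDomain: example.com
metadata:
  name: my-cluster
platform:
  aws:
    region: us-east-1
    preserveBootstrapIgnition: true   # illustrative field name: skip deleting the bootstrap S3 bucket on bootstrap destroy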
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
As a developer, I want to:
so that I can
Description of criteria:
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Console Support of OpenShift Pipelines Migration to Tekton v1 API
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
The Pipeline API version is upgrading to v1 with the Red Hat Pipelines operator 1.11.0 release.
https://tekton.dev/vault/pipelines-main/migrating-v1beta1-to-v1/
Does this have to be backward compatible?
Will the features be equivalent? Will the UX / tests / documentation have to be updated?
As a user,
Description of problem:
When trying the old pipelines operator with the latest 4.14 build I couldn't see the Pipelines navigation items. The operator provides the Pipeline v1beta1, not v1.
Version-Release number of selected component (if applicable):
4.14 master only after https://github.com/openshift/console/pull/12729 was merged
How reproducible:
Always?
Steps to Reproduce:
Actual results:
Expected results:
Additional info:
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
The observable functionality that the user now has as a result of receiving this feature. Complete during New status.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
For developers of serverless functions, we currently don't provide any samples.
Provide Serverless Function samples in the sample catalog. These would utilize the Builder Image capabilities.
As an operator author, I want to provide additional samples that are tied to an operator version, not an OpenShift release. For that, I want to create a resource to add new samples to the web console.
As an operator author, I want to provide additional samples that are tied to an operator version, not an OpenShift release. For that, I want to create a resource to add new samples to the web console.
As Arm adoption grows, OpenShift on Arm is a key strategic initiative for Red Hat. Key to its success is support for all key cloud providers adopting this technology. Google has announced support for Arm in its GCP offering, and we need to support OpenShift in this configuration.
The ability to have OCP on Arm running in a GCP instance
OCP on Arm running in a GCP instance
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Description:
Update 4.14 documentation to reflect new GCP support on ARM machines.
Updates:
Acceptance criteria:
Description:
In order to add instance types to the OCP documentation, there needs to be a .md file in the OpenShift installer repo that contains the 64-bit ARM machine types that have been tested and are supported on GCP.
Create a PR in the OpenShift installer repo that creates a new .md file that shows the supported instance types
Acceptance criteria:
Azure File CSI supports both the SMB and NFS protocols. Currently we only support SMB, and there is a strong demand from IBM and individual customers to support NFS for POSIX compliance reasons.
Support Azure File CSI with NFS.
The Azure File operator will not automatically create a NFS storage class, we will document how to create one.
There are some concerns about the way Azure File CSI deals with NFS. It does not respect the FSGroup policy supplied in the pod definition. This breaks the Kubernetes convention where a pod should be able to define its own FSGroup policy; instead, Azure File CSI sets a per-driver policy that pods can't override.
We brought this problem up with MSFT, but there is no fix planned in the driver. Given the pressure from the field, we are going to support NFS with an "on root mismatch" default and document this specific behavior in our documentation.
As an OCP on Azure admin, I want my users to be able to consume NFS-based PVs through Azure File CSI.
As an OCP on Azure user, I want to attach NFS-based PVs to my pods.
As an ARO customer I want to consume NFS based PVs.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
Running two drivers, one for NFS and one for SMB to solve the FSGroup issue.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
This feature is candidate to be backported up to 4.12 if possible.
Document that Azure File CSI NFS is supported, how to create a storage class as well as the FSGroup issue.
It's been decided to support the driver as it is today (see spike STOR-992), knowing it violates the Kubernetes fsGroupChangePolicy standard, where a pod is able to decide what FSGroup policy should be applied. Azure File with NFS applies an FSGroup policy at the driver level and pods cannot override it. We will keep the driver's default (on root mismatch) and document this unconventional behavior. Also, the Azure File CSI operator will not create a storage class for NFS; admins will need to create it manually, and this will be documented.
There is no need for specific development in the driver or the operator; engineering will make sure we have a working CI.
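As an example of the storage class admins would create manually (to be covered in docs), a minimal NFS class for the Azure File CSI driver might look like the sketch below; the parameter names follow the upstream driver and should be verified against the final documentation:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-nfs
provisioner: file.csi.azure.com
parameters:
  protocol: nfs          # selects the NFS protocol instead of the default SMB
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true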
1. Proposed title of this feature request
Enable privileged containers to view rootfs of other containers
2. What is the nature and description of the request?
The skip_mount_home=true field in /etc/containers/storage.conf causes the mount propagation of container mounts to not be private, which allows privileged containers to access the rootfs of other containers. This is a fix for bug 2065283 (see comment #32 [2]).
This RFE is to enable that field by default in OpenShift, as well as to verify there are no performance regressions when applying it.
3. Why does the customer need this? (List the business requirements here)
Customer's use case:
Our agent runs as a daemonset in k8s clusters and monitors the node.
Running with mount propagation set to HostToContainer allows the agent to access any container file, including containers that start running after agent startup. With this setting, when a new container starts, a new mount is created and added to the host mount namespace and also to the agent container, and thereby the agent can access the container's files.
e.g. the agent is mounted to /host and can access the filesystem of other containers by path
/host/var/lib/containers/storage/overlay/xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/test_file. This approach works in k8s clusters and OpenShift 3, but not in OpenShift 4. How can I make the agent pod get notified about any new mount created on the node and get access to it as well?
The workaround for that was provided in bug 2065283 (see comment #32 [2]).
4. List any affected packages or components.
CRI-O, Node, MCO.
Additional information in this Slack discussion [3].
[1] https://docs.openshift.com/container-platform/4.11/post_installation_configuration/machine-configuration-tasks.html#create-a-containerruntimeconfig_post-install-machine-configuration-tasks
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2065283#c32
[3] https://coreos.slack.com/archives/CK1AE4ZCK/p1670491480185299
Currently, SCCs are part of the OpenShift API and are subject to modifications by customers. This leads to a constant stream of issues:
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Currently, SCCs are part of the OpenShift API and are subject to modifications by customers. This leads to a constant stream of issues:
We need to find and implement schemes to protect core workloads while retaining the API guarantee for modifications of SCCs (unfortunately).
Users of the OpenShift Console leverage a streamlined, visual experience when discovering and installing OLM-managed operators in clusters that run on cloud providers with support for short-lived token authentication enabled. Users are intuitively becoming aware when this is the case and are put on the happy path to configure OLM-managed operators with the necessary information to support AWS STS.
Customers do not need to re-learn how to enable AWS STS authentication support for each and every OLM-managed operator that supports it. The experience is standardized and repeatable, so customers spend less time on initial configuration and more time implementing business value. The process is so easy that OpenShift is perceived as an enabler for an increased security posture.
The OpenShift Console today provides little to no support for configuring OLM-managed operators for short-lived token authentication. Users are generally unaware if their cluster runs on a cloud provider and is set up to use short-lived tokens for its core functionality and users are not aware which operators have support for that by implementing the respective flows defined in OCPBU-559 and OCPBU-560.
Customers may or may not be aware of short-lived token authentication support. They need proper context and pointers to follow-up documentation to explain the general concept and the specific configuration flow the Console supports. It needs to become clear that the Console cannot 100% automate the overall process and some steps need to be run outside of the cluster/Console using cloud-provider-specific tooling.
This epic is tracking the console work needed for STS enablement. As well as documentation needed for enabling operator teams to use this new flow. This does not track Hypershift inclusion of CCO.
Plan is to backport to 4.12
install flow:
As a user of the console, I would like to provide the required fields for tokenized auth at install time (wrapping and providing sane defaults for what I can do manually in the CLI).
The role ARN provided by the user should be added to the service account of the installed operator as an annotation.
Only manual subscription is supported in STS mode - the automatic option should not be the default or should be greyed out entirely.
AC: Add input field to the operator install page, where user can provide the `roleARN` value. This value will be set on the operator's Subscription resource, when installing operator.
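One plausible shape for this, assuming the ARN is carried on the Subscription via its config stanza; the exact field and variable name used by the STS flow are assumptions here, not confirmed by this story:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator
  namespace: openshift-operators
spec:
  name: example-operator
  channel: stable
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual        # only manual approval is supported in STS mode
  config:
    env:
      - name: ROLEARN                # assumed variable name; populated from the console's new roleARN input field
        value: arn:aws:iam::123456789012:role/example-operator-role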
STS - Security Token Service
Cluster is in STS mode when:
AC: Inform user on the Operator Hub item details that the cluster is in the STS mode
As a user of the console I would like to know which operators are safe to install (i.e. support tokenized auth or don't talk to the cloud provider).
AC: Add filter to the Operator Hub for filtering operators which have Short Lived Token Enabled
Upstream K8s deprecated PodSecurityPolicy and replaced it with a new built-in admission controller that enforces the Pod Security Standards (see here for the motivations for deprecation). There is an OpenShift-specific dedicated pod admission system called Security Context Constraints. Our aim is to keep the Security Context Constraints pod admission system while also allowing users to have access to the Kubernetes Pod Security Admission.
With OpenShift 4.11, we turned on the Pod Security Admission with global "privileged" enforcement. Additionally we set the "restricted" profile for warnings and audit. This configuration made it possible for users to opt their namespaces in to Pod Security Admission with the per-namespace labels. We also introduced a new mechanism that automatically synchronizes the Pod Security Admission "warn" and "audit" labels.
With OpenShift 4.15, we intend to move the global configuration to enforce the "restricted" pod security profile globally. With this change, the label synchronization mechanism will also switch into a mode where it synchronizes the "enforce" Pod Security Admission label rather than the "audit" and "warn".
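For context, the per-namespace Pod Security Admission labels being discussed look like this (illustrative namespace; the label keys are the standard upstream ones):

apiVersion: v1
kind: Namespace
metadata:
  name: example-app
  labels:
    pod-security.kubernetes.io/enforce: restricted   # the label the syncer would manage once global enforcement lands
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted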
In OpenShift 4.14, we intend to deliver functionality in code that will help accelerate moving to PSA enforcement. This feature tracks those deliverables.
The observable functionality that the user now has as a result of receiving this feature. Complete during New status.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Epic Goal*
Deliver tools and code that helps toward PSa enforcement
Why is this important? (mandatory)
What are the benefits to the customer or Red Hat? Does it improve security, performance, supportability, etc? Why is work a priority?
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Don't enforce system defaults on a namespace's pod security labels, if it is managed by a user.
If the managedFields (https://kubernetes.io/docs/reference/using-api/server-side-apply/#field-management) indicate that a user changed the pod security labels, we should not enforce system defaults.
A user might not be aware that the label syncer can be turned off and tries to manually change the state of the pod security profiles.
This fight between a user and the label syncer can cause violations.
< Who benefits from this feature, and how? What is the difference between today’s current state and a world with this feature? >
Requirements | Notes | IS MVP |
< What are we making, for who, and why/what problem are we solving?>
<Defines what is not included in this story>
< Link or at least explain any known dependencies. >
Background, and strategic fit
< What does the person writing code, testing, documenting need to know? >
< Are there assumptions being made regarding prerequisites and dependencies?>
< Are there assumptions about hardware, software or people resources?>
< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>
< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >
< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact?>
< Are there assumptions being made regarding prerequisites and dependencies?>
< Are there assumptions about hardware, software or people resources?>
< If the feature is ordered with other work, state the impact of this feature on the other work>
<links>
Additional improvements to segment, to enable the proper gathering of user telemetry and analysis
Currently, we have no accurate telemetry of the OpenShift Console usage across all fleet clusters. We should be able to utilize the auth and console telemetry to glean details which will allow us to get a picture of console usage by our customers.
There is no way to properly track specific pages
Change page title for all resource details pages to {resource-name} · {resource} · {tab-name} · OKD
Need to check all the resource pages which have details page and change the title.
Update page title to have non-translated title in {resource-name} · {resource} · {tab-name} · OKD format
All page titles of resource details pages should be added as a non-translated value in the {resource-name} · {resource} · {tab-name} · OKD format inside the <title> component, as an attribute named, for example, data-title-id, and this value should be used in fireUrlChangeEvent to send it as the title for the telemetry page event. Refer to spike https://issues.redhat.com/browse/ODC-7269 for more details.
Refer spike https://issues.redhat.com/browse/ODC-7269 for more details
labelKeyForNodeKind now returns a translated value; before, it used to return the label key. So change the method name from labelKeyForNodeKind to getTitleForNodeKind.
One of the steps in doing a disconnected environment install is to mirror the images to a designated system. This feature enhances oc-mirror to now handle the multi release payload, that is, the payload that contains all the platform images (x86, Arm, IBM Power, IBM Z). This is a key feature towards supporting disconnected installs in a multi-architecture compute, i.e. mixed architecture, cluster environment.
Customers will be able to use oc-mirror to enable the multi payload in a disconnected environment.
Allow oc-mirror to mirror the multi release payload
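A sketch of how this might be expressed in an oc-mirror ImageSetConfiguration; the architectures field and its value are assumptions for illustration, not something fixed by this feature text:

apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
mirror:
  platform:
    architectures:
      - multi          # assumed value selecting the multi (manifest-list) release payload
    channels:
      - name: stable-4.14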
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
ACCEPTANCE CRITERIA
ImportMode api reference: https://github.com/openshift/api/blob/master/image/v1/types.go#L294
Original issue and discussion: https://coreos.slack.com/archives/CFFJUNP6C/p1664890804998069
ACCEPTANCE CRITERIA
ImportMode api reference: https://github.com/openshift/api/blob/master/image/v1/types.go#L294
With this feature it will be possible to autoscale from zero, that is, have machinesets that create new nodes without any existing current nodes, for use in a mixed architecture cluster configured with multi-architecture compute.
To be able to create a machineset and scale from zero in a mixed architecture cluster environment
Create a machineset and scale from zero in a mixed architecture cluster environment
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Filing a ticket based on this conversation here: https://github.com/openshift/enhancements/pull/1014#discussion_r798674314
Basically the tl;dr here is that we need a way to ensure that machinesets are properly advertising the architecture that the nodes will eventually have. This is needed so the autoscaler can predict the correct pool to scale up/down. This could be accomplished through user driven means like adding node arch labels to machinesets and if we have to do this automatically, we need to do some more research and figure out a way.
For autoscaling nodes in a multi-arch compute cluster, node architecture needs to be taken into account because such a cluster could potentially have nodes of up to 4 different architectures. Labels can be propagated today from the machineset to the node group, but they have to be injected manually.
This story explores whether the autoscaler can use cloud provider APIs to derive the architecture of an instance type and set the label accordingly rather than it needing to be a manual step.
For autoscaling nodes in a multi-arch compute cluster, node architecture needs to be taken into account because such a cluster could potentially have nodes of up to 4 different architectures. Labels can be propagated today from the machineset to the node group, but they have to be injected manually.
This story explores whether the autoscaler can use cloud provider APIs to derive the architecture of an instance type and set the label accordingly rather than it needing to be a manual step.
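As a sketch of the manual step described above, one common upstream convention is to advertise labels on the MachineSet via a scale-from-zero capacity annotation; the annotation key here follows the upstream cluster-autoscaler convention and is an assumption, not something fixed by this story:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: worker-arm64-us-east-1a
  namespace: openshift-machine-api
  annotations:
    # assumed annotation: tells the autoscaler which arch nodes from this MachineSet
    # will have, even while it currently has zero replicas
    capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=arm64
spec:
  replicas: 0
  # selector and template omitted for brevity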
In 4.13 the vSphere CSI migration is in a hybrid state. Greenfield 4.13 clusters have migration enabled by default while upgraded clusters have it turned off unless explicitly enabled by an administrator (referred to as "opt-in").
This feature tracks the final work items required to enable vSphere CSI migration for all OCP clusters.
More information on the 4.13 vSphere CSI migration is available in the internal FAQ
Finalise vSphere CSI migration for all clusters ensuring that
Regardless of the cluster's state (new or upgraded), which version it is upgrading from, or the status of CSI migration (enabled/disabled), all clusters should have CSI migration enabled.
This feature also includes upgrades checks in 4.12 & 4.13 to ensure that OCP is running on a recommended vSphere version (vSphere 7.0u3L+ or 8.0u2+)
We should make sure that all issues that prevented us from enabling CSI migration by default in 4.13 are resolved. If some of these issues are fixed in vSphere itself we might need to check for a certain vSphere build version before proceeding with the upgrade (from 4.12 or 4.13).
More information on the 4.13 vSphere CSI migration is available in the internal FAQ
Customers who upgraded from 4.12 are unlikely to opt in to migration, so we will have quite a few clusters with migration disabled. Given we will enable it in 4.14 for every cluster, we need to be extra careful that all issues raised are fixed, and set upgrade blockers if needed.
Remove all migration opt-in occurences in the documentation.
We need to make sure that upgraded clusters are running on top of a vsphere version that contains all the required fixes.
Epic Goal*
Remove FeatureSet InTreeVSphereVolumes that we added in 4.13.
Why is this important? (mandatory)
We assume that the CSI Migration will be GA and locked to default in Kubernetes 1.27 / OCP 4.14. Therefore the FeatureSet must be removed.
Scenarios (mandatory)
See https://issues.redhat.com/browse/STOR-1265 for upgrade from 4.13 to 4.14
Dependencies (internal and external) (mandatory)
Same as STOR-1265, just the other way around ("a big revert")
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Description of problem:
The vsphereStorageDriver is deprecated and we should allow cluster admins to remove that field from the Storage object in 4.14. This is the validation rule that prevents removing vsphereStorageDriver: https://github.com/openshift/api/blob/0eef84f63102e9d2dfdb489b18fa22676f2bd0c4/operator/v1/types_storage.go#L42 This was originally put in place to ensure that CSI Migration is not disabled again once it has been enabled. However, in 4.14 there is no way to disable migration, and there is an explicit rule to prevent setting LegacyDeprecatedInTreeDriver. So it should be safe to allow removing the vsphereStorageDriver field in 4.14, as this will not disable migration, and the field will eventually be removed from the API in a future release.
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
Steps to Reproduce:
1. Set vsphereStorageDriver in the Storage object
2. Try to remove vsphereStorageDriver
Actual results:
* spec: Invalid value: "object": VSphereStorageDriver is required once set
Expected results:
should be allowed
Additional info:
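For reference, a minimal sketch of the field in question on the cluster Storage object (operator.openshift.io/v1); the value shown is the migration-enabled one, as opposed to LegacyDeprecatedInTreeDriver mentioned above:

apiVersion: operator.openshift.io/v1
kind: Storage
metadata:
  name: cluster
spec:
  vsphereStorageDriver: CSIWithMigrationDriver   # once set, the current validation forbids removing this field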
By moving MCO certificate management out of MachineConfigs, certificate rotation can happen at any time, even when pools are paused, and generates no drain or reboot.
Eliminate problems caused by certificate rotations being blocked by paused pools. Keep certificates up to date without disruption to workloads.
Windows MCO has been updated to work with this path.
Having additional MCO metrics is helpful to customers who want to closely monitor the state of their Machines and MachineConfigPools.
Add for each MCP:
- Paused
- Updated
- Updating
- Degraded
- Machinecount
- ReadyMachineCount
- UpdatedMachineCount
- DegradedMachineCount
Creating this to version scope the improvements merged into 4.14. Since those changes were in a story, they need an epic.
Customers would like to have some MachineConfigOperator metrics in Prometheus. For each MCP:
- Paused
- Updated
- Updating
- Degraded
- Machinecount
- ReadyMachineCount
- UpdatedMachineCount
- DegradedMachineCount
These metrics would be really important, as they could show any MachineConfig action (updating, degraded, ...), which could also trigger an alarm with a PrometheusRule. Having a MachineConfig dashboard would also be really useful.
Extend the Workload Partitioning feature to support multi-node clusters.
Customers running RAN workloads on C-RAN Hubs (i.e. multi-node clusters) that want to maximize the cores available to the workloads (DU) should be able to utilize WP to isolate CP processes to reserved cores.
Requirements
A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
requirement | Notes | isMvp? |
< How will the user interact with this feature? >
< Which users will use this and when will they use it? >
< Is this feature used as part of current user interface? >
< What does the person writing code, testing, documenting need to know? >
< Are there assumptions being made regarding prerequisites and dependencies?>
< Are there assumptions about hardware, software or people resources?>
< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>
< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>
<Does the Feature introduce data that could be gathered and used for Insights purposes?>
< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >
< What does success look like?>
< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact>
< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>
< Which other products and versions in our portfolio does this feature impact?>
< What interoperability test scenarios should be factored by the layered product(s)?>
Question | Outcome |
Write a test that exercises known management pods and creates a management pod to verify that it adheres to the CPU Affinity and CPU Shares settings.
Ex:
pgrep kube-apiserver | while read i; do taskset -cp $i; done
DaemonSet and Deployment resource checks seem to flake, need to be resolved.
The check on the TechPreview FeatureSet is no longer needed in the installer; remove the check from the code.
Make validation tests run on all platforms by removing skips.
The original implementation of workload partitioning tried to leverage CRI-O's default behavior to allow full use of CPU sets when no performance profile is supplied by the user, while still being a CPU-partitioned cluster. This works fine for CPU affinity; however, because we don't supply a config and let the default behavior kick in, CRI-O does not alter the CPU shares and gives all pods a CPU share value of 2.
We need to supply a config for CRI-O with an empty string for the CPU set to support both CPU share and CPU affinity behavior when NO performance profile is supplied, so that the `resource.requests`, which get converted to CPU shares, are correctly applied in the default state.
Note, this is not an issue with CPU affinity, that still behaves as expected and when a performance profile is supplied things work as intended as well. The CPU share mismatch is the only issue being identified here.
Create generic validation tests in Origin and Release repo to check that a cluster is correctly configured. E2E tests running in a cpu partitioned cluster should run successfully.
When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release
The HostedCluster and NodePool specs already have a "pausedUntil" field.
pausedUntil:
  description: 'PausedUntil is a field that can be used to pause reconciliation on a
    resource. Either a date can be provided in RFC3339 format or a boolean. If a date
    is provided: reconciliation is paused on the resource until that date. If the
    boolean true is provided: reconciliation is paused on the resource until the
    field is removed.'
  type: string
This option is currently not exposed in the "hypershift create cluster" command.
In order to support the HCP create/update automation template with ClusterCurator, users should be able to run "hypershift create cluster" with a PausedUntil flag.
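For reference, a minimal sketch of the existing field on a HostedCluster (the NodePool field behaves the same way); values are illustrative.

```yaml
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
spec:
  # pause reconciliation until the field is removed (or until an RFC3339 date)
  pausedUntil: "true"
```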
Improve the kubevirt-csi storage plugin features and integration as we make progress towards the GA of a KubeVirt provider for HyperShift.
Infra storage classes made available to guest clusters must support:
Who | What | Reference |
---|---|---|
DEV | Upstream roadmap issue (or individual upstream PRs) | <link to GitHub Issue> |
DEV | Upstream documentation merged | <link to meaningful PR> |
DEV | gap doc updated | <name sheet and cell> |
DEV | Upgrade consideration | <link to upgrade-related test or design doc> |
DEV | CEE/PX summary presentation | label epic with cee-training and add a <link to your support-facing preso> |
QE | Test plans in Polarion | <link or reference to Polarion> |
QE | Automated tests merged | <link or reference to automated tests> |
DOC | Downstream documentation merged | <link to meaningful PR> |
kubevirt-csi should support/apply fsGroup settings in filesystems created by kubevirt-csi
The HyperShift KubeVirt platform only supports guest clusters running 4.14 or greater (due to the kubevirt rhcos image only being delivered in 4.14)
and it also only supports OCP 4.14 and CNV 4.14 for the infra cluster.
Add backend validation on the HostedCluster that validates the parameters are correct before processing the hosted cluster. If these conditions are not met, then report back the error as a condition on the hosted cluster CR
Based on the perf scale team's results, enabling multiqueue when using jumbo frames (MTU >= 9000) can greatly improve throughput, as seen by comparing slides 8 and 10 in this slide deck: https://docs.google.com/presentation/d/1cIm4EcAswVDpuDp-eHVmbB7VodZqQzTYCnx4HCfI9n4/edit#slide=id.g2563dda6aa5_1_68
However, enabling multiqueue with a small MTU causes throughput to crater.
This task involves adding an API option to the KubeVirt platform within the NodePool API, as well as adding a CLI option for enabling multiqueue in the hcp CLI (the new productized CLI).
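A rough sketch of what the NodePool addition might look like; the `networkInterfaceMultiqueue` field name and placement are illustrative and the final API may differ.

```yaml
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: example-kubevirt
  namespace: clusters
spec:
  platform:
    type: KubeVirt
    kubevirt:
      # illustrative field; only beneficial together with jumbo frames (MTU >= 9000)
      networkInterfaceMultiqueue: Enable
```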
Many customers still predominantly use logs as the main source of data to quickly identify problems. Many issues can also be identified by metrics, but there are some events, such as suspicious IP address activity in security, or runtime system issues such as host errors, where logs are your friend. OpenShift currently only supports defining alerting rules and getting notifications based on metrics. That leaves a big gap in identifying, and being notified about, the previously mentioned events immediately.
As we move the Logging stack towards using Loki (see OBSDA-7), we will be able to use its out-of-the-box capabilities to define alerting rules on logs using LogQL. That approach is very similar to Prometheus' alerting ecosystem and actually gives us the opportunity to reuse Prometheus' Alertmanager to distribute alerts/notifications. For customers, this means they do not need to configure different channels twice, for metrics and logs, but can reuse the same configuration.
For the configuration itself, we need to look into introducing a CRD (similar to the PrometheusRule CRD inside the Prometheus Operator) to allow users with non-admin permissions to configure the rules without changing the central Loki configuration.
Since OpenShift 4.6, application owners can configure alerting rules based on metrics themselves as described in User Workload Monitoring (UWM) enhancement. The rules are defined as PrometheusRule resources and can be based on platform and/or application metrics.
To expand the alerting capabilities on logs as an observability signal, cluster admins and application owners should be able to configure alerting rules as described in the Loki Rules docs and in the Loki Operator Ruler upstream enhancement.
The AlertingRule CRD fulfills the requirement to define alerting rules for Loki, similar to PrometheusRule.
The RulerConfig CRD fulfills the requirement to connect the Loki Ruler component to a list of Prometheus Alertmanager hosts to notify on firing alerts.
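A minimal AlertingRule sketch, assuming the upstream Loki Operator CRD layout (namespace, tenant, and LogQL expression are illustrative; exact fields may differ by version):

```yaml
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: app-log-alerts
  namespace: my-app
spec:
  tenantID: application
  groups:
    - name: errors
      interval: 1m
      rules:
        - alert: HighErrorRate
          # LogQL expression evaluated by the Loki Ruler
          expr: |
            sum(rate({kubernetes_namespace_name="my-app"} |= "error" [5m])) > 10
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: High rate of error lines in my-app logs
```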
“As a dev user, I want to use the silences as admins do, so I can get the same features”
Given a dev user logged in to the console and using a developer perspective
When the user navigates to the observe section
Then the user can see a silences tab that has the same features as the admin but restricted only to the current selected namespace
This feature aims to enhance observability and user experience for customers of self-managed Hosted Control Planes (HCP) using ACM/MCE by leveraging the existing observability feature stack (e.g., the pluggable dashboard console feature in the OCP console as the MVP in case ACM is not in use). This approach ensures improved monitoring capabilities and aligns with the tenancy model of User Workload Monitoring (UWM). It also strongly encourages an upsell from MCE to ACM to access those features, and provides a best-practice, validated pattern for customers willing to build it on their own (with a lot of effort vs. ACM).
Users, particularly SRE teams (the cluster service provider persona), will gain enhanced visibility into the health and performance of their HCPs through a customizable monitoring dashboard. This dashboard will provide critical metrics and alerts, aiding in proactive management and troubleshooting. Existing observability features in ACM will be expanded to include these capabilities.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
---|---|
Self-managed, managed, or both | Self-managed (but reusable in managed with xCM) |
Classic (standalone cluster) | N/A |
Hosted control planes | Applicable |
Multi node, Compact (three node), or Single node (SNO), or all | N/A |
Connected / Restricted Network | Applicable |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | Applicable |
Operator compatibility | Observability Operator (ObO) |
Backport needed (list applicable versions) | N/A |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | OpenShift Console, dynamic plugin |
Other (please specify) | N/A |
The usage of UWM for HCP metrics on the management cluster has a few drawbacks:
These issues would be resolved by using ObO, which is currently being productized.
Other questions to answer:
This feature should leverage existing functionality when possible to align with other OCP observability efforts (e.g., pluggable dashboard console feature in the OCP console) to provide enhanced observability for HCP users. It should align with the existing UWM tenancy model and address immediate monitoring needs while considering future improvements via the Observability Operator.
Customers opting for full metrics export must be aware of the potential impact on the monitoring stack. Clear documentation and guidelines will be provided to manage configuration and alerts effectively.
Documentation will include setup guides, configuration examples, and troubleshooting tips. It will also link to existing ACM observability documentation for comprehensive coverage.
As a hosted cluster deployer I want to have the HyperShift Operator:
so that:
https://docs.google.com/document/d/1UwHwkL-YtrRJYm-A922IeW3wvKEgCR-epeeeh3CBOGs/edit
configMap example: https://github.com/openshift/console-dashboards-plugin/blob/main/docs/add-datasource.md
tldr: three basic claims, the rest is explanation and one example
While bugs are an important metric, fixing bugs is different than investing in maintainability and debuggability. Investing in fixing bugs will help alleviate immediate problems, but it doesn't improve the ability to address future problems. You (may) get a code base with fewer bugs, but when you add a new feature, it will still be hard to debug problems and interactions. This pushes a code base towards stagnation, where it gets harder and harder to add features.
One alternative is to ask teams to produce ideas for how they would improve future maintainability and debuggability instead of focusing on immediate bugs. This would produce designs that make problem determination, bug resolution, and future feature additions faster over time.
I have a concrete example of one such outcome of focusing on bugs vs. quality. We have resolved many bugs about communication failures with ingress by finding problems with point-to-point network communication. We have fixed the individual bugs, but have not improved the code for future debugging. In so doing, we chase many hard-to-diagnose problems across the stack. The alternative is to create a point-to-point network connectivity capability. This would immediately improve bug resolution and stability (detection) for kuryr, ovs, legacy sdn, network-edge, kube-apiserver, openshift-apiserver, authentication, and console. Bug fixing does not produce the same impact.
We need more investment in our future selves. Saying, "teams should reserve this" doesn't seem to be universally effective. Perhaps an approach that directly asks for designs and impacts and then follows up by placing the items directly in planning and prioritizing against PM feature requests would give teams the confidence to invest in these areas and give broad exposure to systemic problems.
Relevant links:
Epic Template descriptions and documentation.
Write a DNS local endpoint test that uses TCP in the origin repo, just like the previous one done in NE-1068.
Go 1.16 added the new embed directive to Go. This embed directive lets you natively (and trivially) compile your binary with static asset files.
The current go-bindata dependency that's used in both the Ingress and DNS operators for YAML asset compilation could be dropped in exchange for the new go embed functionality. This would reduce our dependency count, remove the need for `bindata.go` (which is version controlled and constantly updated), and make our code easier to read. This switch would also reduce the overall lines of code in our repos.
Note that this may be applicable to OCP 4.8 if and when images are built with go 1.16.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Use t.Run to run test cases for table-driven unit tests, for consistency.
Refactor Test_desiredLoadBalancerService to match our unit test standards, remove extraneous test cases, and make it more readable/maintainable.
Unit test names should be formatted as Test_<function name>, so that the scope of the function (private or public) is preserved.
Test_desiredHttpErrorCodeConfigMap contains a section with dead code when checking for `expect == nil || actual == nil`. Clean this up.
Also replace Ruby-style #{} syntax for string interpolation with Go string formats.
Hypershift-provisioned clusters, regardless of the cloud provider, support the OLM-managed operator integration outlined in OCPBU-559 and OCPBU-560.
There is no degradation in capability or coverage for OLM-managed operators that support short-lived token authentication on clusters lifecycled via Hypershift.
Currently, Hypershift lacks support for CCO.
Currently, Hypershift will be limited to deploying clusters in which the cluster core operators are leveraging short-lived token authentication exclusively.
If we are successful, no special documentation should be needed for this.
Outcome Overview
Operators on guest clusters can take advantage of the new tokenized authentication workflow that depends on CCO.
Success Criteria
CCO is included in HyperShift and its footprint is minimal while meeting the above outcome.
Expected Results (what, how, when)
Post Completion Review – Actual Results
After completing the work (as determined by the "when" in Expected Results above), list the actual results observed / measured during Post Completion review(s).
Every guest cluster should have a running CCO pod with its kubeconfig attached to it.
Enhancement doc: https://github.com/openshift/enhancements/blob/master/enhancements/cloud-integration/tokenized-auth-enablement-operators-on-cloud.md
CCO currently deploys the pod identity webhook as part of its deployment. As part of the effort to reduce the footprint of CCO, the deployment of this pod should be conditional on the infrastructure.
This epic tracks work related to designing how to include CCO into HyperShift in order for operators on guest clusters to leverage the STS UX defined by this project.
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | Self-managed (though could be managed by partner) |
Classic (standalone cluster) | Classic |
Hosted control planes | Future |
Multi node, Compact (three node), or Single node (SNO), or all | SNO |
Connected / Restricted Network | All – connected and disconnected, air-gapped |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | x86_x64 |
Operator compatibility | TBD |
Backport needed (list applicable versions) | N/A |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | N/A |
Other (please specify) |
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
<your text here>
High-level list of items that are out of scope. Initial completion during Refinement status.
<your text here>
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
In OCP 4.14, we provided the ability to pass cluster configs to the agent-based installer (AGI) after booting image (AGENT-559).
In OCP 4.15, we published in upstream how you can use the Appliance Image builder utility to build disk images using Agent-based Installer to enable appliance installations — see https://github.com/openshift/appliance/blob/master/docs/user-guide.md. This is “Dev Preview”. The appliance tooling is currently supported and maintained by ecosystem engineering.
In OCP 4.16, this Appliance image builder utility will be bundled and shipped and will be available at registry.redhat.io (we are “productizing” this part). In the near term, we’ll document this via KCS and not official docs (to minimize confusion about documenting a feature that only impacts a small subset of appliance partners).
This appliance tool combines 2 features:
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Enable installation and lifecycle support of OpenShift 4 on Oracle Cloud Infrastructure (OCI) Bare metal
Use scenarios
Why is this important
Requirement | Notes |
---|---|
OCI Bare Metal Shapes must be certified with RHEL | They must also work with RHCOS (see iSCSI boot notes), as OCI BM standard shapes require RHCOS iSCSI to boot (certified shapes: https://catalog.redhat.com/cloud/detail/249287) |
Successfully passing the OpenShift Provider conformance testing – this should be fairly similar to the results from the OCI VM test results. | Oracle will do these tests. |
Updating Oracle Terraform files | |
Making the Assisted Installer modifications needed to address the CCM changes and surface the necessary configurations. | Support Oracle Cloud in Assisted-Installer CI: |
RFEs:
Any bare metal Shape to be supported with OCP has to be certified with RHEL.
From the certified Shapes, those that have local disks will be supported. This is due to the current lack of support in RHCOS for the iSCSI boot feature. OCPSTRAT-749 is tracking adding this support and removing this restriction in the future.
As of Aug 2023 this excludes at least all the Standard shapes, BM.GPU2.2 and BM.GPU3.8, from the published list at: https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm#baremetalshapes
Please describe what this feature is going to do.
Please describe what conditions must be met in order to mark this feature as "done".
If the answer is "yes", please make sure to check the corresponding option.
Configure a periodic job (twice a week) to run tests on OCI
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This is part of the overall multi-release Composable OpenShift effort (OCPPLAN-9638), which is being delivered in multiple phases:
Phase 1 (OpenShift 4.11): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators
Phase 2 (OpenShift 4.12): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators
Phase 3 (OpenShift 4.13): OCPBU-117
Phase 4 (OpenShift 4.14): OCPSTRAT-36 (formerly OCPBU-236)
Phase 5 (OpenShift 4.15): OCPSTRAT-421 (formerly OCPBU-519)
Phase 6 (OpenShift 4.16): OCPSTRAT-731
Phase 7 (OpenShift 4.17): OCPSTRAT-1308
Questions to be addressed:
Once the MCO team is done moving the node-ca functionality to the MCO (MCO-499), we need to remove the node-ca from CIRO.
ACCEPTANCE CRITERIA
With this feature, MCE will be an additional operator that can be enabled at cluster creation time, both in the AI SaaS and in disconnected installations with the Agent-based installer.
Currently 4 operators have been enabled for the Assisted Service SaaS create cluster flow: Local Storage Operator (LSO), OpenShift Virtualization (CNV), OpenShift Data Foundation (ODF), Logical Volume Manager (LVM)
The Agent-based installer doesn't leverage this framework yet.
When a user performs the creation of a new OpenShift cluster with the Assisted Installer (SaaS) or with the Agent-based installer (disconnected), provide the option to enable the multicluster engine (MCE) operator.
The cluster deployed can add itself to be managed by MCE.
Deploying an on-prem cluster 0 easily is a key operation for the rest of the OpenShift infrastructure.
While MCE/ACM are strategic in the lifecycle management of OpenShift, including the provisioning of all the clusters, deploying the first cluster, where MCE/ACM are hosted along with other tools supporting the rest of the clusters (GitOps, Quay, log centralization, monitoring...), must be easy and have a high success rate.
The Assisted Installer and the Agent-based installer cover this gap and must present the option to enable MCE to keep making progress in this direction.
MCE engineering is responsible for adding the appropriate definition as an olm-operator-plugin.
See https://github.com/openshift/assisted-service/blob/master/docs/dev/olm-operator-plugins.md for more details
This feature will follow up OCPBU-186 (Image mirroring by tags).
OCPBU-186 implemented the new ImageDigestMirrorSet and ImageTagMirrorSet APIs and their rollout through the MCO.
This feature will update the components using ImageContentSourcePolicy to use ImageDigestMirrorSet.
The list of the components: https://docs.google.com/document/d/11FJPpIYAQLj5EcYiJtbi_bNkAcJa2hCLV63WvoDsrcQ/edit?usp=sharing.
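For reference, a minimal ImageDigestMirrorSet that replaces a typical ICSP; repository and mirror names are illustrative.

```yaml
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: example-idms
spec:
  imageDigestMirrors:
    # pull-by-digest references to the source repository are redirected to the mirror
    - source: quay.io/openshift-release-dev/ocp-release
      mirrors:
        - mirror.registry.example.com/ocp-release
```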
Migrate OpenShift Components to use the new Image Digest Mirror Set (IDMS)
This doc lists the OpenShift components that currently use ICSP: https://docs.google.com/document/d/11FJPpIYAQLj5EcYiJtbi_bNkAcJa2hCLV63WvoDsrcQ/edit?usp=sharing
Plan for ImageDigestMirrorSet Rollout :
Epic: https://issues.redhat.com/browse/OCPNODE-521
4.13: Enable ImageDigestMirrorSet, both ICSP and ImageDigestMirrorSet objects are functional
4.14: Update OpenShift components to use IDMS
4.17: Remove support for ICSP within MCO
As an OpenShift developer, I want an --idms-file flag so that I can fetch image info from an alternative mirror when --icsp-file gets deprecated.
As an <OpenShift developer> trying to <mirror images for a disconnected environment using the oc command>, I want <the output to give an example ImageDigestMirrorSet manifest>, because ImageContentSourcePolicy will be replaced by the CRDs implemented in OCPBU-186 (Image mirroring by tags).
The ImageContentSourcePolicy manifest snippet in the command output will be updated to an ImageDigestMirrorSet manifest.
Workloads that use the `oc adm release mirror` command will be impacted.
This feature aims to comprehensively refactor and standardize various components across HCP, ensuring consistency, maintainability, and reliability. The overarching goal is to increase customer satisfaction by increasing speed to market and to save engineering budget by reducing incidents/bugs. This will be achieved by reducing technical debt, improving code quality, and simplifying the developer experience across multiple areas, including CLI consistency, NodePool upgrade mechanisms, networking flows, and more. By addressing these areas holistically, the project aims to create a more sustainable and scalable codebase that is easier to maintain and extend.
Over time, the HyperShift project has grown organically, leading to areas of redundancy, inconsistency, and technical debt. This comprehensive refactor and standardization effort is a response to these challenges, aiming to improve the project's overall health and sustainability. By addressing multiple components in a coordinated way, the goal is to set a solid foundation for future growth and development.
Ensure all relevant project documentation is updated to reflect the refactored components, new abstractions, and standardized workflows.
This overarching feature is designed to unify and streamline the HCP project, delivering a more consistent, maintainable, and reliable platform for developers, operators, and users.
Focus on the general modernization of the codebase, addressing technical debt, and ensuring that the platform is easy to maintain and extend.
DoD:
Delete conversion webhook https://github.com/openshift/hypershift/pull/2267
This needs to be backward compatible for IBM.
Review IBM PRs: * https://github.com/openshift/hypershift/pull/1939
Improve the consistency and reliability of APIs by enforcing immutability and clarifying service publish strategy support.
We need to clarify what's supported.
https://redhat-internal.slack.com/archives/C04EUL1DRHC/p1685371010883569?thread_ts=1685363583.881959&cid=C04EUL1DRHC
Then lock down the API accordingly.
Due to low customer interest in using OpenShift on Alibaba Cloud, we have decided to deprecate and then remove IPI support for Alibaba Cloud.
4.14: Announcement
4.15: Archive code
Add a deprecation warning in the installer code for anyone trying to install Alibaba via IPI.
The deprecation of support for the Alibaba Cloud platform is being postponed by one release, so we need to revert SPLAT-1094.
USER STORY:
As a user of the installer binary, I want to be warned that Alibaba support will be deprecated in 4.15, so that I'm prevented from creating clusters that will soon be unsupported.
DESCRIPTION:
Alibaba support will be decommissioned from both IPI and UPI starting in 4.15. We want to warn users of the 4.14 installer binary who pick 'alibabacloud' in the list of providers.
ACCEPTANCE CRITERIA:
A warning message is displayed after choosing 'alibabacloud'.
ENGINEERING DETAILS:
The storage operators need to be automatically restarted after the certificates are renewed.
From OCP doc "The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left."
Since OCP now offers an 18-month lifecycle per release, the storage operator pods need to be automatically restarted after the certificates are renewed.
The storage operators will be transparently restarted. The customer benefit should be transparent; it avoids manual restarts of the storage operators.
The administrator should not need to restart the storage operators when certificates are renewed.
This should apply to all relevant operators with a consistent experience.
As an administrator I want the storage operators to be automatically restarted when certificates are renewed.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
This feature request is triggered by the new extended OCP lifecycle. We are moving from 12 to 18 months support per release.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
No doc is required
This feature only covers storage, but the same behavior should be applied to every relevant component.
The pod `csi-snapshot-webhook` mounts the secret:
```
$ cat assets/webhook/deployment.yaml
kind: Deployment
metadata:
  name: csi-snapshot-webhook
...
spec:
  template:
    spec:
      containers:
      - ...
        volumeMounts:
        - name: certs
          mountPath: /etc/snapshot-validation-webhook/certs
      volumes:
      - name: certs
        secret:
          secretName: csi-snapshot-webhook-secret
```
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted.
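One common pattern for achieving this across the pods listed in this epic (a sketch only, not necessarily how each operator will implement it) is to stamp a hash of the mounted secret onto the pod template, so that a certificate rotation rolls the Deployment automatically; the annotation key below is hypothetical.

```yaml
spec:
  template:
    metadata:
      annotations:
        # hypothetical annotation: the operator computes a hash over the secret's
        # data and updates it on rotation, which triggers a rolling restart
        operator.openshift.io/dep-secret-hash: "<sha256 of csi-snapshot-webhook-secret data>"
```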
1. The pod `vmware-vsphere-csi-driver-controller` mounts the secret:
$ oc get po -n openshift-cluster-csi-drivers vmware-vsphere-csi-driver-controller-8467ddf4c-5lgd8 -o yaml
...
  containers:
  - name: driver-kube-rbac-proxy
  - name: provisioner-kube-rbac-proxy
  - name: attacher-kube-rbac-proxy
  - name: resizer-kube-rbac-proxy
  - name: snapshotter-kube-rbac-proxy
  - name: syncer-kube-rbac-proxy
    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert
  volumes:
  - name: metrics-serving-cert
    secret:
      defaultMode: 420
      secretName: vmware-vsphere-csi-driver-controller-metrics-serving-cert
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted.
2. Similarly, the pod `vmware-vsphere-csi-driver-webhook` mounts another secret:
$ oc get po -n openshift-cluster-csi-drivers vmware-vsphere-csi-driver-webhook-c557dbf54-crrxp -o yaml
...
  containers:
  - name: vsphere-webhook
    volumeMounts:
    - mountPath: /etc/webhook/certs
      name: certs
  volumes:
  - name: certs
    secret:
      defaultMode: 420
      secretName: vmware-vsphere-csi-driver-webhook-secret
Again, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted.
The pod `shared-resource-csi-driver-node` mounts the secret:
$ cat assets/node.yaml
...
  containers:
  - name: hostpath
    volumeMounts:
    - mountPath: /etc/secrets
      name: shared-resource-csi-driver-node-metrics-serving-cert
  volumes:
  - name: shared-resource-csi-driver-node-metrics-serving-cert
    secret:
      defaultMode: 420
      secretName: shared-resource-csi-driver-node-metrics-serving-cert
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted
The pod `gcp-pd-csi-driver-controller` mounts the secret:
$ oc get po -n openshift-cluster-csi-drivers gcp-pd-csi-driver-controller-5787b9c477-q78qx -o yaml
...
  - name: provisioner-kube-rbac-proxy
    ...
    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert
  volumes:
  - name: metrics-serving-cert
    secret:
      secretName: gcp-pd-csi-driver-controller-metrics-serving-cert
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted
The pod `openstack-manila-csi-controllerplugin` mounts the secret:
$ cat assets/controller.yaml
...
  containers:
  - name: provisioner-kube-rbac-proxy
    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert
  volumes:
  - name: metrics-serving-cert
    secret:
      secretName: manila-csi-driver-controller-metrics-serving-cert
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted
The pod `openstack-cinder-csi-driver-controller` mounts the secret:
$ oc get po/openstack-cinder-csi-driver-controller-689b897df8-cx5hl -oyaml | yq .spec.volumes
- emptyDir: {}
  name: socket-dir
- name: secret-cinderplugin
  secret:
    defaultMode: 420
    items:
    - key: clouds.yaml
      path: clouds.yaml
    secretName: openstack-cloud-credentials
- configMap:
    defaultMode: 420
    items:
    - key: cloud.conf
      path: cloud.conf
    name: cloud-conf
  name: config-cinderplugin
- configMap:
    defaultMode: 420
    items:
    - key: ca-bundle.pem
      path: ca-bundle.pem
    name: cloud-provider-config
    optional: true
  name: cacert
- name: metrics-serving-cert
  secret:
    defaultMode: 420
    secretName: openstack-cinder-csi-driver-controller-metrics-serving-cert
- configMap:
    defaultMode: 420
    items:
    - key: ca-bundle.crt
      path: tls-ca-bundle.pem
    name: openstack-cinder-csi-driver-trusted-ca-bundle
  name: non-standard-root-system-trust-ca-bundle
- name: kube-api-access-hz62v
  projected:
    defaultMode: 420
    sources:
    - serviceAccountToken:
        expirationSeconds: 3607
        path: token
    - configMap:
        items:
        - key: ca.crt
          path: ca.crt
        name: kube-root-ca.crt
    - downwardAPI:
        items:
        - fieldRef:
            apiVersion: v1
            fieldPath: metadata.namespace
          path: namespace
    - configMap:
        items:
        - key: service-ca.crt
          path: service-ca.crt
        name: openshift-service-ca.crt
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted
The pod `shared-resource-csi-driver-webhook` mounts the secret:
$ cat assets/webhook/deployment.yaml
kind: Deployment
metadata:
  name: shared-resource-csi-driver-webhook
...
spec:
  template:
    spec:
      containers:
      - ...
        volumeMounts:
        - mountPath: /etc/secrets/shared-resource-csi-driver-webhook-serving-cert/
          name: shared-resource-csi-driver-webhook-serving-cert
      volumes:
      - name: shared-resource-csi-driver-webhook-serving-cert
        secret:
          secretName: shared-resource-csi-driver-webhook-serving-cert
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted.
The pod `alibaba-disk-csi-driver-controller` mounts the secret:
$ cat assets/controller.yaml
...
  containers:
  - name: provisioner-kube-rbac-proxy
    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert
  volumes:
  - name: metrics-serving-cert
    secret:
      secretName: alibaba-disk-csi-driver-controller-metrics-serving-cert
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted
The pod `aws-ebs-csi-driver-controller` mounts the secret:
$ oc get po -n openshift-cluster-csi-drivers aws-ebs-csi-driver-controller-559f74d7cd-5tk4p -o yaml
...
  - name: driver-kube-rbac-proxy
  - name: provisioner-kube-rbac-proxy
  - name: attacher-kube-rbac-proxy
  - name: resizer-kube-rbac-proxy
  - name: snapshotter-kube-rbac-proxy
    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert
  volumes:
  - name: metrics-serving-cert
    secret:
      defaultMode: 420
      secretName: aws-ebs-csi-driver-controller-metrics-serving-cert
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted
The pod `ibm-powervs-block-csi-driver-controller` mounts the secret:
$ cat assets/controller.yaml
...
  containers:
  - name: provisioner-kube-rbac-proxy
    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert
  volumes:
  - name: metrics-serving-cert
    secret:
      secretName: ibm-powervs-block-csi-driver-controller-metrics-serving-cert
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted
The pod `azure-file-csi-driver-controller` mounts the secret:
$ oc get po -n openshift-cluster-csi-drivers azure-file-csi-driver-controller-cf84d5cf5-pzbjn -o yaml
...
  containers:
  - name: driver-kube-rbac-proxy
    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert
  volumes:
  - ...
    secret:
      defaultMode: 420
      secretName: azure-file-csi-driver-controller-metrics-serving-cert
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted
Track goals/requirements for self-managed GA of Hosted control planes on AWS using the AWS Provider.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Today, the upstream and more complete documentation of HyperShift lives at https://hypershift-docs.netlify.app/.
However, the product documentation today lives under https://access.redhat.com/login?redirectTo=https%3A%2F%2Faccess.redhat.com%2Fdocumentation%2Fen-us%2Fred_hat_advanced_cluster_management_for_kubernetes%2F2.6%2Fhtml%2Fmulticluster_engine%2Fmulticluster_engine_overview%23hosted-control-planes-intro
The goal of this Epic is to extract important docs and establish parity between what's documented and possible upstream and product documentation.
Multiple consumers have not realized that a newer version of the CPO (spec.release) is not guaranteed to work with an older HO.
This is stated here https://hypershift-docs.netlify.app/reference/versioning-support/
but empirical evidence, like the OCM integration, tells us this is not enough.
We already deploy a CM in the HO namespace with the HC supported versions.
Additionally, we can add an image label with the latest HC version supported by the operator so you can quickly docker inspect...
Console enhancements based on customer RFEs that improve customer user experience.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Based on https://issues.redhat.com/browse/RFE-3775 we should be extending our proxy package timeout to match the browser's timeout, which is 5 minutes.
AC: Bump the 30-second timeout in the proxy pkg to 5 minutes.
For the console, we would like to have a way for customers to send direct feedback about features like multi cluster.
Acceptance criteria:
Testing instructions:
According to security, it is important to disable publicly available content from the OpenShift Web Console, which is served via `/opt/bridge/bin/bridge --public-dir=/opt/bridge/static --config=/var/console-config` in the console pod (openshift-console namespace).
The folder /opt/bridge/static and its files are publicly available without authentication.
The purpose of this RFE is to disable unauthenticated access to the static assets:
https://console-openshift-console.apps.example.com/static/assets/
https://console-openshift-console.apps.example.com/static/
Follow on to CONSOLE-2976
Based on the API changes for MCP, we need to check for the entry with the `kube-apiserver-to-kubelet-signer` value for the `subject` key in the `status.certExpirys` array. For that entry we will render the `expiry` value, which is in UTC format, as a timestamp.
AC:
We currently implement fuzzy search in the console (project search, search resources page / list view pages). While we don't want to change the current search behavior, we would like to add some exact search capability for users that have similarly named resources where fuzzy search doesn't help narrow down the list of resources in a list view/search page.
RFE: https://issues.redhat.com/browse/RFE-3013
Customer bug: https://issues.redhat.com/browse/OCPBUGS-2603
Acceptance criteria:
all search pages in console implement
Design
Explore help text for search inputs - this should be shown at all times and not hidden in popover
Extend the Installer's capabilities when deploying OCP on a GCP shared VPC (XPN), adding support for BYO hosted zones and removing the SA requirements in the bootstrap process.
While deploying OpenShift to a shared VPC (XPN) in GCP, the user can bring their own DNS zone where to create the required records for the API server and Ingress and no additional SA will be required to bootstrap the cluster.
The user can provide an existing DNS zone when deploying OpenShift to a shared VPC (XPN) in GCP, which will be used to host the required DNS records for the API server and Ingress. At the same time, today's SA requirements will be removed.
While adding support for shared VPC (XPN) deployments in GCP, the BYO hosted zone capability was removed (CORS-2474) due to multiple issues found during QE validation of the feature. At that time there was no evidence from customers/users that this was required for the shared VPC use case, so the capability was removed in order to declare the feature GA.
We now have evidence from this specific use case being required by users.
Documentation about using this capability while deploying OpenShift to a shared VPC will be required.
The GCP bootstrap process creates a service account with the role roles/storage.admin. The role is required so that the service account can create a bucket to hold the bootstrap ignition file contents. As a security request from a customer, the service account created during this process can be removed. This means that not only will the service account, private key, and role not be created, but the bucket containing the bootstrap ignition file contents will also not be created in Terraform.
As a (user persona), I want to be able to:
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
This is a followup to https://issues.redhat.com/browse/OPNET-13. In that epic we implemented limited support for dual stack on VSphere, but due to limitations in upstream Kubernetes we were not able to support all of the use cases we do on baremetal. This epic is to track our work up and downstream to finish the dual stack implementation.
Currently, o/installer only allows primary-v6 installations for baremetal and not for vSphere (https://github.com/openshift/installer/blob/release-4.13/pkg/types/validation/installconfig.go#L241). We need to change it so that such a topology is also allowed on vSphere.
With the changes being behind an alpha feature, we need to make sure the CCM is running with the feature gate enabled.
Relevant manifest for vSphere - https://github.com/openshift/cluster-cloud-controller-manager-operator/blob/master/pkg/cloud/vsphere/assets/cloud-controller-manager-deployment.yaml
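For context, a primary-v6 dual-stack networking stanza in install-config.yaml looks roughly like this (all CIDRs are illustrative):

```yaml
networking:
  networkType: OVNKubernetes
  # IPv6 listed first makes it the primary address family
  machineNetwork:
    - cidr: fd65:a1a8:60ad::/48
    - cidr: 192.168.100.0/24
  clusterNetwork:
    - cidr: fd01::/48
      hostPrefix: 64
    - cidr: 10.128.0.0/14
      hostPrefix: 23
  serviceNetwork:
    - fd02::/112
    - 172.30.0.0/16
```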
The Assisted Installer is used to help streamline and improve the install experience of OpenShift UPI. Given the install footprint of OpenShift on IBM Power and IBM zSystems, we would like to bring the Assisted Installer experience to those platforms and ease the installation experience.
Full support of the Assisted Installer for use by IBM Power and IBM zSystems
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
As a multi-arch development engineer, I would like to evaluate if the assisted installer is a good fit for simplifying UPI deployments on Power and Z.
Acceptance Criteria
Description of the problem:
How reproducible:
100%
Steps to reproduce:
1. install 2 clusters with power and z CPU architectures and check the feature usage dashboard in the elastic
Actual results:
power and z features are not displayed in the feature usage dashboard in the elastic
Expected results:
see the power and z features in the feature usage dashboard in the elastic
After doing more tests on staging for Power, I have found that the cluster-managed network does not work for Power. It uses platform.baremetal to define the API VIP / Ingress VIP, and most of the installations have failed at the last step, finalizing. After more digging, I found that the machine-api operator is not able to start successfully and stays in the "Operator is initializing" state. Here is the list of pods with errors:
openshift-kube-controller-manager installer-5-master-1 0/1 Error 0 25m
openshift-kube-controller-manager installer-6-master-2 0/1 Error 0 17m
openshift-machine-api ironic-proxy-kgm9g 0/1 CreateContainerError 0 32m
openshift-machine-api ironic-proxy-nc2lz 0/1 CreateContainerError 0 8m37s
openshift-machine-api ironic-proxy-pp92t 0/1 CreateContainerError 0 32m
openshift-machine-api metal3-69b945c7ff-45hqn 1/5 CreateContainerError 0 33m
openshift-machine-api metal3-image-customization-7f6c8978cf-lxbj7 0/1 CreateContainerError 0 32m
The messages from the failed pod ironic-proxy-nc2lz:
Normal Pulled 11m kubelet Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4f84fd895186b28af912eea42aba1276dec98c814a79310c833202960cf05407" in 1.29310959s (1.293135461s including waiting)
Warning Failed 11m kubelet Error: container create failed: time="2023-04-06T15:16:19Z" level=error msg="runc create failed: unable to start container process: exec: \"/bin/runironic-proxy\": stat /bin/runironic-proxy: no such file or directory"
Similar errors occur for the other failed pods.
The interesting thing is that some of the installations completed successfully in AI, but these pods are still in an error state.
So I asked the AI team to turn off cluster-managed network support for Power.
We're currently on etcd 3.5.6; since then there has been at least one newer release. This epic description tracks the changes that we need to pay attention to:
Golang 1.17 update
In 3.5.7 etcd was moved to 1.17 to address some vulnerabilities:
https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.5.md#go
We need to update our definitions in the release repo to match this and test what impact it has.
EDIT: now moving onto 1.19 directly: https://github.com/etcd-io/etcd/pull/15337
WAL fix carry
3.5.6 had a nasty WAL bug that was hit by some customers, fixed with https://github.com/etcd-io/etcd/pull/15069
Due to the Golang upgrade we carried that patch through OCPBUGS-5458
When we upgrade we need to ensure the commits are properly handled and ordered with this carry.
IPv6 Formatting
There were some comparison issues with same IPv6 addresses having different formats. This was fixed in https://github.com/etcd-io/etcd/pull/15187 and we need to test what impact this has on our ipv6 based SKUs.
serializable memberlist
This is a carry we have had for some time: https://github.com/openshift/etcd/commit/26d7d842f6fb968e55fa5dbbd21bd6e4ea4ace50
This is now officially fixed (slightly different) with the options pattern in: https://github.com/etcd-io/etcd/pull/15261
We need to drop the carry patch and take the upstream version when rebasing.
Etcd 3.5.8 has been released so we can now rebase openshift/etcd to that version
https://github.com/etcd-io/etcd/releases/tag/v3.5.8
The goal of this initiative is to help boost adoption of OpenShift on ppc64le. This can be further broken down into several key objectives.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Epic Goal
Running doc to describe terminologies and concepts which are specific to Power VS - https://docs.google.com/document/d/1Kgezv21VsixDyYcbfvxZxKNwszRK6GYKBiTTpEUubqw/edit?usp=sharing
Now, with the recent changes to the CSI driver code, multipath has become mandatory on the OCP nodes; hence, the installer needs to generate the additional multipath machine config manifests to make the CSI driver work in the Power VS environment.
https://github.com/openshift/ibm-powervs-block-csi-driver/pull/33
https://github.com/openshift/ibm-powervs-block-csi-driver/pull/36
Manifest information for the multipath configuration - https://docs.openshift.com/container-platform/4.13/post_installation_configuration/machine-configuration-tasks.html#rhcos-enabling-multipath-day-2_post-install-machine-configuration-tasks
As part of adding CPMSO support (https://issues.redhat.com/browse/MULTIARCH-3667), we need to update the MAPI CR to add the ability to update and delete load balancer pool members.
This feature aims to enhance and clarify the functionalities of the Hypershift CLI. It was initially developed as a developer tool, but as its purpose evolved, a mix of supported and unsupported features were included. This has caused confusion for users who attempt to utilize unsupported functionalities. The goal is to clearly define the boundaries of what is possible and what is supported by the product.
Users should be able to effectively and efficiently use the Hypershift CLI with a clear understanding of what features are supported and what are not. This should reduce confusion and complications when utilizing the tool.
Clear differentiation between supported and unsupported functionalities within the Hypershift CLI.
Improved documentation outlining the supported CLI options.
Consistency between the Hypershift CLI and the quickstart guide on the UI.
Security, reliability, performance, maintainability, scalability, and usability must not be compromised while implementing these changes.
A developer uses the hypershift install command and only supported features are executed.
A user attempts to create a cluster using hypershift cluster create, and the command defaults to a compatible release image.
What is the most efficient method for differentiating supported and unsupported features within the Hypershift CLI?
What changes need to be made to the documentation to clearly outline supported CLI options?
Changing the fundamental functionality of the Hypershift CLI.
Adding additional features beyond the scope of addressing the current issues.
The Hypershift CLI started as a developer tool but evolved to include a mix of supported and unsupported features. This has led to confusion among users and potential complications when using the tool. This feature aims to clearly define what is and isn't supported by the product.
Customers should be educated about the changes to the Hypershift CLI and its intended use. Clear communication about supported and unsupported features will help them utilize the tool effectively.
Documentation should be updated to clearly outline supported CLI options. This will be a crucial part of user education and should be easy to understand and follow.
This feature may impact the usage of Hypershift CLI across other projects and versions. A clear understanding of these impacts and planning for necessary interoperability test scenarios should be factored in during development.
As a self-managed HyperShift user I want to have a CLI tool that allows me to:
Definition of done:
As a HyperShift user I want to:
Definition of done:
As a user of HCP CLI, I want to be able to set some platform agnostic default flags when creating a HostedCluster:
so that I can set default values for these flags for my particular use cases.
Description of criteria:
The flags listed in HyperShift Create Cluster CLI that don't seem platform agnostic:
These flags are also out of scope:
This requires/does not require a design proposal.
This requires/does not require a feature gate.
As a self-managed HyperShift user I want to have a CLI tool that allows me to:
Definition of done:
As a HyperShift user I want to:
Definition of done:
As a self-managed HyperShift user I want to have a CLI tool that allows me to:
Definition of done:
As a HyperShift user I want to:
Definition of done:
As a self-managed HyperShift user I want to have a CLI tool that allows me to:
Definition of done:
As a self-managed HyperShift user I want to have a CLI tool that allows me to:
Definition of done:
As a software developer and user of HyperShift CLI, I would like a prototype of how the Makefile can be modified to build different versions of the HyperShift CLI, i.e., dev version vs productized version.
As a HyperShift user I want to:
Definition of done:
As a self-managed HyperShift user I want to have a CLI tool that allows me to:
Definition of done:
Enable release managers/Operator authors to manage Operator releases in the file-based catalog (FBC) based on the existing catalog (in SQLite) and distribute them to multiple OCP versions with ease.
Requirement | Notes | isMvp? |
---|---|---|
A declarative mechanism to automate the catalog update process in file-based catalog (FBC) with newly-published bundle references. | | Yes |
A declarative mechanism to publish Operator releases in file-based catalog (FBC) to multiple OCP releases. | | Yes |
A declarative mechanism to convert file-based catalog (FBC) to the sqlite database format so it can be published to OCP versions without FBC support. | | Yes |
A declarative mechanism to convert an existing catalog from sqlite database to the file-based catalog (FBC) basic template. | | Yes |
A declarative mechanism to convert an existing catalog from sqlite database to the file-based catalog (FBC) semver template when possible, and/or highlight the incomplete sections so users can more easily identify the gaps. | | No |
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | Yes |
Release Technical Enablement | Provide necessary release enablement details and documents. | Yes |
A catalog maintainer frequently needs to make changes to an OLM catalog whenever a new software version is released, promoting an existing version and releasing it to a different channel, or deprecating an existing version. All these often require non-trivial changes to the update graph of an Operator package. The maintainers need a git- and human-friendly maintenance approach that allows reproducing the catalog at all times and is decoupled from the release of their individual software versions.
The original imperative catalog maintenance approach, which relies on `replaces`, `skips`, `skipRange` attributes at the bundle level to define the relationships between versions and the update channels, is perceived as complicated from the Red Hat internal developer community. Hence, the new file-based catalog (FBC) is introduced with a declarative fashion and GitOps-friendly.
Furthermore, the concept so-called “template”, as an abstraction layer of the FBC, is introduced to simplify interacting with FBCs. While the “basic template” serves as a simplified abstraction of an FBC with all the `replaces`, `skips`, `skipRange` attributes supported and configurable at the package level, the “semver template” provides the capability to auto-generate an entire upgrade graph adhering to Semantic Versioning (semver) guidelines and consistent with best practices on channel naming.
Based on the feedback at KubeCon NA 2022, folks were generally excited about the features introduced with FBC and the UX provided by the templates. What is still missing is the tooling to enable the adoption.
Therefore, it is important to allow users to:
to help users adopt this novel file-based catalog approach and deliver value to customers with a faster release cadence and higher confidence.
Previously, bundle deprecation was handled by assigning an `olm.deprecated` property to the olm.bundle object. SQLite DBs had to have all valid upgrade edges supported by olm.bundle information in order to prevent foreign key violations. This property meant that the bundle was to be ignored and never installed.
FBC has a simpler method for achieving the same goal: don't include the bundle. Upgrade edges from it may still be specified, and the bundle will not be installable.
Likely an update to opm code base in the neighborhood of https://github.com/operator-framework/operator-registry/blob/249ae621bb8fa6fc8a8e4a5ae26355577393f127/pkg/sqlite/conversion.go#L80
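A minimal FBC sketch of the simpler approach: the channel entry keeps the upgrade edge while the deprecated bundle is simply not included in the catalog (package and version names are illustrative).

```yaml
---
schema: olm.channel
package: example-operator
name: stable
entries:
  - name: example-operator.v1.1.0
    # example-operator.v1.0.0 is intentionally omitted from the catalog, so it
    # cannot be installed, but upgrades from it are still resolvable
    replaces: example-operator.v1.0.0
```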
A/C:
This feature will track upstream work from the OpenShift Control Plane teams - API, Auth, etcd, Workloads, and Storage.
To continue and develop meaningful contributions to the upstream community including feature delivery, bug fixes, and leadership contributions.
Note: The matchLabelKeys field is a beta-level field and enabled by default in 1.27. You can disable it by disabling the MatchLabelKeysInPodTopologySpread [feature gate](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/).
Removing from the TP as the feature is enabled by default.
Just a clean up work.
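For reference, a minimal sketch of the upstream field as it appears on a workload's topology spread constraints:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: example
          # spreads only among pods of the current ReplicaSet revision
          matchLabelKeys:
            - pod-template-hash
```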
Upstream K8s deprecated PodSecurityPolicy and replaced it with a new built-in admission controller that enforces the Pod Security Standards (see here for the motivations for the deprecation). There is an OpenShift-specific dedicated pod admission system called Security Context Constraints. Our aim is to keep the Security Context Constraints pod admission system while also allowing users to have access to the Kubernetes Pod Security Admission.
With OpenShift 4.11, we turned on Pod Security Admission with global "privileged" enforcement. Additionally, we set the "restricted" profile for warnings and audit. This configuration made it possible for users to opt their namespaces in to Pod Security Admission with the per-namespace labels. We also introduced a new mechanism that automatically synchronizes the Pod Security Admission "warn" and "audit" labels.
With OpenShift 4.15, we intend to move the global configuration to enforce the "restricted" pod security profile globally. With this change, the label synchronization mechanism will also switch into a mode where it synchronizes the "enforce" Pod Security Admission label rather than the "audit" and "warn".
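For illustration, these are the per-namespace Pod Security Admission labels involved (the namespace name is hypothetical); with the new default, the synchronization mechanism would manage the "enforce" label rather than only "warn" and "audit":
apiVersion: v1
kind: Namespace
metadata:
  name: example-app
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted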
Epic Goal
Get Pod Security admission to be run in "restricted" mode globally by default alongside SCC admission.
Enable installation and lifecycle support of OpenShift 4 on Oracle Cloud Infrastructure (OCI) with VMs
Currently, we don't yet support OpenShift 4 on Oracle Cloud Infrastructure (OCI), and we know from initial attempts that installing OpenShift on OCI requires the use of a QCOW image (the OpenStack QCOW seems to work fine) and involves networking and routing changes, storage issues, potential MTU and registry issues, etc.
TBD based on customer demand.
Why is this important?
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
RFEs:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Other
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Description of the problem:
Currently, the infrastructure object is created as follows:
# oc get infrastructure/cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-06-19T13:49:07Z"
  generation: 1
  name: cluster
  resourceVersion: "553"
  uid: 240dc176-566e-4471-b9db-fb25c676ba33
spec:
  cloudConfig:
    name: ""
  platformSpec:
    type: None
status:
  apiServerInternalURI: https://api-int.test-infra-cluster-97ef21c5.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  apiServerURL: https://api.test-infra-cluster-97ef21c5.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: test-infra-cluster-97-w6b42
  infrastructureTopology: HighlyAvailable
  platform: None
  platformStatus:
    type: None
instead it should be similar to:
# oc get infrastructure/cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-06-19T13:49:07Z"
  generation: 1
  name: cluster
  resourceVersion: "553"
  uid: 240dc176-566e-4471-b9db-fb25c676ba33
spec:
  cloudConfig:
    name: ""
  platformSpec:
    type: External
    external:
      platformName: oci
status:
  apiServerInternalURI: https://api-int.test-infra-cluster-97ef21c5.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  apiServerURL: https://api.test-infra-cluster-97ef21c5.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: test-infra-cluster-97-w6b42
  infrastructureTopology: HighlyAvailable
  platform: External
  platformStatus:
    type: External
    external:
      cloudControllerManager:
        state: External
How reproducible:
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
We currently rely on a hack to deploy a cluster on the external platform: https://github.com/openshift/assisted-service/pull/5312
The goal of this ticket is to move the definition of the external platform into the install-config once the OpenShift installer is released with support for the external platform: https://github.com/openshift/installer/pull/7217
The taint here: https://github.com/openshift/assisted-installer/pull/629/files#diff-1046cc2d18cf5f82336bbad36a2d28540606e1c6aaa0b5073c545301ef60ffd4R593
should only be removed when the platform is Nutanix or vSphere, because the credentials for these platforms are passed after cluster installation.
By contrast, on Oracle Cloud the instance gets its credentials through the instance metadata, and should be able to label the nodes from the beginning of the installation without any user intervention.
Description of the problem:
The features API tells us that EXTERNAL_PLATFORM_OCI is supported for version 4.14 and the s390x CPU architecture, but the attempt to create the cluster fails with "Can't set oci platform on s390x architecture".
Steps to reproduce:
1. Register cluster with OCI platform and z architecture
There are 2 options to detect if the hosts are running on OCI:
1/ On OCI, the machine will have the following chassis-asset-tag:
# dmidecode --string chassis-asset-tag
OracleCloud.com
In the agent, we can override hostInventory.SystemVendor.Manufacturer when chassis-asset-tag="OracleCloud.com".
2/ Read instance metadata: curl -v -H "Authorization: Bearer Oracle" http://169.254.169.254/opc/v2/instance
It will allow the auto-detection of the platform from the provider in assisted-service, and validate that hosts are running in OCI when installing a cluster with platform=oci
Update is_external API description to something less confusing: https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1687452115222069
Description of the problem:
I've tested a cluster with platform type 'baremetal' and hosts discovered. Then, when I try to change to the Nutanix platform, the BE returns an error.
How reproducible:
100%
Steps to reproduce:
1. Create cluster without platform integration
2. Discover 3 hosts
3. Try to change platform to 'Nutanix'
Actual results:
API returns an error.
Expected results:
We should be able to change the platform type; this change should be agnostic to the discovered hosts.
Based on the feature support matrix, implement the validations in the assisted service
The external platform will be available behind the TechPreviewNoUpgrade feature set; automatically enable this flag in the installer config when the oci platform is selected.
Currently the API call "GET /v2/clusters/{cluster_id}/supported-platforms" returns the hosts' supported platforms regardless of the other cluster parameters.
In order to install the Oracle CCM driver, we need the ability to set the platform to "external" in the install-config.
The platform needs to be added here: https://github.com/openshift/assisted-service/blob/3496d1d2e185343c6a3b1175c810fdfd148229b2/internal/installcfg/installcfg.go#L8
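A minimal sketch of what the resulting install-config stanza could look like, assuming the external platform API from the linked installer PR (the featureSet line reflects the Tech Preview gating mentioned above; values are illustrative):
platform:
  external:
    platformName: oci
    cloudControllerManager: External   # signals that a cloud controller manager (the Oracle CCM) is installed separately
featureSet: TechPreviewNoUpgrade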
Slack thread: https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1678801176091619
The goal of this ticket is to check whether, besides the external platform, the AI can install the CCM, and to document it.
Unify and update hosted control planes storage operators so that they have similar code patterns and can run properly in both standalone OCP and HyperShift's control plane.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
Epic Goal*
In 4.12 we tried several approaches to writing an operator that works both in standalone OCP and in HyperShift's control plane running in the management cluster.
These operators need to be changed:
We need to unify the operators to use a similar approach, so the code in our operators looks the same.
In addition, we need to update the operators to:
Why is this important? (mandatory)
It will simplify our operators - we will have the same pattern in all of them.
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
OCP regression tests work, both on standalone OCP and HyperShift.
Drawbacks or Risk (optional)
We could introduce regressions
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
We should refactor the CSO so as to remove duplication of code between HyperShift and standalone deployments.
We are also going to reduce duplication of manifests so that templates can be reused between HyperShift and standalone clusters.
This feature is the placeholder for all epics related to technical debt associated with the Console team.
Outcome Overview
Once all Features and/or Initiatives in this Outcome are complete, what tangible, incremental, and (ideally) measurable movement will be made toward the company's Strategic Goal(s)?
Success Criteria
What is the success criteria for this strategic outcome? Avoid listing Features or Initiatives and instead describe "what must be true" for the outcome to be considered delivered.
Expected Results (what, how, when)
What incremental impact do you expect to create toward the company's Strategic Goals by delivering this outcome? (possible examples: unblocking sales, shifts in product metrics, etc. + provide links to metrics that will be used post-completion for review & pivot decisions). For each expected result, list what you will measure and when you will measure it (ex. provide links to existing information or metrics that will be used post-completion for review and specify when you will review the measurement such as 60 days after the work is complete)
Post Completion Review – Actual Results
After completing the work (as determined by the "when" in Expected Results above), list the actual results observed / measured during Post Completion review(s).
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but that tend to fall by the wayside.
Added client certificates based on https://github.com/deads2k/openshift-enhancements/blob/master/enhancements/monitoring/client-cert-scraping.md
The custom history.pushPath function in public/components/utils/router.ts does not seem to have any purpose other than being an alias to the standard history.push function.
Bump to the latest Typescript version (or at least 4.5). We'll need to update the TypeScript dependency and any packages that need to be updated.
https://devblogs.microsoft.com/typescript/announcing-typescript-4-1/
TypeScript releases: https://github.com/microsoft/TypeScript/releases
Goal
Guided installation user experience that interacts via prompts for necessary inputs, informs of erroneous/invalid inputs, and provides status and feedback throughout the installation workflow with very few steps, that works for disconnected, on-premises environments.
Installation is performed from a bootable image that doesn't contain cluster details or user details, since these details will be collected during the installation flow after booting the image in the target nodes.
This means that the image is generic and can be used to install an OpenShift cluster in any supported environment.
Why is this important?
Customers/partners desire a guided installation experience to deploy OpenShift with a UI that includes support for disconnected, on-premises environments, and which is as flexible in terms of configuration as UPI.
We have partners that need to provide an installation image that can be used to install new clusters on any location and for any users, since their business is to sell the hardware along with OpenShift, where OpenShift needs to be installable in the destination premises.
Acceptance Criteria
This should provide an experience closely matching the current hosted service (Assisted Installer), with the exception that it is limited to a single cluster, because the host running the service will reboot and become a node in the cluster as part of the deployment process.
Dependencies
(If the former option is selected, the IP address should be displayed so that it can be entered in the other hosts.)
Currently we use templating to set the NodeZero IP address in a number of different configuration files and scripts.
We should move this configuration to a single file (/etc/assisted/rendezvous-host.env) and reference it only from there, e.g. as a systemd environment file.
We also template values like URLs, because it is easier and safer to do this in golang (e.g. to use an IP address that may be either IPv4 or IPv6 in a URL) than in bash. We may need to include all of these variables in the file.
This will enable us to interactively configure the rendezvousIP in a single place.
Block services that depend on knowing the rendezvousIP from starting until the rendezvousIP configuration file created in AGENT-555 exists. This will probably take the form of just looping in node-zero.service until the file is present. The systemd configuration may need adjustments to prevent the service from timing out.
While we are waiting, a message should be displayed on the hardware console indicating what is happening.
Modify the cluster registration code in the assisted-service client (used by create-cluster-and-infraenv.service) to allow creating the cluster given only the following config manifests:
If the following manifests are present, data from them should be used:
Other manifests (ClusterDeployment, AgentClusterInstall) will not be present in an interactive install, and the information therein will be entered via the GUI instead.
A CLI flag or environment variable can be used to select the interactive mode.
The Control Plane MachineSet enables OCP clusters to scale Control plane machines. This epic is about making the Control Plane MachineSet controller work with OpenStack.
The Control Plane MachineSet enables OCP clusters to scale Control plane machines. This epic is about making the Control Plane MachineSet controller work with OpenStack.
The FailureDomain API that was introduced in 4.13 was Tech Preview and is now replaced by an API in openshift/api; it no longer lives in the installer.
Therefore, we want to clean the unsupported API out of the installer so that later we can add the supported API in order to support CPMS on OpenStack.
Add the OpenStack FailureDomain into CPMSO
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2000-graceful-node-shutdown
As an OpenShift developer, I want to have confidence that the graceful restart feature works and stays working in the future through various code changes. To that end, please add at least the following 2 E2E tests:
Track goals/requirements for self-managed GA of Hosted control planes on BM using the agent provider. Mainly make sure:
This Section:
Customers are looking at HyperShift to deploy self-managed clusters on Baremetal. We have positioned the Agent flow as the way to get BM clusters due to its ease of use (it automates many of the rather mundane tasks required to set up BM clusters), and it's planned for GA with MCE 2.3 (in the OCP 4.13 timeframe).
Questions to be addressed:
Group all tasks for CAPI-provider-agent GA readiness
no
Feature origin (who asked for this feature?)
The test waits until all pods in the control plane namespace report ready status, but collect-profiles is a job that sometimes completes before other pods are ready.
Once the collect-profiles pod is completed it terminates and its status moves to ready=false.
From there onwards the test is stuck.
Support Dual-Stack Networking (IPv4 & IPv6) for hosted control planes.
Many of our customers, especially Telco providers, have a need to support IPv6 but can't do so immediately; they would still have legacy IPv4 workloads. To support both stacks, an OpenShift cluster must be capable of allowing communication for both flavors. I.e., an OpenShift cluster running with hosted control planes should be able to allow workloads to access both IP stacks.
As a cluster operator, you have the option to expose external endpoints using one or both address families, in any order that suits your needs. OpenShift does not make any assumptions about the network it operates on. For instance, if you have a small IPv4 address space, you can enable dual-stack on some of your cluster nodes and have the rest running on IPv6, which typically has a more extensive address space available.
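For reference, this is the kind of dual-stack networking section a HostedCluster carries in this scenario (the CIDRs are taken from the bug reports below):
networking:
  clusterNetwork:
  - cidr: 10.132.0.0/14
  - cidr: fd01::/48
  networkType: OVNKubernetes
  serviceNetwork:
  - cidr: 172.31.0.0/16
  - cidr: fd02::/112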
Description of problem:
When deploying a dual-stack HostedCluster the KAS certificate won't be created with the proper SAN. If we look into a regular dual-stack cluster we can see the certificate gets generated as follows:
X509v3 Subject Alternative Name: DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:openshift, DNS:openshift.default, DNS:openshift.default.svc, DNS:openshift.default.svc.cluster.local, DNS:172.30.0.1, DNS:fd02::1, IP Address:172.30.0.1, IP Address:FD02:0:0:0:0:0:0:1
whereas in a dual-stack hosted cluster this is the SAN:
X509v3 Subject Alternative Name: DNS:localhost, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:kube-apiserver, DNS:kube-apiserver.clusters-hosted.svc, DNS:kube-apiserver.clusters-hosted.svc.cluster.local, DNS:api.hosted.dual.lab, DNS:api.hosted.hypershift.local, IP Address:127.0.0.1, IP Address:172.31.0.1
As you can see, the IPv6 pod and service IPs are missing from the certificate. This causes issues on some controllers when contacting the KAS, for example:
E0711 16:51:42.536367 1 reflector.go:140] github.com/openshift/router/pkg/router/template/service_lookup.go:33: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://172.31.0.1:443/api/v1/services?limit=500&resourceVersion=0": x509: cannot validate certificate for 172.31.0.1 because it doesn't contain any IP SANs
Version-Release number of selected component (if applicable):
latest
How reproducible:
Always
Steps to Reproduce:
1. Deploy a HC with the networking settings specified and using the image with dual stack patches included quay.io/jparrill/hypershift:OCPBUGS-15331-mix-413v4
Actual results:
KubeApiserver cert gets generated with the wrong SAN config.
Expected results:
KubeApiserver cert gets generated with the correct SAN config.
Additional info:
Description of problem:
Installing a 4.14 self-managed hosted cluster on a dual-stack hub with the "hypershift create cluster agent" command. The logs of the hypershift operator pod show a bunch of these errors:
{"level":"error","ts":"2023-06-08T13:36:26Z","msg":"Reconciler error","controller":"hostedcluster","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","hostedCluster":{"name":"hosted-0","namespace":"clusters"},"namespace":"clusters","name":"hosted-0","reconcileID":"a0a0f44f-7bbe-499f-95b0-e24b793ee48c","error":"failed to reconcile network policies: failed to reconcile kube-apiserver network policy: NetworkPolicy.extensions \"kas\" is invalid: spec.egress[1].to[0].ipBlock.except[1]: Invalid value: \"fd01::/48\": must be a strict subset of `cidr`","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}
The hostedcluster CR is showing the same ReconciliationError. Note that the networking section in the hostedcluster CR created by the "hypershift create cluster agent" command has only IPv4 CIDRs:
networking:
  clusterNetwork:
  - cidr: 10.132.0.0/14
  networkType: OVNKubernetes
  serviceNetwork:
  - cidr: 172.31.0.0/16
while services have IPv6 nodeport addresses.
Version-Release number of selected component (if applicable):
$ oc version Client Version: 4.14.0-0.nightly-2023-06-05-112833 Kustomize Version: v4.5.7 Server Version: 4.14.0-0.nightly-2023-06-05-112833 Kubernetes Version: v1.27.2+cc041e8
How reproducible:
100%
Steps to Reproduce:
1. Install a 4.14 OCP dual-stack BM hub cluster
2. Install MCE 2.4 and the Hypershift operator
3. Install a hosted cluster with the "hypershift create cluster agent" command
Actual results:
hosted cluster CR shows ReconciliationError:
- lastTransitionTime: "2023-06-08T10:55:33Z"
  message: 'failed to reconcile network policies: failed to reconcile kube-apiserver network policy: NetworkPolicy.extensions "kas" is invalid: spec.egress[1].to[0].ipBlock.except[1]: Invalid value: "fd01::/48": must be a strict subset of `cidr`'
  observedGeneration: 2
  reason: ReconciliationError
  status: "False"
  type: ReconciliationSucceeded
Expected results:
ReconciliationSucceeded condition should be True
Additional info:
Logs and CRDs produced by the failed job: https://s3.upshift.redhat.com/DH-PROD-OCP-EDGE-QE-CI/ocp-spoke-assisted-operator-deploy/8044/post-mortem.zip
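To illustrate the Kubernetes validation rule the error above refers to, a minimal sketch (the policy name, namespace, and CIDRs are illustrative): every entry under ipBlock.except must fall inside its ipBlock.cidr, so an IPv6 except entry has to be paired with an IPv6 cidr.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kas-example
  namespace: clusters-hosted
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 10.132.0.0/14
        except:
        - 10.132.0.0/16     # valid: a strict subset of the IPv4 cidr
    - ipBlock:
        cidr: fd01::/48
        except:
        - fd01:0:0:1::/64   # the IPv6 exception lives under an IPv6 cidr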
Description of problem:
When deploying a dual-stack HostedCluster the worker nodes will not fully join the cluster because the CNI plugin doesn't start. If we check the cluster-network-operator pod we will see the following error:
I0711 13:46:16.012420 1 log.go:198] Failed to validate Network.Spec: hostPrefix 23 is larger than its cidr fd01::/48
It seems that it is validating the IPv4 hostPrefix against the IPv6 pod network. This is how the networking spec for the HC looks:
networking:
  clusterNetwork:
  - cidr: 10.132.0.0/14
  - cidr: fd01::/48
  networkType: OVNKubernetes
  serviceNetwork:
  - cidr: 172.31.0.0/16
  - cidr: fd02::/112
Version-Release number of selected component (if applicable):
latest
How reproducible:
Always
Steps to Reproduce:
1. Deploy a HC with the networking settings specified and using the image with dual stack patches included quay.io/jparrill/hypershift:OCPBUGS-15331-mix-413v2
Actual results:
CNI is not deployed
Expected results:
CNI is deployed
Additional info:
Discussed on slack https://redhat-internal.slack.com/archives/C058TF9K37Z/p1689078655055779
To run a HyperShift management cluster in disconnected mode we need to document which images need to be mirrored and potentially modify the images we use for OLM catalogs.
ICSP mapping only happens for image references with a digest, not a regular tag. We need to address this for images we reference by tag:
CAPI, CAPI provider, OLM catalogs
Currently OLM catalogs placed in the control plane use image references to a tag so that the latest can be pulled when the catalog is restarted. There is a CRON job that restarts the deployment on a regular basis.
The issue with this is that the image cannot be mirrored for offline deployments, nor can it be used in environments (IBM Cloud) where all images running on a management cluster need to be approved beforehand by digest.
As a user of Hosted Control Planes, I would like the HCP Specification API to support both ICSP & IDMS.
IDMS is replacing ICSP in OCP 4.13+. hcp.Spec.ImageContentSources was updated in OCPBUGS-11939 to replace ICSP with IDMS. This needs to be reverted and something new added to support IDMS in addition to ICSP.
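For reference, a hedged sketch of the cluster-scoped IDMS resource that replaces ICSP in OCP 4.13+ (registry names are illustrative):
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: example-mirrors
spec:
  imageDigestMirrors:
  - source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
    mirrors:
    - mirror.registry.example.com/ocp-v4.0-art-dev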
Description of problem:
HostedClusterConfigOperator doesn't check the OperatorHub object in the Hosted Cluster. This means that the default CatalogSources cannot be disabled. If there are failing CatalogSources, operator deployments might be impacted.
Version-Release number of selected component (if applicable):
Any
How reproducible:
Always
Steps to Reproduce:
1. Deploy a HostedCluster
2. Connect to the hostedcluster and patch the operatorhub object: `oc --kubeconfig ./hosted-kubeadmin patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'`
3. CatalogSources objects won't be removed from the openshift-marketplace namespace.
Actual results:
CatalogSources objects are not removed from the openshift-marketplace namespace.
Expected results:
CatalogSources objects are removed from the openshift-marketplace namespace.
Additional info:
This is the code where we can see that the reconcile will create the CatalogSources every time: https://github.com/openshift/hypershift/blob/dba2e9729024ce55f4f2eba8d6ccb8801e78a022/control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go#L1285
As a user of hosted clusters in disconnected environments, I would like RegistryClientImageMetadataProvider to support registry overrides so that registry lookups utilize the registries in the registry overrides rather than what might be listed in the image reference.
Description of problem:
When a user configures HostedCluster.Spec.additionalTrustBundle, some deployments add this trust bundle using a volume. The ignition-server deployment doesn't add this volume.
Version-Release number of selected component (if applicable):
Any
How reproducible:
Always
Steps to Reproduce:
1. Deploy a HostedCluster with additionalTrustBundle
2. Check ignition-server deployment configuration
Actual results:
No trust bundle configured
Expected results:
Trust bundle configured.
Additional info:
There is missing code. Ignition-server-proxy does configure the trust bundle: https://github.com/openshift/hypershift/blob/main/hypershift-operator/controllers/hostedcluster/ignitionserver/ignitionserver.go#L745-L748 Ignition-server does not: https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/ignitionserver/ignitionserver.go#L694
Phase 2 Goal:
for Phase-1, incorporating the assets from different repositories to simplify asset management.
Overarching Goal
Move to using the upstream Cluster API (CAPI) in place of the current implementation of the Machine API for standalone Openshift.
Phase 1 & 2 covers implementing base functionality for CAPI.
Phase 2 also covers migrating MAPI resources to CAPI.
There must be no negative effect on customers/users of the MAPI; this API must continue to be accessible to them, though how it is implemented "under the covers", and whether that implementation leverages CAPI, is open.
The cluster-capi-operator repository contains several CAPI E2E tests for specific providers.
We run these tests on every PR that lands on that repository.
In order to test rebases of the cluster-api providers, we want to run these tests there as well, to prove that rebase PRs are not breaking CAPI functionality.
DoD:
CgroupV2 is GA as of OCP 4.13.
RHEL 9 defaults to v2 and we want to make sure we are in sync.
v1 support in systemd will end by the end of 2023.
What needs to be done
https://docs.google.com/document/d/1i6IEGjaM0-NeMqzm0ZnVm0UcfcVmZqfQRG1m5BjKzbo/edit
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
This issue consists of the following changes.
Add support to the OpenShift Installer to set up the field 'managedBy' on the Azure Resource Group
As a user, I want to be able to provide a new field in the Installer's manifest to configure the `managedBy` tag on the Azure Resource Group.
The Installer will provide a new field via the Install Config manifest to be used to tag the Azure Resource Group.
This is a requirement for the ARO SRE teams for their automation tool to identify these resources.
ARO needs this field set for their automation tool in the background. Doc for more details.
This new additional field will need to be documented as any other field supported via the Install Config manifest
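Purely as an illustration of where such a field could live in the install-config; the field name `resourceGroupManagedBy` and its value below are hypothetical and not confirmed by this feature:
platform:
  azure:
    region: eastus
    resourceGroupManagedBy: /subscriptions/<subscription-id>/resourceGroups/aro-sre-rg   # hypothetical field name and value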
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
As an ARO developer, I want to be able to:
so that
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Catch-all epic for cleanup work around the now non-machineconfig certificate bundles written by the MCO (kubelet, image registry)
Once we remove the certificates from the MachineConfigs, the controllerconfig would be the canonical location for all certificates.
We should increase visibility, potentially by adding a new configmap that all components plus the console/users can read, or by bubbling up status better in the controllerconfig object itself.
MVP aims at refactoring MirrorToDisk and DiskToMirror for OCP releases
As an MVP, this epic covers the work for RFE-3800 (includes RFE-3393 and RFE-3733) for mirroring releases.
The full description / overview of the enclave support is best described here
The design document can be found here
Upcoming epics, such as CFE-942 will complete the RFE work with mirroring operators, additionalImages, etc.
Architecture Overview (diagram)
As a developer, I want to create an implementation based on a local container registry as the backing technology for mirroring to disk, so that:
Add support of NAT Gateways in Azure while deploying OpenShift on this cloud to manage the outbound network traffic and make this the default option for new deployments
While deploying OpenShift on Azure, the Installer will configure NAT Gateways as the default method to handle outbound network traffic, so we can prevent the existing SNAT port exhaustion issues related to the default configured outboundType.
The installer will use the NAT Gateway object from Azure to manage the outbound traffic from OpenShift.
The installer will create a NAT Gateway object per AZ in Azure so the solution is HA.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
Using NAT Gateway for egress traffic is the recommended approach from Microsoft
This is also a common ask from different enterprise customers, as with the current solution used by OpenShift for outbound traffic management in Azure they are hitting SNAT port exhaustion issues.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
This work depends on the work done in CORS-2564
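A sketch of how this could surface in the install-config, assuming the existing Azure `outboundType` field gains a NAT Gateway option; the `NatGateway` value shown here is an assumption and is not confirmed by this feature text:
platform:
  azure:
    region: eastus
    outboundType: NatGateway   # assumption; the currently documented values are Loadbalancer and UserDefinedRouting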
As a user, I want to be able to:
so that I can achieve
This requires/does not require a design proposal.
This requires/does not require a feature gate.
The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike.
Some customer cases have revealed scenarios where the MCO state reporting is misleading and therefore could be unreliable to base decisions and automation on.
In addition to correcting some incorrect states, the MCO will be enhanced for a more granular view of update rollouts across machines.
The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike.
For this epic, "state" means "what is the MCO doing?" – so the goal here is to try to make sure that it's always known what the MCO is doing.
This includes:
While this probably crosses a little bit into the "status" portion of certain MCO objects, as some state is definitely recorded there, this probably shouldn't turn into a "better status reporting" epic. I'm interpreting "status" to mean "how is it going" so status is maybe a "detail attached to a state".
Exploration here: https://docs.google.com/document/d/1j6Qea98aVP12kzmPbR_3Y-3-meJQBf0_K6HxZOkzbNk/edit?usp=sharing
https://docs.google.com/document/d/17qYml7CETIaDmcEO-6OGQGNO0d7HtfyU7W4OMA6kTeM/edit?usp=sharing
During upgrade tests, the MCO will become temporarily degraded with the following events showing up in the event log:
Dec 13 17:34:58.478 E clusteroperator/machine-config condition/Degraded status/True reason/RequiredPoolsFailed changed: Unable to apply 4.11.0-0.ci-2022-12-13-153933: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, pool master has not progressed to latest configuration: controller version mismatch for rendered-master-3c738a0c86e7fdea3b5305265f2a2cdb expected 92012a837e2ed0ed3c9e61c715579ac82ad0a464 has 768f73110bc6d21c79a2585a1ee678d5d9902ad5: 2 (ready 2) out of 3 nodes are updating to latest configuration rendered-master-61c5ab699262647bf12ea16ea08f5782, retrying]
This seems to be occurring with some frequency as indicated by its prevalence in CI search:
$ curl -s 'https://search.ci.openshift.org/search?search=clusteroperator%2Fmachine-config+condition%2FDegraded+status%2FTrue+reason%2F.*controller+version+mismatch&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=%5E%28periodic%7Crelease%29.*4%5C.1%5B1%2C2%5D.*&excludeName=&maxMatches=1&maxBytes=20971520&groupBy=job' | jq 'keys | length'
399
The MCO should not become degraded during an upgrade unless it cannot proceed with the upgrade. In the case of these failures, I think we're timing out at some point during node reboots as either 1 or 2 of the control plane nodes are ready, with the third being unready. The MCO eventually requeues the syncRequiredMachineConfigPools step and the remaining nodes reboot and the MCO eventually clears the Degraded status.
Indeed, looking at the event breakdown, one can see that control plane nodes take ~21 minutes to roll out their new config with OS upgrades. By comparison, the worker nodes take ~15 minutes.
Meanwhile, the portion of the MCO which performs this sync (the syncRequiredMachineConfigPools function) has a hard-coded timeout of 10 minutes. Additionally, to my understanding, there is an additional 10 minute grace period before the MCO marks itself as degraded. Since the control plane nodes took ~21 minutes to completely reboot and roll out their new configs, we've exceeded the time needed. With this in mind, I propose a path forward:
When the cluster does not have v1 builds, console needs to either provide different ways to build applications or prevent erroneous actions.
Identify the build system in place and prompt user accordingly when building applications.
Console will have to hide any workflows that rely solely on BuildConfigs and Pipelines when they are not installed.
ODC Jira - https://issues.redhat.com/browse/ODC-7352
When the cluster does not have v1 builds, console needs to either provide different ways to build applications or prevent erroneous actions.
Identify the build system in place and prompt user accordingly when building applications.
Without this enhancement, users will encounter issues when trying to create applications on clusters that do not have the default s2i setup.
Console will have to hide any workflows that rely solely on BuildConfigs and Pipelines when they are not installed.
If we detect Shipwright, then we can call that API instead of buildconfigs. We need to understand the timelines for the latter part, and create a separate work item for it.
If both buildconfigs and Shipwright are available, then we should default to Shipwright. This will be part of the separate work item needed to support Shipwright.
Rob Gormley to confirm timelines for when customers will have the option to remove BuildConfigs from their clusters. That will determine whether we take on this work in 4.15 or 4.16.
Description of problem:
Version-Release number of selected component (if applicable):
Tested with https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-08-21-033349
How reproducible:
Always with the latest nightly build when the Build and DeploymentConfig capabilities are disabled
Steps to Reproduce:
Create a 4.14 shared cloud and disable the capabilities for Samples, Builds and DeploymentConfigs
capabilities:
  baselineCapabilitySet: None
  additionalEnabledCapabilities:
    - baremetal
    - Console
    - Insights
    - marketplace
    - Storage
    # - openshift-samples
    - CSISnapshot
    - NodeTuning
    - MachineAPI
    # - Build
    # - DeploymentConfig
Actual results:
The following main navigation entries are missing:
(Only Helm, ConfigMap and Secret are shown.)
The add page still shows "Import from Git", which cannot be used to import a resource without the BuildConfig.
Expected results:
All navigation items should be displayed.
The add page should not show "Import from Git" if the BuildConfig CRD isn't installed.
Additional info:
More details at ARO managed identity scope and impact.
This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Epic to capture the items not blocking for OCPSTRAT-506 (OCPBU-8)
Evaluate if any of the ARO predefined roles in the credentials request manifests of OpenShift cluster operators give elevated permissions. Remove any such predefined role from spec.predefinedRoles field and replace it with required permissions in the new spec.permissions field.
This effort is dependent on the completion of work for CCO-187, and effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operator owners/teams will be expected to review merge requests and complete the appropriate QE effort for an OpenShift release.
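A hedged sketch of the intended shape of such a credentials request change; only the spec.predefinedRoles and spec.permissions field names come from this feature, while the nesting, role name, and permission strings below are illustrative:
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: openshift-example-operator
  namespace: openshift-cloud-credential-operator
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AzureProviderSpec
    # before: a broad predefined role (field placement is illustrative)
    # predefinedRoles:
    # - Contributor
    # after: only the permissions the operator actually needs
    permissions:
    - Microsoft.Network/loadBalancers/read
    - Microsoft.Network/loadBalancers/write
  secretRef:
    name: example-operator-credentials
    namespace: openshift-example-operator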
Address technical debt around self-managed HCP deployments, including but not limited to
Description of the Problem:
When we deploy an IPv6/disconnected HostedCluster, we can see that the Ingress Cluster Operator appears degraded, showing this message:
clusteroperator.config.openshift.io/ingress 4.14.0-0.nightly-2023-08-29-102237 True False True 43m The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitive Failures: Canary route checks for the default ingress controller are failing)
We can also reach the canary route from the ingress-operator pod using curl, but the Go code cannot.
2023-08-31T16:23:07.264Z ERROR operator.canary_controller wait/backoff.go:226 error performing canary route check {"error": "error sending canary HTTP request to \"canary-openshift-ingress-canary.apps.hosted.hypershiftbm.lab\": Get \"https://canary-openshift-ingress-canary.apps.hosted.hypershiftbm.lab/\": socks connect tcp 127.0.0.1:8090->canary-openshift-ingress-canary.apps.hosted.hypershiftbm.lab:443: unknown error host unreachable"}
After a debugging session, it looks like DNS resolution from the ingress operator through the SOCKS proxy, which also goes through the Konnectivity component, does not work properly because it delegates the resolution to the hub cluster, which is not the desired behaviour.
Release Shipwright as OpenShift Builds GA
The scope of GA is:
This GA release is intended to make a fully-supported offering of OpenShift Builds driven by the Shipwright framework. This includes both CLI and Operator usage. All Red Hat supported build engines are supported (buildah, s2i), but the priority is to ensure that there are no blocking Buildpacks-related issues and to triage and resolve non-blocking issues.
The softer goal of this GA release is to start to draw users to the Shipwright ecosystem, which should allow them greater flexibility in bringing their CI/CD workloads to the OpenShift platform.
Functionality and roadmap items not specifically related to improving support for Buildpacks.
No known external dependencies.
Background, and strategic fit
The overarching goal for OpenShift Builds is to provide an extensible and modular framework for integrating into development workflows. Interoperability should be considered a priority, and build strategy-specific code should be kept to a minimum or implemented in a manner such that support for other build strategies is not impacted wherever possible.
Shipwright is an upstream community project with its own goals and direction, and while we are involved heavily in the project, we need to ensure buy-in for our initiatives, and/or determine what functionality and features we are “willing” to accept as downstream-only.
No assumptions are made about hardware, software, or people resources.
Customer Considerations
None.
Documentation will heavily rely on the upstream Shipwright documentation. Documentation plan is here.
N/A
GA involves the status of Shipwright's Build, CLI, and Operator projects for the upstream version v0.12.0. More information can be found at https://shipwright.io
OpenShift currently has limited support for Shipwright builds. Additionally, this support is marked as Tech Preview and uses the alpha version of the API.
Provide additional support for Shipwright builds, moving to the beta API and removing the Tech Preview labels.
Supporting layered products
Shipwright integration / OpenShift Builds will go GA with 4.14.
Description of problem:
When the user selects All namespaces in the admin perspective and navigates to Builds > BuildConfigs or Builds > Shipwright Builds (if the operator is installed), the last runs are selected based on their name. But the filter doesn't check that the Build / BuildRun is in the same namespace as the BuildConfig / Build.
Version-Release number of selected component (if applicable):
4.14 after we've merged ODC-7306 / https://github.com/openshift/console/pull/12809
How reproducible:
Always
Steps to Reproduce:
For Shipwright Builds and BuildRuns you need to install the Pipelines operator and the Shipwright operator, create a Shipwright Build resource (to enable the SW operator), as well as Builds and BuildRuns in two different namespaces.
You can find some Shipwright Build samples here: https://github.com/openshift/console/tree/master/frontend/packages/shipwright-plugin/samples
Actual results:
Both BuildConfigs are shown, but both show the same Build as the last run.
Expected results:
Both BuildConfigs should show and link the Build from their own namespace.
Additional info:
This issue exists also in Pipelines, but we track this in another bug to backport that issue.
As a user, I want to see the latest build status in the Build list similar to Pipelines.
SW samples: https://github.com/openshift/console/tree/master/frontend/packages/shipwright-plugin/samples
The shipwright-plugin already contains code to render a status, age, and duration. PTAL: https://github.com/openshift/console/tree/master/frontend/packages/shipwright-plugin/src/components
For Pipelines we later switched from "getting the related PipelineRuns for each row" to a more performant solution that "loads all PipelineRuns" and then filters them on the client side. See https://github.com/openshift/console/pull/12071 - we expect that we should do something similar here.
When multiple rows request the same API (get all PipelineRuns) our useK8sResource hook is smart enough to make just one API call.
To find all OpenShift Builds for one OpenShift BuildConfig they need to be filtered by the label openshift.io/build-config.name=build.metadata.name
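For illustration, this is roughly the label the filter would key on (the Build and BuildConfig names are hypothetical):
apiVersion: build.openshift.io/v1
kind: Build
metadata:
  name: example-app-3
  labels:
    openshift.io/build-config.name: example-app   # matches the owning BuildConfig's metadata.name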
As a user, I want to see similar information at similar places for the 3 different Build types.
As a user, I want to see the latest build status in the Build list similar to Pipelines.
You might be able to improve on this code review: https://github.com/openshift/console/pull/12809#pullrequestreview-1471632918
As a user, I want to see the output image of a Shipwright Build on the list page. Before 4.13 the Developer console showed the Build output (full image string) and the Build status.message.
With 4.14 we show the latest BuildRun name, status, start time, and duration. But the image output is still interesting. See https://redhat-internal.slack.com/archives/C050MAQKD1A/p1688378025053659?thread_ts=1688371150.047769&cid=C050MAQKD1A
For example:
Telecommunications providers continue to deploy OpenShift at the Far Edge. The acceleration of this adoption and the nature of existing Telecommunication infrastructure and processes drive the need to improve OpenShift provisioning speed at the Far Edge site and the simplicity of preparation and deployment of Far Edge clusters, at scale.
A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non MVP requirement slips, it does not shift the feature.
requirement | Notes | isMvp? |
Telecommunications Service Provider Technicians will be rolling out OCP w/ a vDU configuration to new Far Edge sites, at scale. They will be working from a service depot where they will pre-install/pre-image a set of Far Edge servers to be deployed at a later date. When ready for deployment, a technician will take one of these generic-OCP servers to a Far Edge site, enter the site specific information, wait for confirmation that the vDU is in-service/online, and then move on to deploy another server to a different Far Edge site.
Retail employees in brick-and-mortar stores will install SNO servers and it needs to be as simple as possible. The servers will likely be shipped to the retail store, cabled and powered by a retail employee and the site-specific information needs to be provided to the system in the simplest way possible, ideally without any action from the retail employee.
Q: how challenging will it be to support multi-node clusters with this feature?
< What does the person writing code, testing, documenting need to know? >
< Are there assumptions being made regarding prerequisites and dependencies?>
< Are there assumptions about hardware, software or people resources?>
< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>
< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>
<Does the Feature introduce data that could be gathered and used for Insights purposes?>
< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >
< What does success look like?>
< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact>
< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>
< Which other products and versions in our portfolio does this feature impact?>
< What interoperability test scenarios should be factored by the layered product(s)?>
Question | Outcome |
Configure the static IP (during the initial "factory" installation) with nmstate (see the example after this list).
Set the machine network to point to the network of this IP
Add node_ip hint according to the machine network. (done automatically when using assisted/ABI)
Remove all current hacks (adding the env overrides to crio and kubelet)
Check whether the network manager pre-up script is still required.
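A minimal nmstate sketch of the kind of static-IP configuration referred to above; the interface name, addresses, and DNS server are illustrative:
interfaces:
- name: eno1
  type: ethernet
  state: up
  ipv4:
    enabled: true
    dhcp: false
    address:
    - ip: 192.168.122.10
      prefix-length: 24
routes:
  config:
  - destination: 0.0.0.0/0
    next-hop-address: 192.168.122.1
    next-hop-interface: eno1
dns-resolver:
  config:
    server:
    - 192.168.122.1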
Context
https://docs.google.com/document/d/1Ywi-llZbOt-YUmqx7I6jWQP_Rss4eM-uoYJwD7Z0fh0/edit
https://github.com/loganmc10/openshift-edge-installer/blob/main/edge/docs/RELOCATABLE.md
The openshift-service-ca service-ca pod takes a few minutes to start when installing SNO.
kubectl get events -n openshift-service-ca --sort-by='.metadata.creationTimestamp' -o custom-columns=FirstSeen:.firstTimestamp,LastSeen:.lastTimestamp,Count:.count,From:.source.component,Type:.type,Reason:.reason,Message:.message
FirstSeen              LastSeen               Count  From                   Type     Reason             Message
2023-01-22T12:25:58Z   2023-01-22T12:25:58Z   1      deployment-controller  Normal   ScalingReplicaSet  Scaled up replica set service-ca-6dc5c758d to 1
2023-01-22T12:26:12Z   2023-01-22T12:27:53Z   9      replicaset-controller  Warning  FailedCreate       Error creating: pods "service-ca-6dc5c758d-" is forbidden: error fetching namespace "openshift-service-ca": unable to find annotation openshift.io/sa.scc.uid-range
2023-01-22T12:27:58Z   2023-01-22T12:27:58Z   1      replicaset-controller  Normal   SuccessfulCreate   Created pod: service-ca-6dc5c758d-k7bsd
2023-01-22T12:27:58Z   2023-01-22T12:27:58Z   1      default-scheduler      Normal   Scheduled          Successfully assigned openshift-service-ca/service-ca-6dc5c758d-k7bsd to master1
It seems that creating the service-ca namespace early allows it to get the openshift.io/sa.scc.uid-range annotation and start running earlier. The service-ca pod is required for other pods (the CVO and all the control plane pods) to start, since it creates the serving-cert.
Description of problem:
The bootkube scripts spend ~1 minute failing to apply manifests while waiting for the openshift-config namespace to get created.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. Run the POC using the makefile here: https://github.com/eranco74/bootstrap-in-place-poc
2. Observe the bootkube logs (pre-reboot)
Actual results:
Jan 12 17:37:09 master1 cluster-bootstrap[5156]: Failed to create "0000_00_cluster-version-operator_01_adminack_configmap.yaml" configmaps.v1./admin-acks -n openshift-config: namespaces "openshift-config" not found
....
Jan 12 17:38:27 master1 cluster-bootstrap[5156]: "secret-initial-kube-controller-manager-service-account-private-key.yaml": failed to create secrets.v1./initial-service-account-private-key -n openshift-config: namespaces "openshift-config" not found
Here are the logs from another installation showing that it's not 1 or 2 manifests that require this namespace to get created earlier:
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-ca-bundle-configmap.yaml": failed to create configmaps.v1./etcd-ca-bundle -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-client-secret.yaml": failed to create secrets.v1./etcd-client -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-client-secret.yaml": failed to create secrets.v1./etcd-metric-client -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-serving-ca-configmap.yaml": failed to create configmaps.v1./etcd-metric-serving-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-signer-secret.yaml": failed to create secrets.v1./etcd-metric-signer -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-serving-ca-configmap.yaml": failed to create configmaps.v1./etcd-serving-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-signer-secret.yaml": failed to create secrets.v1./etcd-signer -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "kube-apiserver-serving-ca-configmap.yaml": failed to create configmaps.v1./initial-kube-apiserver-server-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "openshift-config-secret-pull-secret.yaml": failed to create secrets.v1./pull-secret -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "openshift-install-manifests.yaml": failed to create configmaps.v1./openshift-install-manifests -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "secret-initial-kube-controller-manager-service-account-private-key.yaml": failed to create secrets.v1./initial-service-account-private-key -n openshift-config: namespaces "openshift-config" not found
Expected results:
Expected the resources to be created successfully without having to wait for the namespace to be created.
Additional info:
Description of problem:
When installing SNO with bootstrap in place the cluster-policy-controller hangs for 6 minutes waiting for the lease to be acquired.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Run the PoC using the makefile here: https://github.com/eranco74/bootstrap-in-place-poc
2. Observe the cluster-policy-controller logs post reboot
Actual results:
I0530 16:01:18.011988 1 leaderelection.go:352] lock is held by leaderelection.k8s.io/unknown and has not yet expired
I0530 16:01:18.012002 1 leaderelection.go:253] failed to acquire lease kube-system/cluster-policy-controller-lock
I0530 16:07:31.176649 1 leaderelection.go:258] successfully acquired lease kube-system/cluster-policy-controller-lock
Expected results:
Expected the bootstrap cluster-policy-controller to release the lease so that the cluster-policy-controller running post-reboot won't have to wait for the lease to expire.
Additional info:
Suggested resolution for bootstrap in place: https://github.com/openshift/installer/pull/7219/files#diff-f12fbadd10845e6dab2999e8a3828ba57176db10240695c62d8d177a077c7161R44-R59
Description of problem:
While trying to figure out why it takes so long to install Single Node OpenShift, I noticed that the kube-controller-manager cluster operator is degraded for ~5 minutes due to:
GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused
I don't understand how the prometheusClient is successfully initialized, but we get a connection refused once we try to query the rules. Note that if the client initialization fails, the kube-controller-manager won't set GarbageCollectorDegraded to true.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. Install SNO with bootstrap in place (https://github.com/eranco74/bootstrap-in-place-poc)
2. Monitor the cluster operators status
Actual results:
GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused
Expected results:
Expected the GarbageCollectorDegraded status to be false
Additional info:
It seems that for the PrometheusClient to be successfully initialised it needs to successfully create a connection, but we get connection refused once we make the query. Note that installing SNO with this patch (https://github.com/eranco74/cluster-kube-controller-manager-operator/commit/26e644503a8f04aa6d116ace6b9eb7b9b9f2f23f) reduces the installation time by 3 minutes.
To give Telco Far Edge customers as much of the product support lifespan as possible, we need to ensure that OCP releases are "telco ready" when the OCP release is GA.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
No documentation required
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Implement the hwlatdetect test in the openshift-test binary under the openshift/nodes/realtime test suite
https://docs.google.com/document/d/13Db7uChVx-2JXqAMJMexzHbhG3XLNLRy9nZ_7g9WbFU/edit#
* Enable setting node labels on spoke cluster during installation
Modify the scripts in assisted-service/deploy/operator/ztp.
The following environment variables will be added:
MANIFESTS: JSON containing the manifests to be added for day1 flow. The key is the file name, and the value is the content.
NODE_LABELS: Dictionary of dictionaries. The Outer dictionary key is the node name and the value is the node labels (key, value) to be applied.
MACHINE_CONFIG_POOL: Dictionary of strings. The key is the node name and the value is machine config pool name.
SPOKE_WORKER_AGENTS: Number of worker nodes to be added as part of day1 installation. Default 0
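The deploy scripts themselves are shell, but the shapes of these variables are easier to see spelled out; a small sketch of how NODE_LABELS and MACHINE_CONFIG_POOL decode (the example JSON values in the comments are invented):

// Sketch of the data shapes described above; the example JSON values are invented.
package sketch

import (
    "encoding/json"
    "os"
)

type ztpConfig struct {
    // NODE_LABELS: outer key = node name, inner map = labels (key, value) to apply.
    NodeLabels map[string]map[string]string
    // MACHINE_CONFIG_POOL: node name -> machine config pool name.
    MachineConfigPool map[string]string
}

func loadZTPConfig() (ztpConfig, error) {
    var cfg ztpConfig
    // Example: NODE_LABELS='{"worker-0":{"node-role.kubernetes.io/infra":""}}'
    if v := os.Getenv("NODE_LABELS"); v != "" {
        if err := json.Unmarshal([]byte(v), &cfg.NodeLabels); err != nil {
            return cfg, err
        }
    }
    // Example: MACHINE_CONFIG_POOL='{"worker-0":"infra"}'
    if v := os.Getenv("MACHINE_CONFIG_POOL"); v != "" {
        if err := json.Unmarshal([]byte(v), &cfg.MachineConfigPool); err != nil {
            return cfg, err
        }
    }
    return cfg, nil
}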
Reduce the OpenShift platform and associated RH provided components to a single physical core on Intel Sapphire Rapids platform for vDU deployments on SingleNode OpenShift.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Provide a mechanism to tune the platform to use only one physical core. | Users need to be able to tune different platforms. | YES |
Allow for full zero touch provisioning of a node with the minimal core budget configuration. | Node provisioned with SNO Far Edge provisioning method - i.e. ZTP via RHACM, using DU Profile. | YES |
Platform meets all MVP KPIs | | YES |
Questions to be addressed:
Latest status as of 4.14 freeze:
The MCD no longer uses MachineConfigs to update certs, but rather reads it off our internal resource "controllerconfig" directly. The MachineConfig path still exists but is a no-op (although the MCO still falsely claims an update is pending as a result). The MachineConfig removal work is ready, but waiting for windows-MCO to change their workflow so as to not break them.
--------------------------------
The logic for handling certificate rotation should live outside of the MachineConfig-files path as it stands today. This will allow certs to rotate live, through paused pools, without generating additional churn in rendered configs; most, if not all, certificates do not require drains or reboots of the node.
Context
The MCO has, since the beginning of time, managed certificates. The general flow is a cluster configmap -> MCO -> controllerconfig -> MCC -> renderedconfig -> MCD -> laid down to disk as a regular file.
When we talk about certs, the MCD actually manages 4 (originally 5) certs: see https://docs.google.com/document/d/1ehdOYDY-SvUU9ffdIKlt7XaoMaZ0ioMNZMu31-Mo1l4/edit (this document is a bit outdated)
Of these, the only one we care about is "/etc/kubernetes/kubelet-ca.crt", which is a bundle of 5 (now 7) certs. This will be expanded on below.
Unlike regular files though, certificates rotate automatically at some set cadence. Prior to 4.7, this would cause the MCD to seemingly randomly start an update and reboot nodes, much to the annoyance of customers, so we made it disruptionless.
There was still one more problem: a lot of users pause pools for additional safety (their way of saying "we don't want you to disrupt our workloads"), which still gated the certificate from actually rotating in when it updated. In 4.12 and previous versions, this means that at 80% of the 1 year mark, a new kube-apiserver-to-kubelet-signer cert would be generated. After ~12 hours, this would affect some operations (oc logs, etc.) since the old signer no longer matched the apiserver's new cert. At the one year mark, this would proceed to break the kubelet entirely. To combat this, we added an alert, MachineConfigControllerPausedPoolKubeletCA, to warn users about the effects and expiry, which was acceptable since this should only be an annual occurrence.
Updates for 4.13
In 4.13, we realized that the kubelet-ca cert was being read from a wrong location, which updated the kube-apiserver-to-kubelet-signer mentioned above but not some other certs. This was not a problem since nobody was depending on them, but in 4.13 monitoring was updated to use the right certs, which subsequently caused KubeletDown reports to fire, which David Eads fixed via https://github.com/openshift/machine-config-operator/pull/3458
So now instead of expired certs we have correct certs, which is great, but we also realized that cert rotation will happen much more frequently.
Previously on the system, we had:
admin-kubeconfig-signer, kubelet-signer, kube-apiserver-to-kubelet-signer, kube-control-plane-signer, kubelet-bootstrap-kubeconfig-signer
now with the correct certs, right after install we get: admin-kubeconfig-signer, kube-csr-signer_@1675718562, kubelet-signer, kube-apiserver-to-kubelet-signer, kube-control-plane-signer, kubelet-bootstrap-kubeconfig-signer, openshift-kube-apiserver-operator_node-system-admin-signer@1675718563
The most immediate issue was bootstrap drift, which John solved via https://github.com/openshift/machine-config-operator/pull/3513
But the issue now is that we are updating two certs:
Meaning that every month we would be generating at least 2 new machineconfigs (new one rotating in, old one rotating out) to manage this.
During install, due to how the certs are set up (bootstrap ones expire in 24h) this means you get 5 MCs within 24 hours: bootstrap bundle, incluster bundle, incluster bundle with 1 new, incluster bundle with 2 new, incluster bundle with 2 new 2 old removed
On top of this, previously the cluster chugged along past the expiry with only the warning, but now, when the old certs rotate and the pools are paused, TargetDown and KubeletDown fire after a few hours, which is very bad from a user perspective.
Solutions
Solution1: don't do anything
Nothing should badly break, but the user will get critical alerts after ~1 month if they pause and upgrade to 4.13. Not a great UX
Solution2: revert the monitoring change or mask the alert
A bit late, but potentially doable? Masking the alert will likely mask real issues, though
Solution3: MVP MCD changes (Estimate: 1week)
The MCD update, MCD verification, MCD config drift monitor all ignore the kubelet-ca cert file. The MCD gets a new routine to update the file, reading from a configmap the MCC manages. The MCC still renders the cert but the cert will be updated even if the pool is paused
Solution4: MVP MCC changes (Estimate: a few days)
Have the controller splice in changes even when the pool is paused. John has an MVP here: https://github.com/openshift/machine-config-operator/compare/master...jkyros:machine-config-operator:mco-77-bypass-pause
This is a cleaner solution compared to 3, but will cause the pool to go into updating briefly. If there are other operations causing nodes to be cordoned, etc., we would have to consider overriding that.
Solution5: MCD cert management path (full, Estimate: 1 sprint)
The cert is removed from the rendered-config. The MCC will read it off the controllerconfig and render it into a custom configmap. The MCS will add this additional file when serving content, but it is not part of the rendered-MC otherwise. The MCD will have a new routine to manage the certs live directly.
The bootstrap MCS will also need to have a way to render it into the initial served configuration without it being part of the MachineConfigs (this is especially important for HyperShift). We will have to make sure the inplace updater doesn't break
We may also have to solve config drift problems from bootstrap to incluster, for self-driving and hypershift inplace
We also have to make sure the file isn't deleted upon an update to the new management, so the certs don't disappear for a while, since the MCD would otherwise see the diff and delete the file.
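For orientation only, Solution 5's MCD side boils down to a small routine that watches the configmap the MCC renders and writes the bundle straight to disk, outside any rendered MachineConfig. A very rough sketch; the configmap name and key are hypothetical, and the real implementation lives in the MCO:

// Very rough sketch of an MCD-side routine that keeps kubelet-ca.crt in sync
// with a configmap rendered by the MCC. The configmap name and data key are
// hypothetical placeholders.
package sketch

import (
    "os"
    "time"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
)

const kubeletCAPath = "/etc/kubernetes/kubelet-ca.crt"

func runCertSync(client kubernetes.Interface, stopCh <-chan struct{}) {
    factory := informers.NewSharedInformerFactoryWithOptions(client, 10*time.Minute,
        informers.WithNamespace("openshift-machine-config-operator"))
    inf := factory.Core().V1().ConfigMaps().Informer()
    write := func(obj interface{}) {
        cm, ok := obj.(*corev1.ConfigMap)
        if !ok || cm.Name != "kubelet-ca-bundle" { // hypothetical configmap name
            return
        }
        if bundle, ok := cm.Data["ca-bundle.crt"]; ok { // hypothetical key
            // Live update: no rendered-config churn, no drain, no reboot.
            _ = os.WriteFile(kubeletCAPath, []byte(bundle), 0o644)
        }
    }
    inf.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    write,
        UpdateFunc: func(_, newObj interface{}) { write(newObj) },
    })
    factory.Start(stopCh)
    factory.WaitForCacheSync(stopCh)
    <-stopCh
}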
DOCS (WIP)
https://docs.google.com/document/d/1qXYV9Hj98QhJSKx_2IWBbU_bxu30YQtbR21mmOtBsIg/edit?usp=sharing
Although we are removing the config from the MachineConfig, Ignition (both in bootstrap and in-cluster) still needs to be generated with the certs so that nodes can join the cluster.
We will need the in-cluster MCS to read from controllerconfig, and the bootstrap MCS (during install time) to be able to remove it from the MachineConfigs, to ensure no drift when the master nodes come up.
Once we finish the new method to manage certs, we should extend it to also manage image registry certs, although that is not required for 4.14
It really hurts to have to ask customers to collect on-disk files for us, and when we do this certificate work there is the possibility we will need to chase more race-condition or rendering mismatch issues, so let's see if we can get collection of mcs-machine-config-content.json (for bootstrap mismatch) and maybe currentconfig (for those pesky validation issues) added to the must-gather.
Description of problem:
After running tests on an SNO with Telco DU profile for a couple of hours kubernetes.io/kubelet-serving CSRs in Pending state start showing up and accumulating in time.
Version-Release number of selected component (if applicable):
4.14.0-ec.3
How reproducible:
So far on 2 different environments
Steps to Reproduce:
1. Deploy SNO with Telco DU profile
2. Run system tests
3. Check CSRs status
Actual results:
oc get csr | grep Pending | wc -l
34
Expected results:
No Pending CSRs
Additional info:
This issue blocks retrieving pod logs. Attaching must-gather and sosreport after manually approving CSRs.
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were completed when this image was assembled
There are modules shared between the Console application and its dynamic plugins, as configured in
packages/console-dynamic-plugin-sdk/src/shared-modules.ts
For modules configured as "allowFallback: false" (default setting) we should validate the Console provided version range vs. plugin consumed version at webpack build time.
This allows us to detect potential compatibility problems in shared modules (i.e. plugin is built against a different version than what is provided by Console at runtime) when building dynamic plugins.
AC: Add validation for our shared modules of dynamic plugins
We are missing the DeleteModal component in our console-dynamic-plugin-sdk, so we need to copy it when building a dynamic plugin.
AC:
We are missing the AnnotationsModal component and the functions handling its input (e.g. onAnnotationSubmit) in our console-dynamic-plugin-sdk, so we need to copy it when building a dynamic plugin.
AC:
Currently there is no good way for plugins to get the active namespace outside of resource pages. We should expose useActiveNamespace to support this. (useActiveNamespace is only exposed in the internal API.)
This seems important to pair with NamespaceBar since it's unclear how to get the initial namespace from NamespaceBar. This is borderline a bug since it's not clear how to use NamespaceBar without it. We should consider it for 4.12.
AC:
One of the requirements for adopting OpenShift Dynamic Plugin SDK (which is the new version of HAC Core SDK) is to bump the version of react-router to version 6.
For migration from v5 to v6 there is a `react-router-dom-v5-compat` package which should ease the migration process.
AC: Install the `react-router-dom-v5-compat` package into console repo and test for any regressions.
Remove code that was added through the ACM integration from all of the console's codebase repositories
Since the decision was made to stop the ACM integration, we as a team decided that it would be better to remove the unused code in order to avoid any confusion or regressions.
Scour through the console repo and mark all multicluster-related code for removal by adding a "TODO remove multicluster" comment.
AC:
Revert "copiedCSVsDisabled" and "clusters" server flag changes in front and backend code.
AC:
One of the requirements for adopting OpenShift Dynamic Plugin SDK (which is the new version of HAC Core SDK) is to bump the version of react-router to version 6. With Console PR #12861 merged, both Console web application and its dynamic plugins should now be able to start migrating from React Router v5 to v6.
As a team we decided that we are going to split the work per package, but for the core console we will split the work into standalone stories based on the migration strategy.
Console will keep supporting React Router v5 for two releases (end of 4.15) as per CONSOLE-3662.
How to prepare your dynamic plugin for React Router v5 to v6 migration:
[0] bump @openshift-console/dynamic-plugin-sdk-webpack dependency to 0.0.10
* this release adds react-router-dom-v5-compat to Console provided shared modules
[1] (optional but recommended) bump react-router and react-router-dom dependencies to v5 latest
* Console provided shared module version of react-router and react-router-dom is 5.3.4
[2] add react-router-dom-v5-compat dependency
* Console provided shared module version of react-router-dom-v5-compat is 6.11.2
[3] start migrating to React Router v6 APIs
* v5 code is imported from react-router or react-router-dom
[4] (optional but recommended) use appropriate TypeScript typings for react-router and react-router-dom
* Console uses @types/react-router version 5.1.20 and @types/react-router-dom version 5.3.3
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the [migration strategy guide|https://github.com/remix-run/react-router/discussions/8753]. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The following files in frontend/public/components/RBAC contain components that need to use the v6 useNavigate hook, requiring them to be converted from a class component to a functional component:
AC: Listed components in frontend/public/components/RBAC rewritten from class component to functional component.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the [migration strategy guide|https://github.com/remix-run/react-router/discussions/8753]. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The StorageClassFormWithTranslation component in /frontend/public/components/storage-class-form.tsx needs to use the v6 useNavigate hook, which requires it to be converted from a class component to a functional component.
AC: StorageClassFormWithTranslation component in storage-class-form.tsx is rewritten from class component to functional component.
Splitting off tile-view-page.jsx from CONSOLE-3687 into a separate story.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
frontend/public/components/utils/tile-view-page.jsx contains a component that needs to use the v6 useNavigate hook, requiring it to be converted from a class component to a functional component.
AC: tile-view-page.jsx rewritten from class component to functional component.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the [migration strategy guide|https://github.com/remix-run/react-router/discussions/8753]. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The TemplateForm_ component in /frontend/public/components/instantiate-template.tsx needs to use the v6 useNavigate hook, which requires it to be converted from a class component to a functional component.
AC: TemplateForm_ component in instantiate-template.tsx is rewritten from class component to functional component.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the [migration strategy guide|https://github.com/remix-run/react-router/discussions/8753]. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The following files in frontend/public/components/cluster-settings contain components that need to use the v6 useNavigate hook, requiring them to be converted from a class component to a functional component:
AC: Listed components in frontend/public/components/cluster-settings rewritten from class component to functional component.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the [migration strategy guide|https://github.com/remix-run/react-router/discussions/8753]. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The EditYAML component in /frontend/public/components/edit-yaml.jsx needs to use the v6 useNavigate hook, which requires it to be converted from a class component to a functional component.
AC: EditYAML component in edit-yaml.jsx is rewritten from class component to functional component.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the [migration strategy guide|https://github.com/remix-run/react-router/discussions/8753]. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The App component in /frontend/public/components/app.jsx needs to use the v6 useLocation hook, which requires it to be converted from a class component to a functional component.
AC: App component in app.jsx is rewritten from class component to functional component.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the [migration strategy guide|https://github.com/remix-run/react-router/discussions/8753]. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The CheckBoxes_ component in /frontend/public/components/row-filter.jsx needs to use the v6 useNavigate hook, which requires it to be converted from a class component to a functional component.
AC: CheckBoxes_ component in row-filter.jsx is rewritten from class component to functional component.
AC: CheckBoxes_ component is removed from the codebase.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The following files in frontend/public/components/utils contain components that need to use the v6 useNavigate hook, requiring them to be converted from a class component to a functional component:
AC: Listed components in frontend/public/components/utils rewritten from class component to functional component.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the [migration strategy guide|https://github.com/remix-run/react-router/discussions/8753]. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The following files in frontend/public/components/modals contain components that need to use the v6 useNavigate hook, requiring them to be converted from a class component to a functional component:
AC: Listed components in frontend/public/components/modals rewritten from class component to functional component.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the [migration strategy guide|https://github.com/remix-run/react-router/discussions/8753]. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The SecretFormWrapper component in /frontend/public/components/secrets/create-secret.tsx needs to use the v6 useNavigate hook, which requires it to be converted from a class component to a functional component.
AC: SecretFormWrapper component in create-secret.tsx is rewritten from class component to functional component.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the [migration strategy guide|https://github.com/remix-run/react-router/discussions/8753]. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The EventStream component in /frontend/public/components/events.jsx needs to use the v6 useParams hook, which requires it to be converted from a class component to a functional component.
AC: EventStream component in events.jsx is rewritten from class component to functional component.
One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the [migration strategy guide|https://github.com/remix-run/react-router/discussions/8753]. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.
If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.
The FireMan component in /frontend/public/components/factory/list-page.tsx needs to use the v6 useParams and useLocation hooks, which requires it to be converted from a class component to a functional component.
AC: FireMan component in list-page.tsx is rewritten from class component to functional component.
Authors: Igal Tsoiref, Riccardo Piccoli, Liat Gamliel
Analysis document: AI Events: RHOSAK vs RH Pipelines
Description of the problem:
Documentation for the ignore validation API should be updated with the correct JSON string arrays:
While it should be:
{ "host-validation-ids": "[\"all\"]", "cluster-validation-ids": "[\"all\"]" }
How reproducible:
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
Description of the problem:
In staging, BE 2.17.0 - Ignore validation API has no validation for the values sent. For example:
curl -X 'PUT' 'https://api.stage.openshift.com/api/assisted-install/v2/clusters/be4cdbef-7ea6-48f6-a30a-d1169eeb38fb/ignored-validations' \
  --header "Authorization: Bearer $(ocm token)" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{ "host-validation-ids": "[\"testTest\",\"HasCPUCoresForRole\"]", "cluster-validation-ids": "[]" }'
Stores:
{"cluster-validation-ids":"[]","host-validation-ids":"[\"testTest\",\"HasCPUCoresForRole\"]"}
How reproducible:
100%
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
Description of the problem:
In BE 2.16.0 Staging - while a cluster is in the installed or installing state, the ignore validation API still changes the validations; this should be blocked.
How reproducible:
100%
Steps to reproduce:
1. send this call to installed cluster
curl -i -X PUT 'https://api.stage.openshift.com/api/assisted-install/v2/clusters/${cluster_id}/ignored-validations' \
  --header "Authorization: Bearer $(ocm token)" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"host-validation-ids": "[\"all\"]", "cluster-validation-ids": "[\"all\"]"}'
2. Cluster validation is changed
3.
Actual results:
Expected results:
1. Proposed title of this feature request
Delete worker nodes using GitOps / ACM workflow
2. What is the nature and description of the request?
We use siteConfig to deploy a cluster using the GitOps / ACM workflow. We can also use siteConfig to add worker nodes to an existing cluster. However, today we cannot delete a worker node using the GitOps / ACM workflow. We need to go and manually delete the resources (BMH, nmstateConfig, etc.) and the OpenShift node. We would like to have the node deleted as part of the GitOps workflow.
3. Why does the customer need this? (List the business requirements here)
Worker nodes may need to be replaced for any reason (hardware failures) which may require deletion of a node.
If we are colocating OpenShift and OpenStack control planes on the same infrastructure (using OpenStack director operator to create OpenStack control plane in OCP virtualization), then we also have the use case of assigning baremetal nodes as OpenShift worker nodes or OpenStack compute nodes. Over time we may need to change the role of those baremetal nodes (from worker to compute or from compute to worker). Having the ability to delete worker nodes via GitOps will make it easier to automate that use case.
4. List any affected packages or components.
ACM, GitOps
In order to cleanly remove a node without interrupting existing workloads it should be cordoned and drained before it is powered off.
This should be handled by BMAC and should not interrupt processing of other requests. The best implementation I could find so far is in the kubectl code, but using that directly is a bit problematic as the call waits for all the pods to be stopped or evicted before returning. There is a timeout, but then we have to either give up after one call and remove the node anyway, or track multiple calls to drain across multiple reconciles.
We should come up with a way to drain asynchronously (maybe investigate what CAPI does).
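As sketched below (a hedged example, not the BMAC implementation), one way to approximate asynchronous draining with the kubectl drain helper is to give each attempt a short timeout and let the reconcile loop requeue until the drain reports success:

// Sketch: bounded drain attempt suitable for re-queueing from a reconcile loop.
// Field values are illustrative; cordoning and error handling are simplified.
package sketch

import (
    "context"
    "os"
    "time"

    "k8s.io/client-go/kubernetes"
    "k8s.io/kubectl/pkg/drain"
)

// tryDrain returns (done, err). On a timeout the caller requeues and calls it
// again; evictions that already started continue server-side in the meantime.
func tryDrain(ctx context.Context, client kubernetes.Interface, nodeName string) (bool, error) {
    helper := &drain.Helper{
        Ctx:                 ctx,
        Client:              client,
        Force:               true,
        IgnoreAllDaemonSets: true,
        DeleteEmptyDirData:  true,
        GracePeriodSeconds:  -1,               // use each pod's own grace period
        Timeout:             30 * time.Second, // keep every attempt short so other requests aren't blocked
        Out:                 os.Stdout,
        ErrOut:              os.Stderr,
    }
    if err := drain.RunNodeDrain(helper, nodeName); err != nil {
        return false, err // not fully drained yet; requeue and retry on the next reconcile
    }
    return true, nil
}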
We should allow for users to control removing the spoke node using resources on the hub.
For the ZTP-gitops case, this needs to be the BMH as they are not aware of the agent resource.
The user will add an annotation to the BMH to indicate that they want us to manage the lifecycle of the spoke node based on the BMH. Then, when the BMH is deleted we will clean the host and remove it from the spoke cluster.
Description of the problem:
In staging, UI 2.19.6 - in the new cluster events view, the number of events is shown as "1-10 of NaN" instead of the real number
How reproducible:
100%
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
After deprecating the old API and making sure the UI no longer uses it, remove the following endpoint and definitions:
/v2/feature-support-levels
definitions:
  feature-support-levels:
  feature-support-level:
Description of the problem:
How reproducible:
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
Description of the problem:
The method returns an empty object when calling GET v2/support-levels/features?openshift_version=X
How reproducible:
Call GET v2/support-levels/features?openshift_version=4.13
Steps to reproduce:
1. Call GET v2/support-levels/features?openshift_version=4.13
2.
3.
Actual results:
{}
Expected results:
{ FEATURE_A: supported, FEATURE_B: supported ... }
Description of the problem:
Returning Bad Request on feature-support validation is colliding with the multi-platform feature.
Whenever the user sets the CPU architecture to P or Z, the platform is changed to multi, causing loss of information and not failing the cluster registration/update.
How reproducible:
Register a cluster with s390x as CPU architecture on OCP version 4.12
Expected results:
Bad Request
Description of the problem:
Currently, installing a ppc64le cluster with Cluster Managed Networking enabled and a Minimal ISO is not supported.
Steps to reproduce:
1. Create ppc64le cluster with UMN enabled
Actual results:
BadRequest
Expected results:
Created successfully
Add an option that will mark that a feature is not available at all
Create a single place in assisted-service (update/register cluster) where we will return Bad Request in case the feature combination is not supported
Description of the problem:
BE 2.17.4 - (using API calls) creating a new cluster, PATCHing it with OLM operators and then creating a new infra-env with P/Z should be blocked, but is allowed
How reproducible:
100%
Steps to reproduce:
1. Create new cluster
curl -X 'POST' \
  'https://api.stage.openshift.com/api/assisted-install/v2/clusters/' \
  --header "Authorization: Bearer $(ocm token)" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{ "name": "s390xsno2413", "high_availability_mode": "Full", "openshift_version": "4.13", "pull_secret": "${pull_secret}", "base_dns_domain": "redhat.com", "cpu_architecture": "s390x", "disk_encryption": { "mode": "tpmv2", "enable_on": "none" }, "tags": "", "user_managed_networking": true }'
2. Patch with OLM operators
curl -i -X 'PATCH' 'https://api.stage.openshift.com/api/assisted-install/v2/clusters/c05ba143-cf22-44ec-b1fd-edad5d8ca5a9' \
  --header "Authorization: Bearer $(ocm token)" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{ "olm_operators":[{"name":"cnv"},{"name":"lso"},{"name":"odf"}] }'
3. Create infra-env
curl -X 'POST' 'https://api.stage.openshift.com/api/assisted-install/v2/infra-envs' \
  --header "Authorization: Bearer $(ocm token)" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{ "name": "tests390xsno_infra-env2", "pull_secret": "${pull_secret}", "cluster_id": "c05ba143-cf22-44ec-b1fd-edad5d8ca5a9", "openshift_version": "4.13", "cpu_architecture": "s390x" }'
Actual results:
Infra-env created
Expected results:
Should be blocked
Make sure we no longer support the OCP 4.8 and 4.9 releases once they reach EOL (on April 27, 2023).
Installation of OCP 4.8 and 4.9 is no longer possible in any of our envs.
As the Assisted Installer documentation is embedded into the relevant OpenShift releases, no documentation changes are required. Once those version docs are marked deprecated / decommissioned, so are the Assisted Installer parts.
Not relevant.
Numbers don't count for much here, as we're following the official policy for OpenShift. If a customer has any real need for OpenShift 4.8 or 4.9, they can go through the Support Exception process for OpenShift.
Regardless, as of today (Mar. 16, 2023) there's still some usage of OCP 4.8 & 4.9 but it's not very significant:
AFAIK UI shouldn't have any special code/configuration for OCP versions, so implementing the relevant pieces in the backend should suffice.
vSphere platform configuration is a bit different on OCP 4.13.
Changes needed:
Yes
DOD:
Prevent creating a vSphere dual-stack cluster using the feature-support mechanism
Manage the effort for adding jobs for release-ocm-2.8 on assisted installer
https://docs.google.com/document/d/1WXRr_-HZkVrwbXBFo4gGhHUDhSO4-VgOPHKod1RMKng
Merge order:
Update the BUNDLE_CHANNELS in the Makefile in assisted-service and run bundle generation.
Description of the problem:
Update the following Day2 procedure - https://github.com/openshift/assisted-service/blob/master/docs/user-guide/day2-master/411-healthy.md -
How reproducible:
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
The API for it is https://github.com/openshift/assisted-service/blob/2bbbcb60eea4ea5a782bde995bdec3dd7dfc1f62/swagger.yaml#L5636
Other assets
https://github.com/openshift/installer/blob/master/docs/user/customization.md
Example
Adding day-1 kernel arguments
Marvel
Description of the problem:
It is possible to create a manifest with a file name like:
ee ll ii aa.yml
How reproducible:
Steps to reproduce:
1. Create a cluster
2. Add a manifest with spaces in the file name
3.
Actual results:
It allows adding the manifest
Expected results:
Even if the BE allows it, we should consider disabling the option (see Slack thread) in the UI and BE
Description of the problem:
In the File name field of the Custom manifest form, there should be description pop-up text that tells the user which type of file needs to be added and the max size
How reproducible:
Steps to reproduce:
1. Create a cluster with a manifest
2. Navigate to the custom manifest wizard step
3. Click on add new manifest
Actual results:
File name label has no further description text
Expected results:
I suggest adding the allowed file types and the max size/length
Description of the problem:
V2CreateClusterManifest should block empty manifests
How reproducible:
100%
Steps to reproduce:
1. POST V2CreateClusterManifest manifest with empty content
Actual results:
Succeeds. Then silently breaks bootkube much later.
Expected results:
API call should fail immediately
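The guard amounts to rejecting content that decodes to nothing; a hedged sketch (whether the content arrives base64-encoded, as assumed here, and the exact error plumbing are illustrative):

// Illustrative guard: reject manifests whose decoded content is empty or
// whitespace-only. The base64 assumption and error types are illustrative.
package sketch

import (
    "encoding/base64"
    "errors"
    "strings"
)

func validateManifestContent(encoded string) error {
    decoded, err := base64.StdEncoding.DecodeString(encoded)
    if err != nil {
        return errors.New("manifest content is not valid base64")
    }
    if len(strings.TrimSpace(string(decoded))) == 0 {
        return errors.New("manifest content must not be empty") // fail the API call immediately
    }
    return nil
}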
Description of the problem:
When installing any cluster without custom manifests, the installation summary page shows names of custom manifests.
It is not clear whether those manifests were added by the customer or by the AI.
How reproducible:
100%
Steps to reproduce:
1. Install a cluster without the custom manifest option checked
2. After the installation completes, check the cluster summary
3.
Actual results:
In the summary, several files are mentioned in the Custom manifest section, presented as custom manifests
Expected results:
The user should be informed which of these custom manifests were added from the UI, and which were not
We are looking into allowing users to rename the manifest file name. Currently this is only possible by issuing DELETE and POST requests, which results in a very bad UX.
We need an API to allow users to change the folder, file name or YAML content of an existing custom manifest.
Discussion about that: https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1675776466170339
When the control plane nodes are under pressure or the apiserver is simply not available, no telemetry data is emitted by the monitoring stack, even though monitoring isn't on the master node and shouldn't have to interact with the control plane in order to push metrics.
This is caused by the fact that today telemeter-client evaluates PromQL expressions on Prometheus via an oauth-proxy endpoint that requires talking to the apiserver for authentication.
After discussing with Simon Pasquier, a potential solution to remove the dependency on the apiserver would be to use mTLS communication between telemeter-client and the Prometheus pods.
Today, there are 3 proxies in the Prometheus pods:
The kube-rbac-proxy exposing the /metrics endpoint could be used by telemeter-client since it is already doing so via mTLS.
Note that this approach would require improving telemeter-client since it doesn't support configuring TLS certs/keys.
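That improvement is mostly plumbing a client certificate and CA into the HTTP client used for the queries; a minimal sketch of such a client (the file paths would be hypothetical pod mounts, and this is not the telemeter-client code):

// Minimal mTLS client sketch; file paths are hypothetical mounts.
package sketch

import (
    "crypto/tls"
    "crypto/x509"
    "net/http"
    "os"
    "time"
)

func newMTLSClient(certFile, keyFile, caFile string) (*http.Client, error) {
    cert, err := tls.LoadX509KeyPair(certFile, keyFile)
    if err != nil {
        return nil, err
    }
    caPEM, err := os.ReadFile(caFile)
    if err != nil {
        return nil, err
    }
    pool := x509.NewCertPool()
    pool.AppendCertsFromPEM(caPEM)
    return &http.Client{
        Timeout: 30 * time.Second,
        Transport: &http.Transport{
            TLSClientConfig: &tls.Config{
                Certificates: []tls.Certificate{cert}, // client cert presented to kube-rbac-proxy
                RootCAs:      pool,                    // trust the serving CA; no apiserver round-trip needed
            },
        },
    }, nil
}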
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is the second part of Customizations for Node Exporter, following https://issues.redhat.com/browse/MON-2848
There are the following tasks remaining:
The "mountstats" collector generates 53 high-cardinality metrics by default, we have to refine the story to choose only the necessary metrics to collect.
Cluster Monitoring Operator uses the configmap "cluster-monitoring-config" in the namespace "openshift-monitoring" as its configuration. These new configurations will be added into the section "nodeExporter".
Node Exporter comes with a set of default activated collectors and optional collectors.
To simplify the configuration, we put a config object for each collector that we allow users to activate or deactivate.
If a collector is not present, no change is made to its default on/off status.
Each collector has a field "enabled" as an on/off switch. If "enabled" is set to "false", other fields can be omitted.
The default values for the new options are:
Here is an example of what these options look like in CMO configmap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    nodeExporter:
      maxProcs: 4
      collectors:
        hwmon:
          enabled: true
        mountstats:
          enabled: true
        systemd:
          enabled: true
        ksmd:
          enabled: true
If the config for nodeExporter is omitted, Node Exporter should run with the same arguments concerning collectors as those in CMO v4.12:
--no-collector.wifi
--collector.filesystem.mount-points-exclude=^/(dev|proc|sys|run/k3s/containerd/.+|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
--collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*)$
--collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*)$
--collector.cpu.info
--collector.textfile.directory=/var/node_exporter/textfile
--no-collector.cpufreq
--no-collector.tcpstat
--collector.netdev
--collector.netclass
--no-collector.buddyinfo
This is a tracker for another feature implemented by the netobserv team, also targeting 4.14: https://issues.redhat.com/browse/OCPBU-478
Tracked in netobserv board with this story: https://issues.redhat.com/browse/NETOBSERV-1021
Pull request: https://github.com/openshift/cluster-monitoring-operator/pull/1963
We will add a section for "ksmd" Collector in "nodeExporter.collectors" section in CMO configmap.
It has a boolean field "enabled", the default value is false.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    nodeExporter:
      collectors:
        # enable a collector which is disabled by default
        ksmd:
          enabled: true
refer to: https://issues.redhat.com/browse/OBSDA-308
We will add a section for "systemd" Collector in "nodeExporter.collectors" section in CMO configmap.
It has a boolean field "enabled", the default value is false.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    nodeExporter:
      collectors:
        # enable a collector which is disabled by default
        systemd:
          enabled: true
To avoid scraping too many metrics from systemd units, the collector should collect metrics on selected units only. We put regex patterns of the units to collect in the list `collectors.systemd.units`.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    nodeExporter:
      collectors:
        # enable a collector which is disabled by default
        systemd:
          enabled: true
          units:
          - iscsi-init.*
          - sshd.service
We will add a section for "mountstats" Collector in "nodeExporter.collectors" section in CMO configmap.
It has a boolean field "enabled", the default value is false.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    nodeExporter:
      collectors:
        # enable a collector which is disabled by default
        mountstats:
          enabled: true
The "mountstats" collector generates many high cardinality metrics, so we will collector only these metrics to avoid data congestion:
1. node_mountstats_nfs_read_bytes_total
2. node_mountstats_nfs_write_bytes_total
3. node_mountstats_nfs_operations_requests_total
Node Exporter has been upgraded to 1.5.0.
The default value of the argument `--runtime.gomaxprocs` is now set to 1, which differs from the old behavior; Node Exporter used to take advantage of multiple processes to accelerate metrics collection.
We are going to add a parameter to set the argument `--runtime.gomaxprocs` and make its default value 0, so that CMO retains the old behavior while allowing users to tune the multiprocess settings of Node Exporter.
The CMO config will have a new section `nodeExporter`, under which there is the parameter `maxProcs`, accepting an integer as the maximum number of processes Node Exporter runs concurrently. Its default value is 0 if omitted.
config.yaml: |
  nodeExporter:
    maxProcs: 1
Proposed title of this feature request
In 4.11 we introduced the alert overrides and alert relabeling feature as Tech Preview. We should graduate this feature to GA.
What is the nature and description of the request?
This feature can address requests and issues we have seen from existing and potential customers. Moving this feature to GA would greatly enable adoption.
Why does the customer need this? (List the business requirements)
See linked issues.
List any affected packages or components.
CMO
https://github.com/openshift/monitoring-plugin
CMO should deploy and enable the monitoring plugin.
We should run at least https://github.com/golangci/golangci-lint.
https://github.com/securego/gosec could be interesting.
We also have an internal team: https://gitlab.cee.redhat.com/covscan/covscan/-/wikis/home. Maybe there are additional scanners we could run.
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
https://github.com/openshift/cluster-monitoring-operator/pull/1989
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
To help customers with debugging, we need to be able to include NOO pods and resources in the collected must-gather output.
To collect it, use:
oc adm must-gather
New script to collect netobservability logs and enable it by default
As a developer, I want to have my testing and build tooling managed in a consistent way to reduce the amount of context switching while doing maintenance work.
Currently our approach to managing and updating auxiliary tooling (such as envtest, controller-gen, etc.) is inconsistent. A fine pattern was introduced in the CPMS repo, which relies on the Go toolchain to update, vendor and run this auxiliary tooling.
For CPMS context see:
https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/main/tools/tools.go
https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/main/go.mod#L24
https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/main/Makefile#L19
This epic has 3 main goals
Currently we have no accurate telemetry of usage of the OpenShift Console across all clusters in the fleet. We should be able to utilize the auth and console telemetry to glean details which will allow us to get a picture of console usage by our customers.
Let's do a spike to validate; we may have to update this list after the spike:
We need to verify HOW we define a cluster admin -> listing all namespaces in a cluster? Installing operators? Make sure that we consider OSD cluster admins as well (this should be aligned with how we send people to the dev perspective, in my mind).
Capture additional information via console plugin ( and possibly the auth operator )
Understanding how to capture telemetry via the console operator
We have removed the following ACs for this release:
As RH PM/engineer, we want to understand the usage of the (dev) console, for that, we want to add new Prometheus metrics (how many users have a cluster, etc.) and collect them later (as telemetry data) via cluster-monitoring-operator.
As Red Hat, we want to understand the usage of the (dev) console, for that, we want to add new Prometheus metrics (how many users have a cluster, etc.) and collect them later (as telemetry data) via cluster-monitoring-operator.
Either the console-operator or the cluster-monitoring-operator needs to apply a PrometheusRule to collect the right data and make it available later in Superset DataHat or Tableau.
Description of problem:
With 4.13 we added new metrics to the console (Epic ODC-7171 - Improved telemetry (provide new metrics)) that collect different user and cluster metrics.
The cluster metrics include:
These metrics contain the perspective name or plugin name which was unbounded. Admins could configure any perspective and plugin name, also if the perspective or plugin with that name is not available.
Based on the feedback in https://github.com/openshift/cluster-monitoring-operator/pull/1910 we need to reduce the cardinality and limit the metrics to, for example:
Version-Release number of selected component (if applicable):
4.13.0
How reproducible:
Always
Steps to Reproduce:
On a cluster, you must update the console configuration, configure some perspectives or plugins and check the metrics in Admin > Observe > Metrics:
avg by (name, state) (console_plugins_info)
avg by (name, state) (console_customization_perspectives_info)
On a local machine, you can use this console yaml:
apiVersion: console.openshift.io/v1
kind: ConsoleConfig
plugins:
  logging-view-plugin: https://logging-view-plugin.logging-view-plugin-namespace.svc.cluster.local:9443/
  crane-ui-plugin: https://crane-ui-plugin.crane-ui-plugin-namespace.svc.cluster.local:9443/
  acm: https://acm.acm-namespace.svc.cluster.local:9443/
  mce: https://mce.mce-namespace.svc.cluster.local:9443/
  my-plugin: https://my-plugin.my-plugin-namespace.svc.cluster.local:9443/
customization:
  perspectives:
  - id: admin
    visibility:
      state: Enabled
  - id: dev
    visibility:
      state: AccessReview
      accessReview:
        missing:
        - resource: namespaces
          verb: get
  - id: dev1
    visibility:
      state: AccessReview
      accessReview:
        missing:
        - resource: namespaces
          verb: get
  - id: dev2
    visibility:
      state: AccessReview
      accessReview:
        missing:
        - resource: namespaces
          verb: get
  - id: dev3
    visibility:
      state: AccessReview
      accessReview:
        missing:
        - resource: namespaces
          verb: get
And start the bridge with:
./build-backend.sh
./bin/bridge -config ../config.yaml
After that you can fetch the metrics in a second terminal:
Actual results:
curl -s localhost:9000/metrics | grep ^console_plugins
console_plugins_info{name="acm",state="enabled"} 1
console_plugins_info{name="crane-ui-plugin",state="enabled"} 1
console_plugins_info{name="logging-view-plugin",state="enabled"} 1
console_plugins_info{name="mce",state="enabled"} 1
console_plugins_info{name="my-plugin",state="enabled"} 1
curl -s localhost:9000/metrics | grep ^console_customization
console_customization_perspectives_info{name="dev",state="only-for-developers"} 1
console_customization_perspectives_info{name="dev1",state="only-for-developers"} 1
console_customization_perspectives_info{name="dev2",state="only-for-developers"} 1
console_customization_perspectives_info{name="dev3",state="only-for-developers"} 1
Expected results:
Lower cardinality; that is, results should be grouped somehow.
Additional info:
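One hedged way to get the grouping suggested above is to collapse anything outside a known allow-list into a single label value before setting the gauge, so the series count stays bounded regardless of what admins configure. Sketch only; the allow-list and bucket name are placeholders, not the console's actual lists:

// Sketch: bound the label values of console_plugins_info by bucketing unknown
// plugin names. The allow-list below is a placeholder.
package sketch

import "github.com/prometheus/client_golang/prometheus"

var pluginsInfo = prometheus.NewGaugeVec(prometheus.GaugeOpts{
    Name: "console_plugins_info",
    Help: "Installed console plugins, with unknown names grouped to limit cardinality.",
}, []string{"name", "state"})

var knownPlugins = map[string]bool{ // placeholder allow-list
    "logging-view-plugin": true,
    "acm":                 true,
    "mce":                 true,
}

func init() {
    prometheus.MustRegister(pluginsInfo)
}

func recordPlugin(name, state string) {
    if !knownPlugins[name] {
        name = "other" // every third-party plugin collapses into one series per state
    }
    pluginsInfo.WithLabelValues(name, state).Set(1)
}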
This epic aims to address some of the RFEs associated with the Pipeline user experience.
Improve the overall user experience when working with OpenShift Pipelines
None
Exploration is available in this Miro board
As a user, I want to manage the column available for the TaskRuns list page
With many PipelineRuns based on the same pipeline, it will get confusing if re-runs are named by the pipeline as they will all be named similarly. Losing the distinction between PipelineRuns will cause lots of additional hassles.
As a user, I want to see the webhook link and webhook secret on the Repository details page and the webhook link on the Repository summary page
As a user, I want to see the PipelineRuns present in the current namespace from the Dev perspective
ODC tests are mainly focused on running with kube:admin (cluster-admin privileges), which creates an issue when something gets broken due to an RBAC issue.
To define some basic tests focused on the self-provisioner users which can also be run on CI
Testing with users, as PR changes should not break the UI
ODC E2E tests have flakes which cause failures on CI.
Improve the ODC E2E test flakes by stabilising the tests and improving the speed of test execution.
To improve the health of CI, which will impact PR review effectiveness.
Skip waiting for the authentication operator to start progressing when the secret already exists
For periodic jobs, our tests will append to the existing console tests. But because the value of `waitForAuthOperatorProgressing` changes from true to false at the start of the console tests, and with the same procedure our tests keep waiting for its value to become true (which never happens), the tests do not start.
Upstream repos which contribute to the OLM v0 downstream repo have a 90+ commit delta, with several substantial dependency version bumps.
The interaction between these repos necessitates a coordinated solution, and potentially new upstream contributions to reach dependency equilibrium before bringing downstream.
The goals of this epic are:
We have some existing work in this direction, and this epic is mostly to coordinate across teams. As a result, some existing stories will need some remodeling as we go, and teams should feel free to keep them up to date to reflect the identified work.
The openshift/operator-framework-olm repository is very out of date and needs to be synced from upstream.
Acceptance Criteria:
All upstream necessary commits from:
are merged into the openshift/operator-framework-olm repository.
The Kube APIServer has a sidecar to output audit logs. We need similar sidecars for other APIServers that run on the control plane side. We also need to pass the same audit log policy that we pass to the KAS to these other API servers.
During a PerfScale 80 HC test in stage we found that the OBO prometheus monitoring stack was consuming 50G of memory (enough to cause OOMing on the m5.4xlarge instance it was residing on). Additionally, during this time it would also consume over 10 CPU cores.
Snapshot of the time leading up to (effectively idle) and during the test: https://snapshots.raintank.io/dashboard/snapshot/2K5s0PzaN1U2JE1jrxTPZ5jX0fifBuRC
As a SRE, I want to have the ability to filter metrics exposed from the Management Clusters.
Context:
RHOBS resources allocated to HCP are scarce. Currently, we push every single metric to the RHOBS instance.
However, in https://issues.redhat.com/browse/OSD-13741, we've identified a subset of metrics that are important to SRE.
The ability to only export those metrics to RHOBS will reduce significantly the cost of monitoring as well as increase our ability to scale RHOBS.
As discussed in this Slack thread, most of the CPU and memory consumption of the OBO operator is caused at scraping time.
The idea here is to make sure the hypershift & control-plane-operator operators no longer specify the scrape interval in servicemonitor & podmonitor scrape configs (unless there is a very good reason to do so).
Indeed, when the scrape interval is not specified at scrape config level, the global scrape interval specified at the root of the config is used. This offers the following benefits:
This is part of solution #1 described here.
When quorum breaks and we are able to get a snapshot of one of the etcd members, we need a procedure to restore the etcd cluster for a given HostedCluster.
Documented here: https://docs.google.com/document/d/1sDngZF-DftU8_oHKR70E7EhU_BfyoBBs2vA5WpLV-Cs/edit?usp=sharing
Add the above documentation to the HyperShift repo documentation.
Many internal projects rely on Red Hat's fork of the OAuth2 Proxy project. The fork differs from the main upstream project in that it added an OpenShift authentication backend provider, allowing the OAuth2 Proxy service to use the OpenShift platform as an authentication broker.
Still, unfortunately, it had never been contributed back to the upstream project - this caused both of the projects, the fork and the upstream, to severely diverge. The fork is also extremely outdated and lacks features.
Among such features not present in the forked version is the support for setting up a timeout for requests from the proxy to the upstream service, otherwise controlled using the --upstream-timeout command-line switch in the official OAuth2 Proxy project.
Without the ability to specify the request timeout, the default value of 30 seconds is assumed (coming from Go's libraries), and this is often not enough to serve a response from a busy backend.
Thus, we need to backport this feature from the upstream project.
Backport the Pull Request from the upstream project into the Red Hat's fork.
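For context on what the option does, the mechanism is simply a configurable timeout on the proxy's upstream transport; a minimal sketch of that mechanism (this is not the oauth2-proxy implementation):

// Minimal sketch of the mechanism behind an --upstream-timeout style option:
// a reverse proxy whose transport stops waiting for upstream response headers
// after a configurable duration.
package sketch

import (
    "net/http"
    "net/http/httputil"
    "net/url"
    "time"
)

func newUpstreamProxy(upstream string, timeout time.Duration) (*httputil.ReverseProxy, error) {
    u, err := url.Parse(upstream)
    if err != nil {
        return nil, err
    }
    proxy := httputil.NewSingleHostReverseProxy(u)
    proxy.Transport = &http.Transport{
        ResponseHeaderTimeout: timeout, // how long to wait for the upstream to start responding
    }
    return proxy, nil
}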
Goal: Support OVN-IPsec on IBM Cloud platform.
Why is this important: IBM Cloud is being added as a new OpenShift supported platform, targeting 4.9/4.10 GA.
Dependencies (internal and external):
Prioritized epics + deliverables (in scope / not in scope):
Not in scope:
Estimate (XS, S, M, L, XL, XXL):
Previous Work:
Open questions:
Acceptance criteria:
Epic Done Checklist:
This Epic is here to track the rebase we need to do for kube 1.27, which is already out.
https://docs.google.com/document/d/1h1XsEt1Iug-W9JRheQas7YRsUJ_NQ8ghEMVmOZ4X-0s/edit --> this is the link for rebase help
sig-cli is failing in two different ways:
Failing tests
5 tests fail because of system:authenticated group not having enough permissions on some resources (routes and configmaps).
"[sig-cli] oc basics can create and interact with a list of resources [Suite:openshift/conformance/parallel]"
"[sig-cli] oc basics can show correct whoami result [Suite:openshift/conformance/parallel]"
"[sig-cli] oc can route traffic to services [apigroup:route.openshift.io] [Suite:openshift/conformance/parallel]"
"[sig-cli] oc expose can ensure the expose command is functioning as expected [apigroup:route.openshift.io] [Suite:openshift/conformance/parallel]"
"[sig-network-edge][Feature:Idling] Idling with a single service and ReplicationController should idle the service and ReplicationController properly [Suite:openshift/conformance/parallel]"
There are quite a few tests which depend on API groups that do not exist in MicroShift. We can add the [apigroup] annotation to skip these tests.
[apigroup:oauth.openshift.io]
"[sig-auth][Feature:OAuthServer] OAuthClientWithRedirectURIs must validate request URIs according to oauth-client definition": " [Suite:openshift/conformance/parallel]" "[sig-auth][Feature:OAuthServer] well-known endpoint should be reachable [apigroup:route.openshift.io]": " [Suite:openshift/conformance/parallel]"
[apigroup:operator.openshift.io]
"[sig-network][Feature:MultiNetworkPolicy][Serial] should enforce a network policies on secondary network IPv4": " [Suite:openshift/conformance/serial]" "[sig-network][Feature:MultiNetworkPolicy][Serial] should enforce a network policies on secondary network IPv6": " [Suite:openshift/conformance/serial]" "[sig-storage][Feature:DisableStorageClass][Serial] should not reconcile the StorageClass when StorageClassState is Unmanaged": " [Suite:openshift/conformance/serial]" "[sig-storage][Feature:DisableStorageClass][Serial] should reconcile the StorageClass when StorageClassState is Managed": " [Suite:openshift/conformance/serial]", "[sig-storage][Feature:DisableStorageClass][Serial] should remove the StorageClass when StorageClassState is Removed": " [Suite:openshift/conformance/serial]",
"[sig-auth][Feature:Authentication] TestFrontProxy should succeed [Suite:openshift/conformance/parallel]"
This test is failing because it depends on "aggregator-client" secret, which is not present in MicroShift. We can skip this test.
The goal of this EPIC is to solve several issues related to PDBs that caused problems during OCP upgrades, especially when new apiserver pods (which roll out one by one) were wedged (there was an issue with networking on the new pods due to RHEL upgrades).
slack thread: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1673886138422059
Epic Goal*
This is a tracking issue for the Workloads related work for Microshift 4.13 Improvements. See API-1506 for the whole feature.
followup to https://issues.redhat.com/browse/WRKLDS-487
Refactor route-controller-manager to use NewControllerCommandConfig and ControllerBuilder from library-go. Then update the dependency in MicroShift so we can pass LeaderElection.Disable in the config to disable leader election, as it is not needed in MicroShift.
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled
This epic tracks any part of our codebase / solutions we implemented taking shortcuts.
Whenever a shortcut is taken, we should add a story here so we don't forget to improve it in a safer and more maintainable way.
Maintainability and debuggability, and fighting technical debt in general, are critical to keeping velocity and ensuring overall high quality.
https://issues.redhat.com/browse/CNF-796
https://issues.redhat.com/browse/CNF-1479
https://issues.redhat.com/browse/CNF-2134
https://issues.redhat.com/browse/CNF-6745
https://issues.redhat.com/browse/CNF-8036
https://issues.redhat.com/browse/CNF-9566
Description of problem:
According to the API documentation, the policyTypes field is optional: https://docs.openshift.com/container-platform/4.11/rest_api/network_apis/networkpolicy-networking-k8s-io-v1.html#specification If this field is not specified, it will default based on the existence of Ingress or Egress rules.
But if policyTypes is not specified, all traffic is dropped despite what is stated in the rule.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. Configure SR-IOV (nodepolicy + sriovnetwork)
2. Configure 2 pods
3. Enable MultiNetworkPolicy
4. Apply a MultiNetworkPolicy:
   spec:
     podSelector:
       matchLabels:
         pod: pod1
     ingress:
     - from:
       - ipBlock:
           cidr: 192.168.0.2/32
5. Send traffic between pods (192.168.0.2 => pod=pod1)
Actual results:
traffic dropped
Expected results:
traffic passed
Additional info:
Address miscellaneous technical debt items in order to maintain code quality and maintainability and to improve the user experience.
Role | Contact |
---|---|
PM | Peter Lauterbach |
Documentation Owner | TBD |
Delivery Owner | (See assignee) |
Quality Engineer | (See QA contact) |
Who | What | Reference |
---|---|---|
DEV | Upstream roadmap issue | <link to GitHub Issue> |
DEV | Upstream code and tests merged | <link to meaningful PR or GitHub Issue> |
DEV | Upstream documentation merged | <link to meaningful PR or GitHub Issue> |
DEV | gap doc updated | <name sheet and cell> |
DEV | Upgrade consideration | <link to upgrade-related test or design doc> |
DEV | CEE/PX summary presentation | label epic with cee-training and add a <link to your support-facing preso> |
QE | Test plans in Polarion | N/A details in user stories. |
QE | Automated tests merged | N/A details in user stories. |
DOC | Downstream documentation merged | <link to meaningful PR> |
kubevirt-csi is unable to unpublish a volume in the event that the VM/VMI that the volume was published on unexpectedly disappears. This situation can occur for many reasons: someone could forcibly delete the VM, a replace update could destroy a VM before it can unpublish a volume, a VM node could become unresponsive and the CAPI machine controller will delete it, and other scenarios like this.
When this situation occurs, the PVC within the guest will never get deleted properly. Kubevirt csi will report the following error.
I0531 13:07:51.338413 1 controller.go:264] Detaching DataVolume pvc-4c4d4744-8a04-4df1-964b-d4eac90a93a2 from Node ID fc3ad096-53f0-535d-bbd8-45a3ab3803d1
E0531 13:07:51.349493 1 server.go:124] /csi.v1.Controller/ControllerUnpublishVolume returned with error: rpc error: code = NotFound desc = failed to find VM with domain.firmware.uuid 5cb46a00-2b8b-509b-b32b-39d1bab4e8b5
To resolve this, the kubevirt-csi controller needs to gracefully handle unpublishing a volume when the VM and VMI associated with the volume no longer exists.
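A minimal sketch of the idempotent-detach behaviour described above, assuming a hypothetical infra-cluster client abstraction (names and signatures are illustrative, not the actual kubevirt-csi code):

```
package controller

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// infraClient is a hypothetical abstraction over the infra-cluster client
// that kubevirt-csi uses to hot-unplug a volume from the VM backing a node ID.
type infraClient interface {
	RemoveVolumeFromVM(ctx context.Context, nodeID, volumeName string) error
}

// controllerUnpublish detaches the volume from the node's VM. If the VM/VMI is
// already gone (forced deletion, replace update, machine deletion, ...), the
// volume is effectively detached, so we return success to keep
// ControllerUnpublishVolume idempotent and let the guest PVC be cleaned up.
func controllerUnpublish(ctx context.Context, client infraClient, nodeID, volumeName string) error {
	err := client.RemoveVolumeFromVM(ctx, nodeID, volumeName)
	if err == nil || apierrors.IsNotFound(err) {
		return nil
	}
	return fmt.Errorf("detaching volume %s from node %s: %w", volumeName, nodeID, err)
}
```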
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.
Console-operator should switch from using bindata to using assets, similar to what cluster-kube-apiserver-operator and other operators are doing, so we don't need to regenerate the bindata when YAML files are changed.
There is also an issue with generating bindata on ARM and other architectures; switching to assets will make that obsolete.
Epic Goal*
Provide a way to tune the etcd latency parameters ETCD_HEARTBEAT_INTERVAL and ETCD_ELECTION_TIMEOUT.
Why is this important? (mandatory)
OCP4 does not have a way to tune etcd parameters like timeouts, heartbeat intervals, etc. Adjusting these parameters indiscriminately may compromise the stability of the control plane. In scenarios where disk IOPS are not ideal (e.g. disk degradation, storage providers in cloud environments), these parameters could be adjusted to improve stability of the control plane while raising the corresponding warning notifications.
In the past:
The current default values on a 4.10 deployment
```
- name: ETCD_ELECTION_TIMEOUT
  value: "1000"
- name: ETCD_ENABLE_PPROF
  value: "true"
- name: ETCD_EXPERIMENTAL_MAX_LEARNERS
  value: "3"
- name: ETCD_EXPERIMENTAL_WARNING_APPLY_DURATION
  value: 200ms
- name: ETCD_EXPERIMENTAL_WATCH_PROGRESS_NOTIFY_INTERVAL
  value: 5s
- name: ETCD_HEARTBEAT_INTERVAL
  value: "100"
```
and these are modified for exceptions of specific cloud providers (https://github.com/openshift/cluster-etcd-operator/blob/master/pkg/etcdenvvar/etcd_env.go#L232-L254).
The guidance for latency among control plane nodes does not translate well to on-premise live scenarios: https://access.redhat.com/articles/3220991
Scenarios (mandatory)
Defining etcd-operator API to provide the cluster-admin the ability to set `ETCD_ELECTION_TIMEOUT` and `ETCD_HEARTBEAT_INTERVAL` within certain range.
Dependencies (internal and external) (mandatory)
No external teams
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
For https://issues.redhat.com/browse/OCPBU-333 we need an enhancement proposal so we can go over the different options in how we want to allow configuration of the etcd heartbeat, leader election and any other latency parameters that might be required for OCPBU-333.
Once we have the API for configuring the heartbeat interval and leader election timeouts from https://github.com/openshift/api/pull/1538 we will need to reconcile the tuning profile set on the API onto the actual etcd deployment.
This would require updating how we set the env vars for both parameters by first reading the operator.openshift.io/v1alpha1 Etcd "cluster" object and mapping the profile value to the required heartbeat and leader election timeout values in:
https://github.com/openshift/cluster-etcd-operator/blob/381ffb81706699cdadd0735a52f9d20379505ef7/pkg/etcdenvvar/etcd_env.go#L208-L254
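A minimal sketch of the kind of mapping this implies, with a hypothetical profile name and illustrative values (the defaults match the 4.10 values shown above; the "slower" values are only an example, not the final API):

```
package etcdenvvar

// latencyEnvForProfile maps a hypothetical tuning profile read from the
// Etcd "cluster" object to ETCD_HEARTBEAT_INTERVAL / ETCD_ELECTION_TIMEOUT
// values (milliseconds, rendered as strings in the pod env).
func latencyEnvForProfile(profile string) (heartbeatInterval, electionTimeout string) {
	switch profile {
	case "Slower":
		// Illustrative values for degraded-disk / high-latency environments.
		return "500", "2500"
	default:
		// Current defaults from the 4.10 deployment shown above.
		return "100", "1000"
	}
}
```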
Place holder epic to track spontaneous task which does not deserve its own epic.
ServicePublishingStrategy entries of type LoadBalancer or Route could specify the same hostname, which will result in one of the services not being published, i.e. no DNS records created.
context: https://redhat-internal.slack.com/archives/C04EUL1DRHC/p1678287502260289
DOD:
Validate ServicePublishingStrategy and report conflicting service hostnames.
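A minimal sketch of the duplicate-hostname check, using simplified stand-in types rather than the real HyperShift API structs:

```
package validation

// publishedService is a simplified stand-in for a ServicePublishingStrategy
// entry of type LoadBalancer or Route that carries a hostname.
type publishedService struct {
	Name     string
	Hostname string
}

// conflictingHostnames returns, for every hostname used more than once,
// the list of services that requested it, so the caller can surface the
// conflict instead of silently skipping DNS record creation.
func conflictingHostnames(services []publishedService) map[string][]string {
	byHostname := map[string][]string{}
	for _, svc := range services {
		if svc.Hostname == "" {
			continue
		}
		byHostname[svc.Hostname] = append(byHostname[svc.Hostname], svc.Name)
	}
	conflicts := map[string][]string{}
	for hostname, names := range byHostname {
		if len(names) > 1 {
			conflicts[hostname] = names
		}
	}
	return conflicts
}
```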
DoD:
This feature is supported by ROSA.
To have an e2e to validate publicAndPrivate <-> Private in the presubmits.
Once the HostedCluster and NodePool are paused via the PausedUntil field, the awsprivatelink controller still continues reconciling.
How to test this:
DoD:
If a NodePool is changed from having .replicas to autoscaler min/max, and the min is set beyond the current replicas, that might leave the MachineDeployment in a state where it cannot be autoscaled. This requires the consumer to ensure the min is <= current replicas, which is poor UX. Ideally we should be able to automate this.
The HyperShift operator deployment fails when we try to deploy it on the RootCI server, which has PSA enabled. So we need to make the HyperShift operator deployment compliant with the restricted PSA profile.
Event:
0s Warning FailedCreate replicaset/operator-66cc5794c9 (combined from similar events): Error creating: pods "operator-66cc5794c9-k2sq7" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "operator" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "operator" must set securityContext.capabilities.drop=["ALL"]), seccompProfile (pod or container "operator" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
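A minimal sketch of the securityContext fields the restricted PodSecurity checks in the event above are asking for, expressed with the upstream corev1 types (where exactly this is wired into the HyperShift operator deployment is not shown here):

```
package install

import corev1 "k8s.io/api/core/v1"

// restrictedSecurityContext returns a container securityContext that
// satisfies the restricted PodSecurity checks quoted in the event:
// no privilege escalation, all capabilities dropped, RuntimeDefault seccomp.
func restrictedSecurityContext() *corev1.SecurityContext {
	allowPrivilegeEscalation := false
	runAsNonRoot := true
	return &corev1.SecurityContext{
		AllowPrivilegeEscalation: &allowPrivilegeEscalation,
		RunAsNonRoot:             &runAsNonRoot,
		Capabilities: &corev1.Capabilities{
			Drop: []corev1.Capability{"ALL"},
		},
		SeccompProfile: &corev1.SeccompProfile{
			Type: corev1.SeccompProfileTypeRuntimeDefault,
		},
	}
}
```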
DoD:
A recurring question is how IAM works in HyperShift.
We should document in https://hypershift-docs.netlify.app/ or https://github.com/openshift/enhancements/tree/master/enhancements/hypershift how we handle permissions in AWS
https://redhat-internal.slack.com/archives/C02LM9FABFW/p1674631577577369
OCP components could change their image key in the release payload, which might not be immediately visible to us and would break Hypershift.
DOD:
Validate release contains all the images required by Hypershift and report missing images in a condition
AC:
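A minimal sketch of the payload check described in the DOD above, assuming the release image references are already available as a name-to-pullspec map (hypothetical helper, not the actual HyperShift code):

```
package releasecheck

import "sort"

// missingImages returns the sorted list of required image names that are not
// present in the release payload's image map, so the caller can report them
// in a HostedCluster condition.
func missingImages(payloadImages map[string]string, required []string) []string {
	var missing []string
	for _, name := range required {
		if _, ok := payloadImages[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}
```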
We have the connectDirectlyToCloudAPIs flag in the konnectivity socks5 proxy to dial directly to cloud providers without going through konnectivity.
This introduces another exception path: https://github.com/openshift/hypershift/pull/1722
We should consolidate both by keeping connectDirectlyToCloudAPIs until there's a reason not to.
AWS has a hard limit of 100 OIDC providers globally.
Currently each HostedCluster created by e2e creates its own OIDC provider, which results in hitting the quota limit frequently and causing the tests to fail as a result.
DOD:
Only a single OIDC provider should be created and shared between all e2e HostedClusters.
Most of our condition statuses are driven by the programmatic output of reconciliation loops.
E.g. the HostedCluster Available condition.
This is a good signal for day 1, but we might be missing relevant real state of the world for day 2. E.g:
DoD:
Reproduce and review the behaviour of the examples above.
Consider adding additional knowledge when computing the HCAvailable condition: health check on expected day 2 holistic e2e behaviour rather than on the particular status of subcomponents.
E.g. actually query the KAS through the URL we expose.
This is a placeholder to capture the necessary CI changes to do every release cut.
There are a few places in our CI config which require pinning to the new release at every release cut:
DOD:
Make sure we have this documented in hypershift repo and that all needed is done for current release branch.
DoD:
At the moment, if the input etcd KMS encryption config (key and role) is invalid, the failure is not surfaced.
We should check that both the key and the role are compatible/operational for a given cluster and report a failure in a condition otherwise.
This is intended to be a place to capture general "tech debt" items so they don't get lost. I very much doubt that this will ever get completed as a feature, but that's okay; the desire is more that stories get pulled out of here and put with feature work "opportunistically" when it makes sense.
If you find a "tech debt" item, and it doesn't have an obvious home with something else (e.g. with MCO-1 if it's metrics and alerting) then put it here, and we can start splitting these out/marrying them up with other epics when it makes sense.
As part of https://github.com/openshift/machine-config-operator/pull/3270, Joel moved us to ConfigMapsLeases for our lease because the old way of using ConfigMaps was being deprecated in favor of the "Leases" resource.
ConfigMapsLeases were meant to be the first phase of the migration, eventually ending up on LeasesResourceLock, so at some point we need to finish.
Since we've already had ConfigMapsLeases for at least a release, we should now be able to complete the migration by changing the type of resource lock here https://github.com/openshift/machine-config-operator/blob/4f48e1737ffc01b3eb991f22154fc3696da53737/cmd/common/helpers.go#L43 to LeasesResourceLock
We should probably also clean up after ourselves so nobody has to open something like https://bugzilla.redhat.com/show_bug.cgi?id=1975545 again
(Yes this really should be as easy as it looks, but someone needs to pay attention to make sure something weird doesn't happen when we do it)
Some supporting information is here, if curious:
https://github.com/kubernetes/kubernetes/pull/106852
https://github.com/kubernetes/kubernetes/issues/80289
Finish lease lock type migration by changing lease lock type to LeaseResourceLock
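A minimal sketch of what the finished migration could look like with client-go's resourcelock package (identities, names, and durations are illustrative; the real change is the single lock-type constant in helpers.go):

```
package main

import (
	"context"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func runWithLeaderElection(ctx context.Context, client kubernetes.Interface, identity string, run func(context.Context)) error {
	// LeasesResourceLock uses only the coordination.k8s.io/v1 Lease resource,
	// replacing the transitional ConfigMapsLeases lock.
	lock, err := resourcelock.New(
		resourcelock.LeasesResourceLock,
		"openshift-machine-config-operator", // illustrative namespace
		"machine-config",                    // illustrative lock name
		client.CoreV1(),
		client.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: identity},
	)
	if err != nil {
		return err
	}
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   137 * time.Second,
		RenewDeadline:   107 * time.Second,
		RetryPeriod:     26 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: run,
			OnStoppedLeading: func() {},
		},
	})
	return nil
}
```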
Currently, adding a forcefile (/run/machine-config-daemon-force) will start an update, but it doesn't necessarily do a complete upgrade; if it fits into one of the carve-outs we have for a rebootless update (e.g. the OSImageURL is the same), it won't do an OS update. We have had a few customers whose clusters are stuck in a quasi state and need to do a complete OS upgrade, even if the "conditions" on the cluster indicate that this isn't necessary.
The goal of this story is to update this behavior so that it will also do an OS upgrade (execute applyOSChanges() in its entirety).
This has been broken for a long time, and the actual functionality is quite useless. We have put out a deprecation notice in 4.12, and now we should look to remove it.
The MCD read/writes items to the journal. We should look to remove unnecessary reads from the journal and just log important info, so a broken journal doesn't break the MCD.
Spun off of https://issues.redhat.com/browse/OCPBUGS-8716
The MCD today writes pending configs to journal, which the next boot then uses to read the state.
This is mostly redundant since we also read/write the updated config to disk. The pending config was originally implemented very early on, and today causes more trouble than it helps, since the journal could be broken, or the config could not be found, which is very troublesome to debug and recover.
We should remove the workflow entirely
As an OpenShift infrastructure owner, I want to use the Zero Touch Provisioning flow with RHACM, where RHACM is in a dual-stack hub cluster and the deployed cluster is an IPv6-only cluster.
Currently ZTP doesn't work when provisioning IPv6 clusters from a dual-stack hub cluster. We have customers who aim to deploy new clusters via ZTP that don't have IPv4 and work exclusively over IPv6. To enable this use case, work on the metal platform has been identified as a requirement.
Converge IPI and ZTP Boot Flows: METAL-10
We are missing event notifications on creation of some resources. We need to make sure they are notified
Due to a change of Kafka provider, SASL/PLAIN is no longer supported.
We now need SASL/SCRAM for the app-interface integrated MSK.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Do not change the cluster platform in the background due to networking configuration.
Remove user_managed_networking from assisted service
Yes
Allow the user to decide which platform is compatible with each feature, especially UMN and CMN.
E.g. on the networking step, when a platform is being selected, the UI needs to know whether to show the user the UMN or CMN networking configuration without taking cluster.user_managed_networking into consideration.
The goal of this task is to give the UI the option to not use the current UMN implementation, and to give the BE the flexibility to "break" the API.
When creating a cluster in the UI, there is a checkbox that the user can set to indicate that they want to use custom manifests.
Presently this will cause the upload of an empty manifest, the presence of which is later used to determine whether the checkbox is checked or not (and whether the custom manifest tab should be shown in the UI).
This is a clunky approach that confuses the user and leads to validation issues.
This functionality needs to be changed to use a cluster tag for this purpose instead.
Presently, when creating a cluster in the UI, there is a checkbox that the user can set to indicate that they want to use custom manifests.
Presently this will cause the upload of an empty manifest, the presence of which is later used to determine whether the checkbox is checked or not (and whether the custom manifest tab should be shown in the UI).
This is a clunky approach that confuses the user and leads to validation issues.
To remedy this, we would like to give the UI team a facility to store raw JSON data containing freeform UI-specific settings for a cluster.
This PR enables that.
When the infrastructure operator is enabled, automatically import the cluster and enable users to add nodes to the cluster itself via the infrastructure operator.
Yes, it's a new functionality that will need to be documented
When assisted service is started in KubeAPI mode, we want to ensure that the local cluster is registered with ACM so that it may be managed in a similar fashion to a spoke, or to put it another way, register the Hub cluster as a Day 2 spoke cluster in ACM running on itself.
The purpose of this task is to create the secrets, AgentClusterInstall and ClusterDeployment CRs required to register the hub.
As referenced in the parent Epic, the following guide details the CR's that need to be created to import a "Day 2" spoke cluster https://github.com/openshift/assisted-service/blob/master/docs/hive-integration/import-installed-cluster.md
During this change, it should be ensured that this functionality is added to the reconcile loop of the service.
note: just a placeholder for now
It has already happened that operators configured Prometheus rules which aren't valid:
While we can't catch everything, it should be feasible to check for most common mistakes with the CI.
Update the severity of this origin test from flaky to failure
Exceptions for the following alerts can be cleared, as the Bugzillas are already fixed and released.
For the BZs not yet fixed, create new OCPBUGS Jira issues.
We added E2E tests for alerting style-guide issues in MON-1643, but a lot of components needed exceptions. We filed bugzillas for these, but we need to check on them and remove the exceptions for any that are fixed.
This has no link to a planning session, as it predates our Epic workflow definition.
CMO should expose a metric that gives insight into collection profile usage. We will add this signal to our telemetry payload.
The minimum solution here is to expose a metric about the collection profile configured.
Other optional metrics could include:
Give users a TopologySpreadConstraints field in the PrometheusRestrictedConfig field and propagate this to the pod that is created.
Give users a TopologySpreadConstraints field in the K8sPrometheusAdapter field and propagate this to the pod that is created.
Give users a TopologySpreadConstraints field in the KubeStateMetricsConfig field and propagate this to the pod that is created.
Give users a TopologySpreadConstraints field in the AlertmanagerUserWorkloadConfig field and propagate this to the pod that is created.
Give users a TopologySpreadConstraints field in the OpenShiftStateMetricsConfig field and propagate this to the pod that is created.
Give users a TopologySpreadConstraints field in the TelemeterClientConfig field and propagate this to the pod that is created.
Give users a TopologySpreadConstraints field in the PrometheusOperatorConfig field and propagate this to the pod that is created. This will take care of both the incluster PO and UWM PO.
Give users a TopologySpreadConstraints field in the ThanosQuerierConfig field and propagate this to the pod that is created.
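A minimal sketch of the propagation these stories describe: the operator copies the user-provided constraints from its config onto the workload's pod template (names are illustrative, not the actual CMO code):

```
package manifests

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// applyTopologySpreadConstraints copies the constraints configured for a
// component (e.g. a topologySpreadConstraints field in the CMO config) onto
// the Deployment's pod template so the scheduler spreads the replicas.
func applyTopologySpreadConstraints(d *appsv1.Deployment, constraints []corev1.TopologySpreadConstraint) {
	if len(constraints) == 0 {
		return
	}
	d.Spec.Template.Spec.TopologySpreadConstraints = constraints
}
```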
IIUC, before using hard affinities for HA components, we needed this to avoid scheduling problems during upgrade.
See https://github.com/openshift/cluster-monitoring-operator/pull/1431#issuecomment-960845938
Now that 4.8 is no longer supported, we can get rid of this logic to simplify the code.
This will reduce technical debt and improve CMO learning curve.
To support the transition from soft anti-affinity to hard anti-affinity (4.9 > 4.10), CMO gained the ability to rebalance PVCs for Prometheus pods. The capability isn't required anymore so we can safely remove it.
Proposed title of this feature request
Enable the processes_linux collector in node_exporter
What is the nature and description of the request?
Enable node_exporter's processes_linux collector to allow customers to monitor the number of PIDs on OCP nodes.
Why does the customer need this? (List the business requirements)
They need to be able to monitor the number of PIDs on the OCP nodes.
List any affected packages or components.
cluster-monitoring-operator, node-exporter
We will add a section for the "processes" collector in the "nodeExporter.collectors" section of the CMO ConfigMap.
It has a boolean field "enabled"; the default value is false.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    nodeExporter:
      collectors:
        # enable a collector which is disabled by default
        processes:
          enabled: true
Epic Goal
Why is this important?
Scenarios
1. …
Acceptance Criteria
Dependencies (internal and external)
1. …
Previous Work (Optional):
1. …
Open questions::
1. …
Done Checklist
Add new flags to utilise the existing resources in e2e test
The following issues need to be taken care of on cluster deletion with the resource reuse flags.
With the above commit in 4.13, storage is not handled for the PowerVS platform, which causes the cluster image-registry operator to not get installed.
We need to handle the PowerVS platform here.
The option discussed is to go with a PVC backed by CSI.
If that is not feasible, we will try to use the IBM COS used by the Satellite team.
Getting the below error while deleting infra with a failed PowerVS instance:
Failed to destroy infrastructure {"error": "error in destroying infra: provided cloud instance id is not in active state, current state: failed"}
We also need to take care of the create-infra process in case the PowerVS instance goes into a failed state; it currently loops, printing the same statement while waiting for the instance to become active.
2022-11-11T13:03:01+05:30 INFO hyp-dhar-osa-2 Waiting for cloud instance to up {"id": "crn:v1:bluemix:public:power-iaas:osa21:a/c265c8cefda241ca9c107adcbbacaa84:cd743ba9-195b-46ba-951e-639f97f443d2::", "state": "failed"}
With the latest changes, CAPI by default expects v1beta2 APIs. We need to upgrade the CAPI API from v1beta1 to v1beta2 in HyperShift.
When resources run short in the management cluster as we deploy new apps, the cloud-controller-manager pod in an existing HC's control plane gets evicted.
Flags similar to these (https://github.com/openshift/hypershift/blob/main/cmd/cluster/powervs/create.go#L57toL61) from the create command are missing in the destroy command, so the infra destroy functionality does not receive these flags and cannot properly destroy infra that uses existing resources.
Issue and Design: https://github.com/ovn-org/ovn-kubernetes/blob/master/docs/design/shared_gw_dgp.md
Upstream PR: https://github.com/ovn-org/ovn-kubernetes/pull/3160
Document that describes how to use the mgmt port VF rep for hardware offloading: https://docs.google.com/document/d/1yR4lphjPKd6qZ9sGzZITl0wH1r4ykfMKPjUnlzvWji4/edit#
==========================================================================
After the upstream PR has been merged, we need to find a way to make the user experience of configuring the mgmt port VF rep as streamlined as possible. The basic streamlining we have committed to is improving the config map to only require the DP resource name with the MGMT VF in the pool. OVN-K will also need to make use of DP resources.
Description of problem:
- Add support for Dynamic Creation Of DPU/Smart-NIC Daemon Sets and Device-Plugin Resources For OVN-K
- DPU/Smart-NIC daemon sets need a way to be dynamically created via specific node labels
- The config map needs to support device plugin resources (namely SR-IOV) to be used for the management port configuration in OVN-K
- This should enhance the performance of these flows (planned to be GA-ed in 4.14) for Smart-NIC:
  5-a: Pod -> NodePort Service traffic (Pod Backend - Same Node)
  4-a: Pod -> Cluster IP Service traffic (Host Backend - Same Node)
Version-Release number of selected component (if applicable):
4.14.0 (Merged D/S) https://github.com/openshift/ovn-kubernetes/commit/cad6ed35183a6a5b43c1550ceb8457601b53460b https://github.com/openshift/cluster-network-operator/commit/0bb035e57ac3fd0ef7b1a9451336bfd133fa8c1e
How reproducible:
Never been supported in the past.
Steps to Reproduce:
Please follow the documentation on how to configure this on NVIDIA Smart-NICs in OvS HWOL mode: https://issues.redhat.com/browse/NHE-550
Please also check the OVN-K daemon sets; there should be a new "smart-nic" daemon set for OVN-K.
Please check on the nodes that the ovn-k8s-mp0_0 interface exists alongside the ovn-k8s-mp0 interface.
Actual results:
Iperf3 performance:
5-a: Pod -> NodePort Service traffic (Pod Backend - Same Node) => ~22.5 Gbits/sec
4-a: Pod -> Cluster IP Service traffic (Host Backend - Same Node) => ~22.5 Gbits/sec
Expected results:
Iperf3 performance:
5-a: Pod -> NodePort Service traffic (Pod Backend - Same Node) => ~29 Gbits/sec
4-a: Pod -> Cluster IP Service traffic (Host Backend - Same Node) => ~29 Gbits/sec
As you can see we can gain an additional 6.5 Gbits/sec performance with these service flows.
Additional info:
https://docs.google.com/spreadsheets/d/1LHY-Af-2kQHVwtW4aVdHnmwZLTiatiyf-ySffC8O5NM/edit#gid=88193790 https://github.com/ovn-org/ovn-kubernetes/pull/3160
NVIDIA and Microsoft have partnered to provide instances on Azure that use the security of the NVIDIA Hopper GPU to create a Trusted Execution Environment (TEE) where the data is encrypted while processed. This is achieved by using AMD's SEV-SNP extension alongside the NVIDIA Hopper confidential computing capabilities.
The virtual machine created on Azure is the TEE, so any workload running within is protected from the Azure host. This is a good approach for customers to protect their data when running OpenShift on Azure, but it doesn't protect the data in a container from the OpenShift node. In this epic, we focus on protecting the OpenShift node from the Azure host.
Running workloads in CSP virtual machines doesn't protect the data from an attack on the virtualization host itself. If an attacker manages to read the host memory, they can get access to the virtual machines data, so it can break confidentiality or integrity. In the context of AI/ML, both the data and the model represent intellectual property and sensitive data, so customers will want to protect them from leaks.
NVIDIA and Microsoft are key partners for Red Hat for AI/ML in the public cloud. Being able to run workloads encrypted at rest, in transport and in process will allow creating a trusted solution for our customers, spanning from self-managed OpenShift clusters to Azure Red Hat OpenShift (ARO) clusters. This will strengthen OpenShift as the Kubernetes distribution of choice in public clouds.
Add support for OCP cluster creation with Confidential VMs on Azure to the OpenShift installer. The additional configuration options required are:
In addition, in order to create a Confidential VM in Azure, the OS image needs to have its Security Type defined as "Confidential VM" or "Confidential VM Supported".
The changes required are:
Resources:
As an OCM team member, I want to provide support for Cluster Service and improve the usability and interoperability of HyperShift.
Integration Testing:
Beta:
GA:
GREEN | YELLOW | RED
GREEN = On track, minimal risk to target date.
YELLOW = Moderate risk to target date.
RED = High risk to target date, or blocked and need to highlight potential risk to stakeholders.
Links to Gdocs, github, and any other relevant information about this epic.
The last version of OpenShift on RHV should target OpenShift 4.13. There are several factors for this requirement.
previous: The last OCP on RHV version will be 4.13. Remove RHV from OCP in OCP 4.14.
On August 31, 2022, Red Hat Virtualization enters the maintenance support phase, which runs until August 31, 2024. In accordance, Red Hat Virtualization (RHV) will be deprecated beginning with OpenShift v4.13. This means that RHV will be supported through OCP 4.13. RHV will be removed from OpenShift in OpenShift v4.14.
We will use this to address tech debt in OLM in the 4.10 timeframe.
Items to prioritize are:
CI e2e flakes
Update the downstream READMEs to better describe the downstreaming process.
Include help in the sync scripts as necessary.
It has been determined that "make verify" is a necessary part of the downstream process. The scripts that do the downstreaming do not run this command.
Add "make verify" somewhere in the downstreaming scripts, either as a last step in sync.sh or per commit (which might be both necessary yet overkill) in sync_pop_candidate.sh.
The client cert/key pair is a way of authenticating that will function even without live kube-apiserver connections so we can collect metrics if the kube-apiserver is unavailable.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Description of problem:
The CU cluster of the Mavenir deployment has cluster-node-tuning-operator in a CrashLoopBackOff state and does not apply the performance profile.
Version-Release number of selected component (if applicable):
4.14rc0 and 4.14rc1
How reproducible:
100%
Steps to Reproduce:
1. Deploy the CU cluster with the ZTP GitOps method
2. Wait for the Policies to be compliant
3. Check the worker nodes and the cluster-node-tuning-operator status
Actual results:
Nodes do not have performance profile applied cluster-node-tuning-operator is crashing with following in logs: E0920 12:16:57.820680 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(nil), concrete:(*runtime._type)(nil), asserted:(*runtime._type)(0x1e68ec0), missingMethod:""} (interface conversion: interface is nil, not v1.Object) goroutine 615 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1c98c20?, 0xc0006b7a70}) /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000d49500?}) /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x1c98c20, 0xc0006b7a70}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/cluster-node-tuning-operator/pkg/util.ObjectInfo({0x0?, 0x0}) /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/util/objectinfo.go:10 +0x39 github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*ProfileCalculator).machineConfigLabelsMatch(0xc000a23ca0?, 0xc000445620, {0xc0001b38e0, 0x1, 0xc0010bd480?}) /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/profilecalculator.go:374 +0xc7 github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*ProfileCalculator).calculateProfile(0xc000607290, {0xc000a40900, 0x33}) /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/profilecalculator.go:208 +0x2b9 github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).syncProfile(0xc000195b00, 0x0?, {0xc000a40900, 0x33}) /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:664 +0x6fd github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).sync(0xc000195b00, {{0x1f48661, 0x7}, {0xc000000fc0, 0x26}, {0xc000a40900, 0x33}, {0x0, 0x0}}) /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:371 +0x1571 github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).eventProcessor.func1(0xc000195b00, {0x1dd49c0?, 0xc000d49500?}) /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:193 +0x1de github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).eventProcessor(0xc000195b00) /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:212 +0x65 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?) /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x224ee20, 0xc000c48ab0}, 0x1, 0xc00087ade0) /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc0004e6710?) /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(0xc0004e67d0?, 0x91af86?, 0xc000ace0c0?) 
/go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x25 created by github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).run /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:1407 +0x1ba5 panic: interface conversion: interface is nil, not v1.Object [recovered] panic: interface conversion: interface is nil, not v1.Object
Expected results:
cluster-node-tuning-operator is functional, performance profiles applied to worker nodes
Additional info:
There is no issue on a DU node of the same deployment coming from same repository, DU node is configured as requested and cluster-node-tuning-operator is functioning correctly. must gather from rc0: https://drive.google.com/file/d/1DlzrjQiKTVnQKXdcRIijBkEKjAGsOFn1/view?usp=sharing must gather from rc1: https://drive.google.com/file/d/1qSqQtIunQe5e1hDVDYwa90L9MpEjEA4j/view?usp=sharing performance profile: https://gitlab.cee.redhat.com/agurenko/mavenir-ztp/-/blob/airtel-4.14/policygentemplates/group-cu-mno-ranGen.yaml
Revived from OCSCNV-56 which was archived.
Need a solution to support OCS encrypted volume for CNV so that smart cloning across namespaces can be achieved for encrypted volume.
Now the problem with encrypted OCS volumes is secrets are stored in the original namespace and will get left behind. (The cloned metadata still points to the original namespace)
The annotation required is `cdi.kubevirt.io/clone-strategy=copy`.
Tasks:
Need to update: "console.storage-class/provisioner" extension.
Ref: https://github.com/openshift/console/pull/11931
Something like:
"properties" : { "CSI" : { . . "parameter" : { . . } "annotations" : { [annotationKey: string] : { "value" ?: string, "annotate" ?: CodeRef<(arg) => boolean | boolean> } . .
We can do the same for `properties.others.annotations` as well (not a requirement, but to have consistency with `properties.csi.annotations`).
OpenShift Container Platform is shipping a finely tuned set of alerts to inform the cluster's owner and/or operator of events and bad conditions in the cluster.
Runbooks are associated with alerts and help SREs take action to resolve an alert. This is critical to share engineering best practices following an incident.
Goal 1: Current alerts/runbooks for HyperShift need to be evaluated to ensure we have sufficient coverage before HyperShift hits GA.
Goal 2: Actionable runbooks need to be provided for all alerts; therefore, we should attempt to cover as many as possible in this epic.
Goal 3: Continue adding alerts/runbooks to cover existing OVN-K functionality.
This epic will NOT cover refactors needed to alerts/runbooks due to new arch (OVN IC).
In order to scale, we (engineering) must share our institutional knowledge.
In order for SREs to respond to alerts, they must have the knowledge to do so.
SD needs to have actionable runbooks to respond to alerts; otherwise, they will require engineering to engage more frequently.
Depends on https://issues.redhat.com/browse/SDN-3432
The OVN controller disconnection alert is a warning alert and therefore requires a runbook.
DoD: runbook(s) merged to https://github.com/openshift/runbooks/blob/master/alerts/cluster-network-operator/ and the runbook link added to CNO for the aforementioned alerts.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
As an administrator of a cluster utilizing AWS STS with a public S3 bucket OIDC provider, I would like a documented procedure with steps that can be followed to migrate to a private S3 bucket with CloudFront Distribution so that I do not have to recreate my cluster.
ccoctl documentation including parameter `--create-private-s3-bucket`: https://github.com/openshift/cloud-credential-operator/blob/a8ee8a426d38cca3f7339ecd0eac88f922b6d5a0/docs/ccoctl.md
Existing manual procedure for configuring private S3 bucket with CloudFront Distribution: https://github.com/openshift/cloud-credential-operator/blob/master/docs/sts-private-bucket.md
Goal:
The participation on SPLAT will be:
ACCEPTANCE CRITERIA
REFERENCES:
Supporting document: https://github.com/openshift/cloud-credential-operator/blob/master/docs/sts.md#steps-to-in-place-migrate-an-openshift-cluster-to-sts
NOTE: we should add that this step is not supported or recommended.
We have identified gaps in our test coverage that monitors for acceptable alerts firing during cluster upgrades; these need to be addressed to make sure we are not allowing regressions into the product.
This epic is to group that work.
This will make transitioning to new releases very simple because ci-tools doesn't need logic; it just makes sure to include the current + previous release data in the file and PR going to origin. Origin is then responsible for the logic to determine which to use: origin will check if we have at least 100 runs and, if not, try to fall back to the previous release data. All other fallback logic should exist.
legacy apiserver disruption
legacy network pod sandbox creation
kubelet logs through /api/v1/nodes/<node>/proxy/logs/
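A minimal sketch of the fallback selection described above, with a hypothetical data type and the 100-run threshold from the text:

```
package disruption

// runData is a hypothetical stand-in for the per-release disruption samples
// that ci-tools writes into the file shipped to origin.
type runData struct {
	JobRuns int
	P95     float64
}

// selectReleaseData implements origin's side of the contract: prefer the
// current release's data when there are at least 100 runs, otherwise fall
// back to the previous release's data.
func selectReleaseData(current, previous runData) runData {
	const minRuns = 100
	if current.JobRuns >= minRuns {
		return current
	}
	return previous
}
```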
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled
Description of the problem:
Feature support of LSO currently supports only x86; this was an error due to https://github.com/openshift/assisted-service/blob/ca339ae3515df6c1394af8b43187e5be13d6306e/internal/operators/lso/ls_operator.go#L103
Description of problem:
When CNO is managed by Hypershift, its deployment has the "hypershift.openshift.io/release-image" template metadata annotation. The annotation's value is used to track progress of cluster control plane version upgrades. Example:
apiVersion: apps/v1
kind: Deployment
metadata:
  generation: 24
  labels:
    hypershift.openshift.io/managed-by: control-plane-operator
  name: cluster-network-operator
  namespace: master-cg319sf10ghnddkvo8j0
  ...
spec:
  progressDeadlineSeconds: 600
  ...
  template:
    metadata:
      annotations:
        hypershift.openshift.io/release-image: us.icr.io/armada-master/ocp-release:4.12.7-x86_64
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
  ...
The same annotation must be set by CNO on the multus-admission-controller deployment so that service providers can track its version upgrades as well. CNO needs a code fix to implement this annotation propagation logic.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create OCP cluster using Hypershift
2. Check deployment template metadata annotations on multus-admission-controller
Actual results:
No "hypershift.openshift.io/release-image" deployment template metadata annotation exists
Expected results:
"hypershift.openshift.io/release-image" annotation must be present
Additional info:
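A minimal sketch of the propagation CNO needs, using the upstream appsv1 types (the exact place this hooks into CNO's rendering of multus-admission-controller is not shown):

```
package network

import appsv1 "k8s.io/api/apps/v1"

const releaseImageAnnotation = "hypershift.openshift.io/release-image"

// propagateReleaseImageAnnotation copies the release-image pod template
// annotation from the CNO deployment onto another deployment (here,
// multus-admission-controller) so control plane upgrades can be tracked.
func propagateReleaseImageAnnotation(cno, target *appsv1.Deployment) {
	value, ok := cno.Spec.Template.Annotations[releaseImageAnnotation]
	if !ok {
		return
	}
	if target.Spec.Template.Annotations == nil {
		target.Spec.Template.Annotations = map[string]string{}
	}
	target.Spec.Template.Annotations[releaseImageAnnotation] = value
}
```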
Description of problem:
Control plane upgrades take about 23 minutes on average. The shortest time I saw was 14 minutes, and the longest was 43 minutes.
The requirement is < 10 min for a successful complete control plane upgrade.
Version-Release number of selected component (if applicable): 4.12.12
How reproducible:
100 %
Steps to Reproduce:
1. Install a hosted cluster on 4.12.12. Wait for it to be 'ready'.
2. Upgrade the control plane to 4.12.13 via OCM.
Actual results: upgrade completes on average after 23 minutes.
Expected results: upgrade completes after < 10 min
Additional info:
N/A
When the user is providing ZTP manifests, a missing value for userManagedNetworking (in AgentClusterInstall) should be defaulted based on the platform type - for platform None this should default to true.
This is only happening if the platform type is misspelled as none instead of None. (Both are accepted for backwards compat with OCPBUGS-7495, but they should not result in different behaviour.)
When the user starts from an install-config, we set the correct value explicitly in the generated AgentClusterInstall, so this is not a problem so long as the user doesn't edit it.
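A minimal sketch of the case-insensitive defaulting described above (field names and the surrounding wiring are hypothetical, not the actual assisted-service/agent-installer code):

```
package agent

import "strings"

// defaultUserManagedNetworking fills in userManagedNetworking when the user
// left it unset in AgentClusterInstall: platform "None" (compared
// case-insensitively, so "none" behaves the same) defaults to true,
// everything else defaults to false. An explicit value is always kept.
func defaultUserManagedNetworking(platformType string, userManagedNetworking *bool) *bool {
	if userManagedNetworking != nil {
		return userManagedNetworking
	}
	value := strings.EqualFold(platformType, "None")
	return &value
}
```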
Description of problem:
This test is permafailing on techpreview since https://github.com/openshift/origin/pull/27915 landed [sig-instrumentation][Late] Alerts shouldn't exceed the 650 series limit of total series sent via telemetry from each cluster [Suite:openshift/conformance/parallel] s: "promQL query returned unexpected results:\navg_over_time(cluster:telemetry_selected_series:count[49m15s]) >= 650\n[\n {\n \"metric\": {\n \"prometheus\": \"openshift-monitoring/k8s\"\n },\n \"value\": [\n 1685504058.881,\n \"700.3636363636364\"\n ]\n }\n]",
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Run conformance tests on a techpreview cluster
Actual results:
Test fails
Expected results:
Test succeeds
Additional info:
Example job https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-azure-sdn-techpreview/1663723476923453440
Description of problem:
Due to a security vulnerability [1] affecting Azure CLI versions prior to 2.40.0 (not included), it is recommended to update the Azure CLI to a higher version to avoid this issue. Currently, the Azure CLI in CI is 2.38.0. [1] https://github.com/Azure/azure-cli/security/advisories/GHSA-47xc-9rr2-q7p4
Version-Release number of selected component (if applicable):
All supported version
How reproducible:
Always
Steps to Reproduce:
1. Trigger CI jobs on the Azure platform that contain steps using the Azure CLI.
Actual results:
azure cli 2.38.0 is used now.
Expected results:
azure cli 2.40.0+ to be used in CI on all supported version
Additional info:
As Azure CLI 2.40.0+ is only available in a RHEL 8-based repository, we need to update its repo in the upi-installer RHEL 8-based Dockerfile [1]. [1] https://github.com/openshift/installer/blob/master/images/installer/Dockerfile.upi.ci.rhel8#L23
Description of problem:
We suspect that https://github.com/openshift/oc/pull/1521 has broken all Metal jobs, an example of a failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-baremetal-operator/355/pull-ci-openshift-cluster-baremetal-operator-master-e2e-metal-ipi-ovn-ipv6/1691359315740332032.
Details:
The testing scripts we use set KUBECONFIG in advance to the location where we'll create it. At the time "oc adm extract" is called, the file does not exist yet. While you could argue that we should not do it, it has worked for years, and it's quite possible that customers have similar automation (e.g. setting KUBECONFIG as a global variable in their playbooks). In any case, I don't think "oc adm extract" should try to read the configuration if it does not explicitly need it.
Updated details:
After the change, "oc adm extract" expects KUBECONFIG to be present, but at the point when we call it, there is no cluster yet. I initially assumed that unsetting KUBECONFIG would help, but it does not.
Please review the following PR: https://github.com/openshift/cloud-provider-aws/pull/37
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Our telemetry contains only the vCenter version ("7.0.3") and not the exact build number. We need the build number to know what exact vCenter build the user has and what bugs are fixed there (e.g. https://issues.redhat.com/browse/OCPBUGS-5817).
CI is flaky because the TestClientTLS test fails.
I have seen these failures in 4.13 and 4.14 CI jobs.
Presently, search.ci reports the following stats for the past 14 days:
Found in 16.07% of runs (20.93% of failures) across 56 total runs and 13 jobs (76.79% failed) in 185ms
1. Post a PR and have bad luck.
2. Check https://search.ci.openshift.org/?search=FAIL%3A+TestAll%2Fparallel%2FTestClientTLS&maxAge=336h&context=1&type=all&name=cluster-ingress-operator&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job.
The test fails:
=== RUN TestAll/parallel/TestClientTLS === PAUSE TestAll/parallel/TestClientTLS === CONT TestAll/parallel/TestClientTLS === CONT TestAll/parallel/TestClientTLS stdout: Healthcheck requested 200 stderr: * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/ * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache * Trying 172.30.53.236... * TCP_NODELAY set % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none } [5 bytes data] * TLSv1.3 (OUT), TLS handshake, Client hello (1): } [512 bytes data] * TLSv1.3 (IN), TLS handshake, Server hello (2): { [122 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): { [10 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Request CERT (13): { [82 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Certificate (11): { [1763 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, CERT verify (15): { [264 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Finished (20): { [36 bytes data] * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Certificate (11): } [8 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Finished (20): } [36 bytes data] * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com * start date: Mar 22 18:55:46 2023 GMT * expire date: Mar 21 18:55:47 2025 GMT * issuer: CN=ingress-operator@1679509964 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. 
} [5 bytes data] * TLSv1.3 (OUT), TLS app data, [no content] (0): } [1 bytes data] > GET / HTTP/1.1 > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com > User-Agent: curl/7.61.1 > Accept: */* > { [5 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): { [313 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): { [313 bytes data] * TLSv1.3 (IN), TLS app data, [no content] (0): { [1 bytes data] < HTTP/1.1 200 OK < x-request-port: 8080 < date: Wed, 22 Mar 2023 18:56:24 GMT < content-length: 22 < content-type: text/plain; charset=utf-8 < set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=683c60a6110214134bed475edc895cb9; path=/; HttpOnly; Secure; SameSite=None < cache-control: private < { [22 bytes data] * Connection #0 to host canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com left intact stdout: Healthcheck requested 200 stderr: * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/ * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache * Trying 172.30.53.236... * TCP_NODELAY set % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none } [5 bytes data] * TLSv1.3 (OUT), TLS handshake, Client hello (1): } [512 bytes data] * TLSv1.3 (IN), TLS handshake, Server hello (2): { [122 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): { [10 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Request CERT (13): { [82 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Certificate (11): { [1763 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, CERT verify (15): { [264 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Finished (20): { [36 bytes data] * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Certificate (11): } [799 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, CERT verify (15): } [264 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Finished (20): } [36 bytes data] * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com * start date: Mar 22 18:55:46 2023 GMT * expire date: Mar 21 18:55:47 2025 GMT * issuer: CN=ingress-operator@1679509964 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. 
} [5 bytes data] * TLSv1.3 (OUT), TLS app data, [no content] (0): } [1 bytes data] > GET / HTTP/1.1 > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com > User-Agent: curl/7.61.1 > Accept: */* > { [5 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): { [1097 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): { [1097 bytes data] * TLSv1.3 (IN), TLS app data, [no content] (0): { [1 bytes data] < HTTP/1.1 200 OK < x-request-port: 8080 < date: Wed, 22 Mar 2023 18:56:24 GMT < content-length: 22 < content-type: text/plain; charset=utf-8 < set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=eb40064e54af58007f579a6c82f2bcd7; path=/; HttpOnly; Secure; SameSite=None < cache-control: private < { [22 bytes data] * Connection #0 to host canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com left intact stdout: Healthcheck requested 200 stderr: * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/ * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache * Trying 172.30.53.236... * TCP_NODELAY set % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none } [5 bytes data] * TLSv1.3 (OUT), TLS handshake, Client hello (1): } [512 bytes data] * TLSv1.3 (IN), TLS handshake, Server hello (2): { [122 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): { [10 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Request CERT (13): { [82 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Certificate (11): { [1763 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, CERT verify (15): { [264 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Finished (20): { [36 bytes data] * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Certificate (11): } [802 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, CERT verify (15): } [264 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Finished (20): } [36 bytes data] * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com * start date: Mar 22 18:55:46 2023 GMT * expire date: Mar 21 18:55:47 2025 GMT * issuer: CN=ingress-operator@1679509964 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. 
} [5 bytes data] * TLSv1.3 (OUT), TLS app data, [no content] (0): } [1 bytes data] > GET / HTTP/1.1 > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com > User-Agent: curl/7.61.1 > Accept: */* > { [5 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): { [1097 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): { [1097 bytes data] * TLSv1.3 (IN), TLS app data, [no content] (0): { [1 bytes data] < HTTP/1.1 200 OK < x-request-port: 8080 < date: Wed, 22 Mar 2023 18:56:25 GMT < content-length: 22 < content-type: text/plain; charset=utf-8 < set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=104beed63d6a19782a5559400bd972b6; path=/; HttpOnly; Secure; SameSite=None < cache-control: private < { [22 bytes data] * Connection #0 to host canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com left intact stdout: 000 stderr: * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/ * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache * Trying 172.30.53.236... * TCP_NODELAY set % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none } [5 bytes data] * TLSv1.3 (OUT), TLS handshake, Client hello (1): } [512 bytes data] * TLSv1.3 (IN), TLS handshake, Server hello (2): { [122 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): { [10 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Request CERT (13): { [82 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Certificate (11): { [1763 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, CERT verify (15): { [264 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Finished (20): { [36 bytes data] * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Certificate (11): } [799 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, CERT verify (15): } [264 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Finished (20): } [36 bytes data] * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com * start date: Mar 22 18:55:46 2023 GMT * expire date: Mar 21 18:55:47 2025 GMT * issuer: CN=ingress-operator@1679509964 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. 
} [5 bytes data] * TLSv1.3 (OUT), TLS app data, [no content] (0): } [1 bytes data] > GET / HTTP/1.1 > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com > User-Agent: curl/7.61.1 > Accept: */* > { [5 bytes data] * TLSv1.3 (IN), TLS alert, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS alert, unknown CA (560): { [2 bytes data] * OpenSSL SSL_read: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca, errno 0 * Closing connection 0 curl: (56) OpenSSL SSL_read: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca, errno 0 === CONT TestAll/parallel/TestClientTLS stdout: 000 stderr: * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/ * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache * Trying 172.30.53.236... * TCP_NODELAY set % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none } [5 bytes data] * TLSv1.3 (OUT), TLS handshake, Client hello (1): } [512 bytes data] * TLSv1.3 (IN), TLS handshake, Server hello (2): { [122 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): { [10 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Request CERT (13): { [82 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Certificate (11): { [1763 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, CERT verify (15): { [264 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Finished (20): { [36 bytes data] * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Certificate (11): } [8 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Finished (20): } [36 bytes data] * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com * start date: Mar 22 18:55:46 2023 GMT * expire date: Mar 21 18:55:47 2025 GMT * issuer: CN=ingress-operator@1679509964 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. 
} [5 bytes data] * TLSv1.3 (OUT), TLS app data, [no content] (0): } [1 bytes data] > GET / HTTP/1.1 > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com > User-Agent: curl/7.61.1 > Accept: */* > { [5 bytes data] * TLSv1.3 (IN), TLS alert, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS alert, unknown (628): { [2 bytes data] * OpenSSL SSL_read: error:1409445C:SSL routines:ssl3_read_bytes:tlsv13 alert certificate required, errno 0 * Closing connection 0 curl: (56) OpenSSL SSL_read: error:1409445C:SSL routines:ssl3_read_bytes:tlsv13 alert certificate required, errno 0 === CONT TestAll/parallel/TestClientTLS stdout: Healthcheck requested 200 stderr: * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/ * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache * Trying 172.30.53.236... * TCP_NODELAY set % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none } [5 bytes data] * TLSv1.3 (OUT), TLS handshake, Client hello (1): } [512 bytes data] * TLSv1.3 (IN), TLS handshake, Server hello (2): { [122 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): { [10 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Request CERT (13): { [82 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Certificate (11): { [1763 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, CERT verify (15): { [264 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Finished (20): { [36 bytes data] * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Certificate (11): } [799 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, CERT verify (15): } [264 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Finished (20): } [36 bytes data] * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com * start date: Mar 22 18:55:46 2023 GMT * expire date: Mar 21 18:55:47 2025 GMT * issuer: CN=ingress-operator@1679509964 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. 
} [5 bytes data] * TLSv1.3 (OUT), TLS app data, [no content] (0): } [1 bytes data] > GET / HTTP/1.1 > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com > User-Agent: curl/7.61.1 > Accept: */* > { [5 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): { [1097 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): { [1097 bytes data] * TLSv1.3 (IN), TLS app data, [no content] (0): { [1 bytes data] < HTTP/1.1 200 OK < x-request-port: 8080 < date: Wed, 22 Mar 2023 18:57:00 GMT < content-length: 22 < content-type: text/plain; charset=utf-8 < set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=683c60a6110214134bed475edc895cb9; path=/; HttpOnly; Secure; SameSite=None < cache-control: private < { [22 bytes data] * Connection #0 to host canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com left intact === CONT TestAll/parallel/TestClientTLS stdout: Healthcheck requested 200 stderr: * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/ * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache * Trying 172.30.53.236... * TCP_NODELAY set % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none } [5 bytes data] * TLSv1.3 (OUT), TLS handshake, Client hello (1): } [512 bytes data] * TLSv1.3 (IN), TLS handshake, Server hello (2): { [122 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): { [10 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Request CERT (13): { [82 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Certificate (11): { [1763 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, CERT verify (15): { [264 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Finished (20): { [36 bytes data] * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Certificate (11): } [802 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, CERT verify (15): } [264 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Finished (20): } [36 bytes data] * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com * start date: Mar 22 18:55:46 2023 GMT * expire date: Mar 21 18:55:47 2025 GMT * issuer: CN=ingress-operator@1679509964 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. 
} [5 bytes data] * TLSv1.3 (OUT), TLS app data, [no content] (0): } [1 bytes data] > GET / HTTP/1.1 > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com > User-Agent: curl/7.61.1 > Accept: */* > { [5 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): { [1097 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): { [1097 bytes data] * TLSv1.3 (IN), TLS app data, [no content] (0): { [1 bytes data] < HTTP/1.1 200 OK < x-request-port: 8080 < date: Wed, 22 Mar 2023 18:57:00 GMT < content-length: 22 < content-type: text/plain; charset=utf-8 < set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=eb40064e54af58007f579a6c82f2bcd7; path=/; HttpOnly; Secure; SameSite=None < cache-control: private < { [22 bytes data] * Connection #0 to host canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com left intact === CONT TestAll/parallel/TestClientTLS stdout: 000 stderr: * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/ * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache * Trying 172.30.53.236... * TCP_NODELAY set % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none } [5 bytes data] * TLSv1.3 (OUT), TLS handshake, Client hello (1): } [512 bytes data] * TLSv1.3 (IN), TLS handshake, Server hello (2): { [122 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): { [10 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Request CERT (13): { [82 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Certificate (11): { [1763 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, CERT verify (15): { [264 bytes data] * TLSv1.3 (IN), TLS handshake, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS handshake, Finished (20): { [36 bytes data] * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Certificate (11): } [799 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, CERT verify (15): } [264 bytes data] * TLSv1.3 (OUT), TLS handshake, [no content] (0): } [1 bytes data] * TLSv1.3 (OUT), TLS handshake, Finished (20): } [36 bytes data] * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server did not agree to a protocol * Server certificate: * subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com * start date: Mar 22 18:55:46 2023 GMT * expire date: Mar 21 18:55:47 2025 GMT * issuer: CN=ingress-operator@1679509964 * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. 
} [5 bytes data] * TLSv1.3 (OUT), TLS app data, [no content] (0): } [1 bytes data] > GET / HTTP/1.1 > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com > User-Agent: curl/7.61.1 > Accept: */* > { [5 bytes data] * TLSv1.3 (IN), TLS alert, [no content] (0): { [1 bytes data] * TLSv1.3 (IN), TLS alert, unknown CA (560): { [2 bytes data] * OpenSSL SSL_read: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca, errno 0 * Closing connection 0 curl: (56) OpenSSL SSL_read: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca, errno 0 === CONT TestAll/parallel/TestClientTLS --- FAIL: TestAll (1538.53s) --- FAIL: TestAll/parallel (0.00s) --- FAIL: TestAll/parallel/TestClientTLS (123.10s)
CI passes, or it fails on a different test.
I saw that TestClientTLS failed on the test case with no client certificate and ClientCertificatePolicy set to "Required". My best guess is that the test is racy and is hitting a terminating router pod. The test uses waitForDeploymentComplete to wait until all new pods are available, but perhaps waitForDeploymentComplete should also wait until all old pods are terminated.
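A minimal sketch of what that extra wait could look like, assuming a client-go clientset; the helper name and polling intervals are illustrative and not the actual test utilities in the ingress operator repository:

package e2e

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForOldPodsGone polls until the deployment is fully rolled out and no
// pod matching its selector is terminating or left over from an older
// ReplicaSet. Hypothetical helper, not the real waitForDeploymentComplete.
func waitForOldPodsGone(ctx context.Context, kc kubernetes.Interface, namespace, name string) error {
	return wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) {
		deploy, err := kc.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		want := int32(1)
		if deploy.Spec.Replicas != nil {
			want = *deploy.Spec.Replicas
		}
		// Rollout finished: every replica is updated and available.
		if deploy.Status.ObservedGeneration < deploy.Generation ||
			deploy.Status.UpdatedReplicas != want ||
			deploy.Status.AvailableReplicas != want {
			return false, nil
		}
		sel, err := metav1.LabelSelectorAsSelector(deploy.Spec.Selector)
		if err != nil {
			return false, err
		}
		pods, err := kc.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: sel.String()})
		if err != nil {
			return false, err
		}
		// Old router pods linger with a deletion timestamp while they drain;
		// an extra pod beyond the desired count is also a leftover.
		if int32(len(pods.Items)) != want {
			return false, nil
		}
		for _, p := range pods.Items {
			if p.DeletionTimestamp != nil {
				return false, nil
			}
		}
		return true, nil
	})
}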
Description of problem:
During a fresh install of an operator with conversion webhooks enabled, `crd.spec.conversion.webhook.clientConfig` is initially updated dynamically, as expected, with the proper webhook namespace, name, and caBundle. However, within a few seconds those critical settings are overwritten with the bundle's packaged CRD conversion settings. This breaks the operator and stops the installation from completing successfully. Oddly, though, if that same operator version is installed as part of an upgrade from a prior release, the dynamic clientConfig settings are retained and everything works as expected.
Version-Release number of selected component (if applicable):
OCP 4.10.36 OCP 4.11.18
How reproducible:
Consistently
Steps to Reproduce:
1. oc apply -f https://gist.githubusercontent.com/tchughesiv/0951d40f58f2f49306cc4061887e8860/raw/3c7979b58705ab3a9e008b45a4ed4abc3ef21c2b/conversionIssuesFreshInstall.yaml
2. oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' -w
Actual results:
Eventually, the clientConfig settings will revert to the following and stay that way.
$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}'
map[service:map[name:dbaas-operator-webhook-service namespace:openshift-dbaas-operator path:/convert port:443]]
conversion:
  strategy: Webhook
  webhook:
    clientConfig:
      service:
        namespace: openshift-dbaas-operator
        name: dbaas-operator-webhook-service
        path: /convert
        port: 443
    conversionReviewVersions:
      - v1alpha1
      - v1beta1
Expected results:
The `crd.spec.conversion.webhook.clientConfig` should instead retain the following settings. $ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' map[caBundle:LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJpRENDQVMyZ0F3SUJBZ0lJUVA1b1ZtYTNqUG93Q2dZSUtvWkl6ajBFQXdJd0dERVdNQlFHQTFVRUNoTU4KVW1Wa0lFaGhkQ3dnU1c1akxqQWVGdzB5TWpFeU1UWXhPVEEwTWpsYUZ3MHlOREV5TVRVeE9UQTBNamxhTUJneApGakFVQmdOVkJBb1REVkpsWkNCSVlYUXNJRWx1WXk0d1dUQVRCZ2NxaGtqT1BRSUJCZ2dxaGtqT1BRTUJCd05DCkFBVGcxaEtPWW40MStnTC9PdmVKT21jbkx5MzZNWTBEdnRGcXF3cjJFdlZhUWt2WnEzWG9ZeWlrdlFlQ29DZ3QKZ2VLK0UyaXIxNndzSmRSZ2paYnFHc3pGbzJFd1h6QU9CZ05WSFE4QkFmOEVCQU1DQW9Rd0hRWURWUjBsQkJZdwpGQVlJS3dZQkJRVUhBd0lHQ0NzR0FRVUZCd01CTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFCkZPMWNXNFBrbDZhcDdVTVR1UGNxZWhST1gzRHZNQW9HQ0NxR1NNNDlCQU1DQTBrQU1FWUNJUURxN0pkUjkxWlgKeWNKT0hyQTZrL0M0SG9sSjNwUUJ6bmx3V3FXektOd0xiZ0loQU5ObUd6RnBqaHd6WXpVY2RCQ3llU3lYYkp3SAphYllDUXFkSjBtUGFha28xCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K service:map[name:dbaas-operator-controller-manager-service namespace:redhat-dbaas-operator path:/convert port:443]]
conversion: strategy: Webhook webhook: clientConfig: service: namespace: redhat-dbaas-operator name: dbaas-operator-controller-manager-service path: /convert port: 443 caBundle: >- LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJoekNDQVMyZ0F3SUJBZ0lJZXdhVHNLS0hhbWd3Q2dZSUtvWkl6ajBFQXdJd0dERVdNQlFHQTFVRUNoTU4KVW1Wa0lFaGhkQ3dnU1c1akxqQWVGdzB5TWpFeU1UWXhPVEF5TURkYUZ3MHlOREV5TVRVeE9UQXlNRGRhTUJneApGakFVQmdOVkJBb1REVkpsWkNCSVlYUXNJRWx1WXk0d1dUQVRCZ2NxaGtqT1BRSUJCZ2dxaGtqT1BRTUJCd05DCkFBUVRFQm8zb1BWcjRLemF3ZkE4MWtmaTBZQTJuVGRzU2RpMyt4d081ZmpKQTczdDQ2WVhOblFzTjNCMVBHM04KSXJ6N1dKVkJmVFFWMWI3TXE1anpySndTbzJFd1h6QU9CZ05WSFE4QkFmOEVCQU1DQW9Rd0hRWURWUjBsQkJZdwpGQVlJS3dZQkJRVUhBd0lHQ0NzR0FRVUZCd01CTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFCkZJemdWbC9ZWkFWNmltdHl5b0ZkNFRkLzd0L3BNQW9HQ0NxR1NNNDlCQU1DQTBnQU1FVUNJRUY3ZXZ0RS95OFAKRnVrTUtGVlM1VkQ3a09DRzRkdFVVOGUyc1dsSTZlNEdBaUVBZ29aNmMvYnNpNEwwcUNrRmZSeXZHVkJRa25SRwp5SW1WSXlrbjhWWnNYcHM9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
Additional info:
If the operator is instead installed as an upgrade rather than a fresh install, the webhook settings are properly and permanently set and everything works as expected. This can be tested in a fresh cluster like this:
1. oc apply -f https://gist.githubusercontent.com/tchughesiv/703109961f22ab379a45a401be0cf351/raw/2d0541b76876a468757269472e8e3a31b86b3c68/conversionWorksUpgrade.yaml
2. oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' -w
Description of problem:
On a fresh 4.12.0-0.nightly-2022-09-20-095559 cluster, the alertmanager pods restart once before becoming ready. This is a 4.12 regression; we should make sure /etc/alertmanager/config_out/alertmanager.env.yaml exists before Alertmanager tries to load it.
# oc -n openshift-monitoring get pod NAME READY STATUS RESTARTS AGE alertmanager-main-0 6/6 Running 1 (118m ago) 118m alertmanager-main-1 6/6 Running 1 (118m ago) 118m ... # oc -n openshift-monitoring describe pod alertmanager-main-0 ... Containers: alertmanager: Container ID: cri-o://31b6f3231f5a24fe85188b8b8e26c45b660ebc870ee6915919031519d493d7f8 Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34003d434c6f07e4af6e7a52e94f703c68e1f881e90939702c764729e2b513aa Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34003d434c6f07e4af6e7a52e94f703c68e1f881e90939702c764729e2b513aa Ports: 9094/TCP, 9094/UDP Host Ports: 0/TCP, 0/UDP Args: --config.file=/etc/alertmanager/config_out/alertmanager.env.yaml --storage.path=/alertmanager --data.retention=120h --cluster.listen-address=[$(POD_IP)]:9094 --web.listen-address=127.0.0.1:9093 --web.external-url=https:/console-openshift-console.apps.qe-daily1-412-0922.qe.azure.devcluster.openshift.com/monitoring --web.route-prefix=/ --cluster.peer=alertmanager-main-0.alertmanager-operated:9094 --cluster.peer=alertmanager-main-1.alertmanager-operated:9094 --cluster.reconnect-timeout=5m --web.config.file=/etc/alertmanager/web_config/web-config.yaml State: Running Started: Wed, 21 Sep 2022 19:40:14 -0400 Last State: Terminated Reason: Error Message: s=2022-09-21T23:40:06.507Z caller=main.go:231 level=info msg="Starting Alertmanager" version="(version=0.24.0, branch=rhaos-4.12-rhel-8, revision=4efb3c1f9bc32ba0cce7dd163a639ca8759a4190)" ts=2022-09-21T23:40:06.507Z caller=main.go:232 level=info build_context="(go=go1.18.4, user=root@b2df06f7fbc3, date=20220916-18:08:09)" ts=2022-09-21T23:40:07.119Z caller=cluster.go:260 level=warn component=cluster msg="failed to join cluster" err="2 errors occurred:\n\t* Failed to resolve alertmanager-main-0.alertmanager-operated:9094: lookup alertmanager-main-0.alertmanager-operated on 172.30.0.10:53: no such host\n\t* Failed to resolve alertmanager-main-1.alertmanager-operated:9094: lookup alertmanager-main-1.alertmanager-operated on 172.30.0.10:53: no such host\n\n" ts=2022-09-21T23:40:07.119Z caller=cluster.go:262 level=info component=cluster msg="will retry joining cluster every 10s" ts=2022-09-21T23:40:07.119Z caller=main.go:329 level=warn msg="unable to join gossip mesh" err="2 errors occurred:\n\t* Failed to resolve alertmanager-main-0.alertmanager-operated:9094: lookup alertmanager-main-0.alertmanager-operated on 172.30.0.10:53: no such host\n\t* Failed to resolve alertmanager-main-1.alertmanager-operated:9094: lookup alertmanager-main-1.alertmanager-operated on 172.30.0.10:53: no such host\n\n" ts=2022-09-21T23:40:07.119Z caller=cluster.go:680 level=info component=cluster msg="Waiting for gossip to settle..." 
interval=2s ts=2022-09-21T23:40:07.173Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml ts=2022-09-21T23:40:07.174Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config_out/alertmanager.env.yaml err="open /etc/alertmanager/config_out/alertmanager.env.yaml: no such file or directory" ts=2022-09-21T23:40:07.174Z caller=cluster.go:689 level=info component=cluster msg="gossip not settled but continuing anyway" polls=0 elapsed=54.469985ms Exit Code: 1 Started: Wed, 21 Sep 2022 19:40:06 -0400 Finished: Wed, 21 Sep 2022 19:40:07 -0400 Ready: True Restart Count: 1 Requests: cpu: 4m memory: 40Mi Startup: exec [sh -c exec curl --fail http://localhost:9093/-/ready] delay=20s timeout=3s period=10s #success=1 #failure=40 ... # oc -n openshift-monitoring exec -c alertmanager alertmanager-main-0 -- cat /etc/alertmanager/config_out/alertmanager.env.yaml "global": "resolve_timeout": "5m" "inhibit_rules": - "equal": - "namespace" - "alertname" "source_matchers": - "severity = critical" "target_matchers": - "severity =~ warning|info" - "equal": - "namespace" - "alertname" ...
Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-09-20-095559   True        False         109m    Cluster version is 4.12.0-0.nightly-2022-09-20-095559
How reproducible:
always
Steps to Reproduce:
1. see the steps 2. 3.
Actual results:
alertmanager pod restarted once to become ready
Expected results:
no restart
Additional info:
no issue with 4.11
# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-09-20-140029   True        False         16m     Cluster version is 4.11.0-0.nightly-2022-09-20-140029
# oc -n openshift-monitoring get pod | grep alertmanager-main
alertmanager-main-0   6/6   Running   0   54m
alertmanager-main-1   6/6   Running   0   55m
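A minimal sketch of the guard being suggested, waiting for the reloader-generated config to exist before Alertmanager loads it; the file path comes from the report, while the package and function here are purely illustrative and not the actual cluster-monitoring-operator change:

package main

import (
	"fmt"
	"os"
	"time"
)

// waitForConfig blocks until the config-reloader has written the given file,
// so the first Alertmanager start does not fail and trigger a restart.
func waitForConfig(path string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if _, err := os.Stat(path); err == nil {
			return nil // file exists, safe to start
		} else if !os.IsNotExist(err) {
			return err // unexpected error (permissions, etc.)
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for %s", path)
		}
		time.Sleep(time.Second)
	}
}

func main() {
	if err := waitForConfig("/etc/alertmanager/config_out/alertmanager.env.yaml", 2*time.Minute); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// exec the real alertmanager binary here (omitted).
}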
Description of problem:
library-go should use Lease for leader election by default. In 4.10 we switched from configmaps to configmapsleases; now we can switch to leases and change library-go to use Lease by default. We already have an open PR for that: https://github.com/openshift/library-go/pull/1448. Once the PR merges, we should revendor library-go for:
- kas operator
- oas operator
- etcd operator
- kcm operator
- openshift controller manager operator
- scheduler operator
- auth operator
- cluster policy controller
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
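For reference, a minimal sketch of Lease-based leader election with client-go, the pattern the operators land on after the revendor; the namespace, lock name, and timings here are illustrative, not the library-go implementation itself:

package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	hostname, _ := os.Hostname()

	// Lock backed by a coordination.k8s.io/v1 Lease instead of a ConfigMap.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Namespace: "openshift-example-operator", Name: "example-operator-lock"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: hostname},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   137 * time.Second,
		RenewDeadline:   107 * time.Second,
		RetryPeriod:     26 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) { log.Println("became leader, starting controllers") },
			OnStoppedLeading: func() { log.Println("lost leadership, shutting down") },
		},
	})
}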
Description of problem:
Critical alert rules do not have a runbook URL
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
This bug is being raised by the OpenShift Monitoring team as part of an effort to detect invalid alert rules in OCP.
1. Check the details of the MultipleDefaultStorageClasses alert rule.
Actual results:
The MultipleDefaultStorageClasses alert rule has critical severity but does not have a runbook_url annotation.
Expected results:
All critical alert rules must have a runbook_url annotation
Additional info:
Critical alerts must have a runbook; please refer to the style guide at https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide. The runbooks are located at github.com/openshift/runbooks. To resolve the bug:
- Add runbooks for the relevant alerts at github.com/openshift/runbooks
- Add the link to the runbook in the alert annotation 'runbook_url'
- Remove the exception in the origin test, added in PR https://github.com/openshift/origin/pull/27933
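As a rough illustration of the consistency check involved, a self-contained sketch of the kind of validation the origin test performs; the rule struct here is a stand-in rather than the actual monitoring API types:

package main

import "fmt"

// alertRule is a stand-in for a Prometheus alerting rule as the check sees
// it; the real test walks PrometheusRule objects retrieved from the cluster.
type alertRule struct {
	Alert       string
	Labels      map[string]string
	Annotations map[string]string
}

// missingRunbooks returns the names of critical alerts that lack a
// runbook_url annotation, which is what the style guide requires.
func missingRunbooks(rules []alertRule) []string {
	var missing []string
	for _, r := range rules {
		if r.Labels["severity"] != "critical" {
			continue
		}
		if r.Annotations["runbook_url"] == "" {
			missing = append(missing, r.Alert)
		}
	}
	return missing
}

func main() {
	rules := []alertRule{
		{Alert: "MultipleDefaultStorageClasses", Labels: map[string]string{"severity": "critical"}, Annotations: map[string]string{}},
	}
	fmt.Println("critical alerts without runbook_url:", missingRunbooks(rules))
}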
Description of problem:
The vsphere-problem-detector feature is triggering VSphereOpenshiftClusterHealthFail alerts for the “CheckFolderPermissions” and “CheckDefaultDatastore” checks after upgrading from 4.9.54, forcing users to update their configuration solely to get around the problem detector. Depending on customer policies around vCenter passwords or configuration updates, this can be a major obstacle for a user who wants to keep the current vSphere settings, since they worked correctly in previous OpenShift versions.
Version-Release number of selected component (if applicable):
4.10.55
How reproducible:
Consistently
Steps to Reproduce:
1. Upgrade a cluster with invalid vSphere credentials to 4.10
Actual results:
The cluster-storage-operator fires alerts about the vSphere configuration in OpenShift.
Expected results:
Bypass the vsphere-problem-detector checks if the user doesn't want to make a config change, since the setup is working and upgrades like this succeeded for users prior to 4.10.
Additional info:
Description of problem:
Create Serverless Function Form is Broken
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always on Master.
Steps to Reproduce:
1. Go to the Add page
2. Click the Create Serverless Function form
Actual results:
The form throws an error.
Expected results:
The form should open and submit successfully
Screenshot of Error: https://drive.google.com/file/d/1uyzGHktfr8tEGWPyYkv9ISYI6BhdnK6f/view?usp=sharing
Additional info:
One of the 4.13 nightly payload tests is failing, and it seems that kernel-uname-r is needed in base RHCOS.
Error message from the rpm-ostree rebase:
Problem: package kernel-modules-core-5.14.0-284.25.1.el9_2.x86_64 requires kernel-uname-r = 5.14.0-284.25.1.el9_2.x86_64, but none of the providers can be installed
- conflicting requests
Perhaps something changed recently in packaging.
Description of problem:
A test in the periodic jobs of the 4.13 release fails in about 30% of runs: [rfe_id:27363][performance] CPU Management Hyper-thread aware scheduling for guaranteed pods Verify Hyper-Thread aware scheduling for guaranteed pods [test_id:46959] Number of CPU requests as multiple of SMT count allowed when HT enabled
Version-Release number of selected component (if applicable):
4.13
How reproducible:
In periodic jobs
Steps to Reproduce:
Run cnf tests on 4.13
Actual results:
Expected results:
Additional info:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.13-e2e-telco5g-cnftests/1628395172440051712/artifacts/e2e-telco5g-cnftests/telco5g-cnf-tests/artifacts/test_results.html
Baremetal IPI jobs have been failing in 4.14 CI since May 12th
bootkube is failing to start with
May 15 10:11:56 localhost.localdomain systemd[1]: Started Bootstrap a Kubernetes cluster.
May 15 10:12:04 localhost.localdomain bootkube.sh[82661]: Rendering Kubernetes Controller Manager core manifests...
May 15 10:12:09 localhost.localdomain bootkube.sh[84029]: F0515 10:12:09.396398 1 render.go:45] error getting FeatureGates: error creating feature accessor: unable to determine features: missing desired version "4.14.0-0.nightly-2023-05-12-121801" in featuregates.config.openshift.io/cluster
May 15 10:12:09 localhost.localdomain systemd[1]: bootkube.service: Main process exited, code=exited, status=255/EXCEPTION
May 15 10:12:09 localhost.localdomain systemd[1]: bootkube.service: Failed with result 'exit-code'.
Description of problem:
Cluster deployment of 4.14.0-0.nightly-2023-06-20-065807 fails as worker nodes are stuck in INSPECTING state despite being reported as MANAGEABLE
From the logs of machine-controller container in machine-api-controllers pod:
I0621 06:12:02.779472 1 request.go:682] Waited for 2.095824347s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/performance.openshift.io/v2?timeout=32s
E0621 06:12:02.781540 1 logr.go:270] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"Metal3Remediation\" in version \"infrastructure.cluster.x-k8s.io/v1beta1\"" "kind"={"Group":"infrastructure.cluster.x-k8s.io","Kind":"Metal3Remediation"}
I0621 06:12:02.783418 1 controller.go:179] kni-qe-4-tj65t-worker-0-h6s8g: reconciling Machine
2023/06/21 06:12:02 Checking if machine kni-qe-4-tj65t-worker-0-h6s8g exists.
2023/06/21 06:12:02 Machine kni-qe-4-tj65t-worker-0-h6s8g does not exist.
I0621 06:12:02.783439 1 controller.go:372] kni-qe-4-tj65t-worker-0-h6s8g: reconciling machine triggers idempotent create
2023/06/21 06:12:02 Creating machine kni-qe-4-tj65t-worker-0-h6s8g
2023/06/21 06:12:02 0 hosts available while choosing host for machine 'kni-qe-4-tj65t-worker-0-h6s8g'
2023/06/21 06:12:02 No available BareMetalHost found
W0621 06:12:02.783735 1 controller.go:374] kni-qe-4-tj65t-worker-0-h6s8g: failed to create machine: requeue in: 30s
I0621 06:12:02.783748 1 controller.go:404] Actuator returned requeue-after error: requeue in: 30s
I0621 06:12:02.783780 1 controller.go:179] kni-qe-4-tj65t-worker-0-j259x: reconciling Machine
2023/06/21 06:12:02 Checking if machine kni-qe-4-tj65t-worker-0-j259x exists.
2023/06/21 06:12:02 Machine kni-qe-4-tj65t-worker-0-j259x does not exist.
I0621 06:12:02.783792 1 controller.go:372] kni-qe-4-tj65t-worker-0-j259x: reconciling machine triggers idempotent create
2023/06/21 06:12:02 Creating machine kni-qe-4-tj65t-worker-0-j259x
2023/06/21 06:12:02 0 hosts available while choosing host for machine 'kni-qe-4-tj65t-worker-0-j259x'
2023/06/21 06:12:02 No available BareMetalHost found
W0621 06:12:02.783971 1 controller.go:374] kni-qe-4-tj65t-worker-0-j259x: failed to create machine: requeue in: 30s
I0621 06:12:02.783976 1 controller.go:404] Actuator returned requeue-after error: requeue in: 30s
BMH Resources:
oc get bmh -A
NAMESPACE               NAME                 STATE                    CONSUMER                  ONLINE   ERROR   AGE
openshift-machine-api   openshift-master-0   externally provisioned   kni-qe-4-tj65t-master-0   true             175m
openshift-machine-api   openshift-master-1   externally provisioned   kni-qe-4-tj65t-master-1   true             175m
openshift-machine-api   openshift-master-2   externally provisioned   kni-qe-4-tj65t-master-2   true             175m
openshift-machine-api   openshift-worker-0   inspecting                                         true             175m
openshift-machine-api   openshift-worker-1   inspecting                                         true             175m
From Ironic:
baremetal node list
+--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name                                     | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+
| 86f146e3-3e48-4a7a-b0ef-57c42083fc92 | openshift-machine-api~openshift-master-0 | 7eeb9e57-2df2-4710-82d9-d3f99a20348e | power on    | active             | False       |
| 2380f211-934f-4193-8cb1-d09e7008410c | openshift-machine-api~openshift-master-2 | fd856ced-2912-4800-848c-256c00a1fdb7 | power on    | active             | False       |
| 9ad70c58-de44-4d56-9304-4bf7c95de6fb | openshift-machine-api~openshift-master-1 | aa1a4c89-4215-44ec-90c7-9c5f3de95ab8 | power on    | active             | False       |
| bb5ea5f4-016c-4bdd-834d-61d575284bf3 | openshift-machine-api~openshift-worker-0 | None                                 | power off   | manageable         | False       |
| 3045a07a-09d6-43a0-ab9c-d856b54bad6c | openshift-machine-api~openshift-worker-1 | None                                 | power off   | manageable         | False       |
+--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-20-065807
How reproducible:
so far once
Steps to Reproduce:
1. Deploy baremetal dualstack cluster with day1 networking
Actual results:
Deployment fails as worker nodes are not provisioned
Expected results:
Deployment succeeds
Description of problem: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.14-e2e-openstack-sdn/1682353286402805760 failed with:
fail [github.com/openshift/origin/test/extended/authorization/scc.go:69]: 2 pods failed before test on SCC errors Error creating: pods "openstack-cinder-csi-driver-controller-7c4878484d-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[0].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[0].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[0].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[0].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[0].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[0].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[1].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[1].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[1].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[1].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[1].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[1].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[2].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[2].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[2].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[2].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[2].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[2].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[3].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[3].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[3].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[3].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[3].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[3].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[4].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[4].containers[0].hostPort: Invalid 
value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[4].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[4].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[4].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[4].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[5].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[5].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[5].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[5].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[5].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[5].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[6].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[6].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[6].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[6].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[6].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[6].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[7].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[7].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[7].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[7].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[7].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[7].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[8].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[8].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[8].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[8].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[8].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[8].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[9].hostNetwork: Invalid value: true: Host network is not 
allowed to be used, provider restricted-v2: .containers[9].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[9].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[9].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[9].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[9].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for ReplicaSet.apps/v1/openstack-cinder-csi-driver-controller-7c4878484d -n openshift-cluster-csi-drivers happened 13 times Error creating: pods "openstack-cinder-csi-driver-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[7]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[8]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, provider restricted-v2: .containers[0].privileged: Invalid value: true: Privileged containers are not allowed, provider restricted-v2: .containers[0].capabilities.add: Invalid value: "SYS_ADMIN": capability may not be added, provider restricted-v2: .containers[0].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[0].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider restricted-v2: .containers[0].allowPrivilegeEscalation: Invalid value: true: Allowing privilege escalation for containers is not allowed, provider restricted-v2: .containers[1].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[1].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider restricted-v2: .containers[2].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[2].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not 
usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for DaemonSet.apps/v1/openstack-cinder-csi-driver-node -n openshift-cluster-csi-drivers happened 12 times
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
OKD/FCOS uses FCOS as its bootimage, i.e. when booting cluster nodes the first time during installation. FCOS does not provide tools such as the OpenShift Client (oc) or hyperkube, which are used during single-node cluster installation at first boot (e.g. oc in bootkube.sh), and thus setup fails.
Version-Release number of selected component (if applicable):
4.14
Please review the following PR: https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/197
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
As a user, I would like to see the type of technology used by the samples on the samples view similar to the all services view.
On the samples view:
It shows different types of samples, e.g. Devfile and Helm, all displayed as .NET, so it is difficult for the user to decide which .NET entry to select from the list. We need something like the all-services view, which shows the type of technology at the top right of each card, so users can differentiate between the entries:
Remove list bullets
Need space between "Phase" and status icon
Description of problem:
The ExternalLink 'OpenShift Pipelines based on Tekton' in the Pipeline Build Strategy deprecation Alert is incorrect; it is currently defined as https://openshift.github.io/pipelines-docs/ and redirects to a 'Not found' page
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-04-133505
How reproducible:
Always
Steps to Reproduce:
1. $ oc new-app -n test https://github.com/openshift/cucushift/blob/master/testdata/pipeline/samplepipeline.yaml
   OR create a Jenkins server and Pipeline BC:
   $ oc new-app https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/jenkins-ephemeral-template.json
   $ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/pipeline/samplepipeline.yaml
2. As an admin user, log in to the console and navigate to Builds -> Build Configs -> sample-pipeline Details page
3. Check the External link 'OpenShift Pipelines based on Tekton' in the 'Pipeline build strategy deprecation' Alert
Actual results:
The user is currently redirected to a 'Not found' page
Expected results:
The link should be correct and point to an existing page
Additional info:
Impacted file: build.tsx https://github.com/openshift/console/blob/a0e7e98e5ffe4aca73f9f1f441d15cc4e9b33ee6/frontend/public/components/build.tsx#LL238C17-L238C60
Base bug: https://bugzilla.redhat.com/show_bug.cgi?id=1768350
Description of the problem:
Debug info is not printed for data collection
How reproducible:
Always
Steps to reproduce:
1. Deploy MCE multicluster-engine.v2.3.0-81.
2. Enable log level debug for AI
3. Deploy spoke multinode 4.12
Actual results:
No debug info printed.
Expected results:
Should print this debug info:
log.Debugf("Red Hat Insights Request ID: %+v", res.Header.Get("X-Rh-Insights-Request-Id"))
Please review the following PR: https://github.com/openshift/kube-rbac-proxy/pull/64
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
This issue was supposed to be fixed in 4.13.4 but is happening again. Manually creating the directory "/etc/systemd/network" allows the upgrade to complete, but this is not a sustainable workaround when there are several clusters to update.
Version-Release number of selected component (if applicable):
4.13.4
How reproducible:
In a customer environment.
Steps to Reproduce:
1. Update to 4.13.4 from 4.12.21 2. 3.
Actual results:
MCO degraded blocking the upgrade.
Expected results:
Upgrade to complete.
Additional info:
Description of problem:
The HCP Create NodePool AWS Render command does not work correctly since it does not render a specification with the arch and instance type defined.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
No arch or instance type defined in specification.
Expected results:
Arch and instance type defined in specification.
Additional info:
When we create an HCP, the Root CA in the HCP namespaces has the certificate and key named as
Done criteria: The Root CA should have the certificate and key named as the cert manager expects.
Description of problem:
Once a user changes the log component in the master node's Logs section, they are unable to change or select a different log component from the dropdown.
To select a different log component, the user needs to revisit the Logs section under the master node again; this refreshes the pane and reloads the default options.
Version-Release number of selected components (if applicable):
4.11.0-0.nightly-2022-08-15-152346
How reproducible:
Always
Steps to Reproduce:
Actual results:
The user is unable to select or change the log component once a selection has already been made from the dropdown under the master node's Logs section.
Expected results:
Users should be able to change or select the log component from the master node's Logs section whenever required, using the available dropdown.
Additional info:
Reproduced in both chrome[103.0.5060.114 (Official Build) (64-bit)] and firefox[91.11.0esr (64-bit)] browsers
Attached is a screen capture of the same: ScreenRecorder_2022-08-16_26457662-aea5-4a00-aeb4-0fbddf8f16f0.mp4
Description of problem:
Azure CCM should be GA before the end of 4.14. When we previously tried to promote it there were issues, so we need to improve the feature gate promotion so that we can promote all components in a single release, and then promote the CCM to GA once those changes are in place.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/machine-api-operator/pull/1137
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
All the DaemonSets defined within the openshift-multus namespace have a node selector predicate on the kubernetes.io/os label to schedule the daemonset's pods only on Linux workers. The whereabouts-reconciler seems to be missing it. We might need to add the `kubernetes.io/os: linux` label selector to stay consistent with the other daemonset definitions and avoid risks in clusters with Windows workers.
Version-Release number of selected component (if applicable):
4.13+
How reproducible:
Always
Steps to Reproduce:
1. oc get daemonsets -n openshift-multus
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
multus                          6         6         6       6            6           kubernetes.io/os=linux   4h1m
multus-additional-cni-plugins   6         6         6       6            6           kubernetes.io/os=linux   4h1m
multus-networkpolicy            6         6         6       6            6           kubernetes.io/os=linux   19s
Actual results:
network-metrics-daemon          6         6         6       6            6           kubernetes.io/os=linux   4h1m
whereabouts-reconciler          6         6         6       6            6           <none>                   23s
Note the missing kubernetes.io/os node selector.
Expected results:
The whereabouts-reconciler should also have the nodeSelector term kubernetes.io/os: linux.
Additional info:
https://redhat-internal.slack.com/archives/CFFSAHWHF/p1687158805205059
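A minimal sketch of the pod spec field in question, built with the upstream apps/v1 Go types rather than the operator's actual manifest template, to show where the kubernetes.io/os selector would go; the labels and image are placeholders:

package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	labels := map[string]string{"app": "whereabouts-reconciler"}
	ds := appsv1.DaemonSet{
		TypeMeta:   metav1.TypeMeta{APIVersion: "apps/v1", Kind: "DaemonSet"},
		ObjectMeta: metav1.ObjectMeta{Name: "whereabouts-reconciler", Namespace: "openshift-multus"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					// The missing piece: keep the pods off Windows workers,
					// consistent with the other openshift-multus daemonsets.
					NodeSelector: map[string]string{"kubernetes.io/os": "linux"},
					Containers: []corev1.Container{{
						Name:  "whereabouts-reconciler",
						Image: "example.invalid/whereabouts:tag", // placeholder image
					}},
				},
			},
		},
	}
	out, _ := yaml.Marshal(ds)
	fmt.Println(string(out))
}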
Description of problem:
The oc binary stored at /usr/local/bin in the cli-artifacts image of a non-amd64 payload is not the one for the architecture bound to the payload. It is an amd64 binary.
Version-Release number of selected component (if applicable):
4.11.4
How reproducible:
always
Steps to Reproduce:
1. CLI_ARTIFACTS_IMAGE=$(oc adm release info quay.io/openshift-release-dev/ocp-release:4.11.4-aarch64 --image-for=cli-artifacts)
2. CONTAINER=$(podman create $CLI_ARTIFACTS_IMAGE)
3. podman cp $CONTAINER:/usr/bin/oc /tmp/oc
4. file /tmp/oc
Actual results:
/tmp/oc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked,.....
Expected results:
It should be a binary built for the architecture the payload targets; i.e., using the above aarch64 payload should lead to an arm64 binary at /usr/bin/oc, with the binaries for the other architectures in /usr/share/openshift.
Additional info:
https://github.com/openshift/oc/blob/master/images/cli-artifacts/Dockerfile.rhel#L13
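A quick way to repeat the check above across architectures (a sketch; the per-architecture release tags such as 4.11.4-x86_64 are assumptions based on the usual naming of multi-arch release pullspecs):
for arch in x86_64 aarch64 ppc64le s390x; do
  img=$(oc adm release info "quay.io/openshift-release-dev/ocp-release:4.11.4-${arch}" --image-for=cli-artifacts)
  cid=$(podman create "$img")
  podman cp "$cid":/usr/bin/oc "/tmp/oc-${arch}"
  podman rm "$cid" >/dev/null
  file "/tmp/oc-${arch}"   # each binary should match its payload architecture
done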
Description of problem:
Create two custom SCCs with different permissions, for example, custom-scc-1 with 'privileged' and custom-scc-2 with 'restricted'. Deploy a pod with annotations "openshift.io/required-scc: custom-scc-1, custom-scc-2". Pod deployment failed with error "Error creating: pods "test-747555b669-" is forbidden: required scc/custom-restricted-v2-scc, custom-privileged-scc not found". The system fails to provide appropriate error messages for multiple required SCC annotations, leaving users unable to identify the cause of the failure effectively.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-31-181848
How reproducible:
Always
Steps to Reproduce:
$ oc login -u testuser-0
$ oc new-project scc-test
$ oc create sa scc-test -n scc-test
serviceaccount/scc-test created
$ oc get scc restricted-v2 -o yaml --context=admin > custom-restricted-v2-scc.yaml
$ sed -i -e 's/restricted-v2/custom-restricted-v2-scc/g' -e "s/MustRunAsRange/RunAsAny/" -e "s/priority: null/priority: 10/" custom-restricted-v2-scc.yaml
$ oc create -f custom-restricted-v2-scc.yaml --context=admin
securitycontextconstraints.security.openshift.io/custom-restricted-v2-scc created
$ oc adm policy add-scc-to-user custom-restricted-v2-scc system:serviceaccount:scc-test:scc-test --context=admin
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:custom-restricted-v2-scc added: "scc-test"
$ oc get scc privileged -o yaml --context=admin > custom-privileged-scc.yaml
$ sed -i -e 's/privileged/custom-privileged-scc/g' -e "s/priority: null/priority: 5/" custom-privileged-scc.yaml
$ oc create -f custom-privileged-scc.yaml --context=admin
securitycontextconstraints.security.openshift.io/custom-privileged-scc created
$ oc adm policy add-scc-to-user custom-privileged-scc system:serviceaccount:scc-test:scc-test --context=admin
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:custom-privileged-scc added: "scc-test"
$ cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  selector:
    matchLabels:
      deployment: test
  template:
    metadata:
      annotations:
        openshift.io/required-scc: custom-restricted-v2-scc, custom-privileged-scc
      labels:
        deployment: test
    spec:
      containers:
      - args:
        - infinity
        command:
        - sleep
        image: fedora:latest
        name: sleeper
        securityContext:
          runAsNonRoot: true
      serviceAccountName: scc-test
$ oc create -f deployment.yaml
deployment.apps/test created
$ oc describe rs test-747555b669 | grep FailedCreate
ReplicaFailure  True  FailedCreate
Warning  FailedCreate  61s (x15 over 2m23s)  replicaset-controller  Error creating: pods "test-747555b669-" is forbidden: required scc/custom-restricted-v2-scc, custom-privileged-scc not found
Actual results:
Pod deployment failed with "Error creating: pods "test-747555b669-" is forbidden: required scc/custom-restricted-v2-scc, custom-privileged-scc not found"
Expected results:
Either it should ignore the second SCC instead of reporting "not found", or it should show a proper error message explaining why the annotation is rejected.
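As a workaround sketch, keeping a single SCC name in the annotation avoids the "not found" error, assuming the required-scc annotation only accepts one SCC name (which is what the error message suggests):
$ oc patch deployment/test --type merge \
    -p '{"spec":{"template":{"metadata":{"annotations":{"openshift.io/required-scc":"custom-restricted-v2-scc"}}}}}'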
Additional info:
This is a clone of issue OCPBUGS-17589. The following is the description of the original issue:
—
This bug has been seen during the analysis of another issue
If the Server Internal IP is not defined, CBO crashes as nil is not handled in https://github.com/openshift/cluster-baremetal-operator/blob/release-4.12/provisioning/utils.go#L99
I0809 17:33:09.683265 1 provisioning_controller.go:540] No Machines with cluster-api-machine-role=master found, set provisioningMacAddresses if the metal3 pod fails to start
I0809 17:33:09.690304 1 clusteroperator.go:217] "new CO status" reason=SyncingResources processMessage="Applying metal3 resources" message=""
I0809 17:33:10.488862 1 recorder_logging.go:37] &Event{ObjectMeta:{dummy.1779c769624884f4 dummy 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:ValidatingWebhookConfigurationUpdated,Message:Updated ValidatingWebhookConfiguration.admissionregistration.k8s.io/baremetal-operator-validating-webhook-configuration because it changed,Source:EventSource{Component:,Host:,},FirstTimestamp:2023-08-09 17:33:10.488745204 +0000 UTC m=+5.906952556,LastTimestamp:2023-08-09 17:33:10.488745204 +0000 UTC m=+5.906952556,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1768fd4]
goroutine 574 [running]:
github.com/openshift/cluster-baremetal-operator/provisioning.getServerInternalIP({0x1e774d0?, 0xc0001e8fd0?})
	/go/src/github.com/openshift/cluster-baremetal-operator/provisioning/utils.go:75 +0x154
github.com/openshift/cluster-baremetal-operator/provisioning.GetIronicIP({0x1ea2378?, 0xc000856840?}, {0x1bc1f91, 0x15}, 0xc0004c4398, {0x1e774d0, 0xc0001e8fd0})
	/go/src/github.com/openshift/cluster-baremetal-operator/provisioning/utils.go:98 +0xfb
Description of problem:
Reported in https://github.com/openshift/cluster-ingress-operator/issues/911
When you open a new issue, it still directs you to Bugzilla, which no longer works.
It can be changed here: https://github.com/openshift/cluster-ingress-operator/blob/master/.github/ISSUE_TEMPLATE/config.yml, but to what?
The correct Jira link is
https://issues.redhat.com/secure/CreateIssueDetails!init.jspa?pid=12332330&issuetype=1&components=12367900&priority=10300&customfield_12316142=26752
But can the public use this mechanism? Yes - https://redhat-internal.slack.com/archives/CB90SDCAK/p1682527645965899
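One possible shape for the updated template, using GitHub's standard contact_links mechanism with the Jira URL above (a sketch of the file content, not the final wording):
$ cat > .github/ISSUE_TEMPLATE/config.yml <<'EOF'
blank_issues_enabled: false
contact_links:
  - name: Report a bug in Jira
    url: https://issues.redhat.com/secure/CreateIssueDetails!init.jspa?pid=12332330&issuetype=1&components=12367900&priority=10300&customfield_12316142=26752
    about: Open an issue in the OCPBUGS Jira project instead of Bugzilla
EOF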
Version-Release number of selected component (if applicable):
n/a
How reproducible:
May be in other repos too.
Steps to Reproduce:
1. Open Issue in the repo - click on New Issue
2. Follow directions and click on link to open Bugzilla
3. Get message that this doesn't work anymore
Actual results:
You get instructions that don't work to open a bug from an Issue.
Expected results:
You get instructions to just open an Issue, or get correct instructions on how to open a bug using Jira.
Additional info:
Description of problem:
In HA mode there are two dedicated nodes, but ignition-server-proxy and konnectivity-server have only one replica each. I expect them to have two replicas, with each replica running on one dedicated node.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. allocate two dedicated nodes
2. create a cluster in HA mode
3. check ignition-server-proxy and konnectivity-server in the control plane
Actual results:
ignition-server-proxy and konnectivity-server have one replica
Expected results:
ignition-server-proxy and konnectivity-server have two replicas, each replica runs on one dedicated node
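A quick way to verify the replica placement (a sketch; replace the namespace placeholder with the hosted control plane namespace):
$ oc -n <hosted-control-plane-namespace> get pods -o wide | grep -E 'ignition-server-proxy|konnectivity-server'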
Additional info:
Description of problem:
More than one cluster can be created in openshift-cluster-api
$ oc get cluster
NAME                          PHASE          AGE   VERSION
ci-ln-kv1gj4b-72292-jn4rw     Provisioning   19m
ci-ln-kv1gj4b-72292-jn4rw-1   Provisioning   7s
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2022-11-25-204445
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
More than one cluster can be created in openshift-cluster-api
$ oc get cluster
NAME                          PHASE          AGE   VERSION
ci-ln-kv1gj4b-72292-jn4rw     Provisioning   19m
ci-ln-kv1gj4b-72292-jn4rw-1   Provisioning   7s
Expected results:
The openshift-cluster-api namespace should contain only the cluster you're running on; users should be allowed to use the cluster API to create other clusters only in other namespaces.
Additional info:
Related to https://issues.redhat.com/browse/OCPBUGS-1493
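A minimal way to reproduce the description above (a sketch; the Cluster name is arbitrary and the spec is left empty for brevity):
$ cat <<'EOF' | oc apply -f -
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: extra-cluster
  namespace: openshift-cluster-api
spec: {}
EOF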
Description of the problem:
When machines have multiple IP addresses assigned to the same network interface the assisted service will create the bare metal host configuration using the first IP address of the interface. That IP address may or may not be inside the machine CIDR of the cluster. If it isn't then the bare metal host will have an IP address that is different to the IP address of the corresponding node. As a result of that the machine operator will not link the machine and the node, and the machine will never move to the `Running` phase. In that situation the corresponding machine pool will never have the minimum required number of replicas. For worker machine pools that means that the cluster will never be considered completely installed.
How reproducible:
Note that this is easy to reproduce using the current zero touch provisioning factory workflow, because when machines have a single NIC they will have two IP addresses assigned. It may be harder to reproduce in other scenarios.
Steps to reproduce:
1. Create a bare metal cluster with three control plane nodes and one worker node, where nodes have one NIC and two IP addresses assigned to that NIC. In the ZTPFW scenario that will be a static IP address in the 192.168.7.0/24 range (which is the machine CIDR of the cluster) and another IP address assigned via DHCP, say in the 192.168.150.0/24 range (which is not the machine CIDR of the cluster).
2. Start the installation.
3. Check the manifests generated by the assisted service, in particular the `99_openshift-cluster-api_hosts-*.yaml` files. Those will contain the definition of the bare metal hosts, together with a `baremetalhost.metal3.io/status` annotation that contains the status that they should have. Check that it contains the wrong IP address in the 192.168.150.0/24 range, outside of the machine CIDR of the cluster.
4. Check that all the machines (oc get machine -A) didn't move to the `Running` phase. That is because the machine API operator can't link them to the nodes due to the mismatched IP addresses: nodes have 192.168.7.* and machines have 192.168.150.* (copied from the bare metal hosts).
5. Check that the worker machine pool doesn't have the minimum required number of replicas.
6. Check that the installation doesn't complete.
Actual results:
The machines aren't in the `Running` phase, the worker pool doesn't have the minimum required number of replicas and the installation doesn't complete.
Expected results:
All the machines should move to the `Running` phase, the worker pool should have the minimum required number of replicas and the installation should complete.
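A quick way to see the mismatch described above on a live cluster (a sketch; the jsonpath field names follow the machine-api conventions and may vary by version):
$ oc -n openshift-machine-api get machines -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
$ oc get nodes -o wide   # compare the INTERNAL-IP column with the machine addresses above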
Description of problem:
The agent create sub-command shows a fatal error when it is executed with an invalid argument.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Execute `openshift-install agent create invalid`
Actual results:
FATA[0000] Error executing openshift-install: accepts 0 arg(s), received 1
Expected results:
It should return the help of the create command.
Additional info:
As a developer, I would like a Make file command that performs all the pre-commit checks that should be run before committing any code to GitHub. This includes updating Golang and API dependencies, building the source code, building the e2e's, verifying source code formatting, and running unit tests.
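One possible shape of the checks such a target would chain together, expressed as the underlying shell steps (the individual target names and exact commands are assumptions; the repo's existing Makefile targets should be reused where they exist):
go mod tidy && go mod vendor   # update Golang and API dependencies
make build                     # build the source code (assumed target name)
make e2e                       # build the e2e tests (assumed target name)
gofmt -l .                     # verify source code formatting (prints offending files)
go test ./...                  # run unit tests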
Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/62
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
What happens:
When deploying OpenShift 4.13 with Failure Domains, the PrimarySubnet in the ProviderSpec of the Machine is set to the MachinesSubnet set in install-config.yaml.
What is expected:
Machines in failure domains with a control-plane port target should not use the MachinesSubnet as the primary subnet in the provider spec. It should be the ID of the subnet that is actually used for the control plane in that domain.
How to reproduce:
install-config.yaml:
apiVersion: v1
baseDomain: shiftstack.com
compute:
- name: worker
  platform:
    openstack:
      type: m1.xlarge
  replicas: 1
controlPlane:
  name: master
  platform:
    openstack:
      type: m1.xlarge
      failureDomains:
      - portTargets:
        - id: control-plane
          network:
            id: fb6f8fea-5063-4053-81b3-6628125ed598
          fixedIPs:
          - subnet:
              id: b02175dd-95c6-4025-8ff3-6cf6797e5f86
        computeAvailabilityZone: nova-az1
        storageAvailabilityZone: cinder-az1
      - portTargets:
        - id: control-plane
          network:
            id: 9a5452a8-41d9-474c-813f-59b6c34194b6
          fixedIPs:
          - subnet:
              id: 5fe5b54a-217c-439d-b8eb-441a03f7636d
        computeAvailabilityZone: nova-az1
        storageAvailabilityZone: cinder-az1
      - portTargets:
        - id: control-plane
          network:
            id: 3ed980a6-6f8e-42d3-8500-15f18998c434
          fixedIPs:
          - subnet:
              id: a7d57db6-f896-475f-bdca-c3464933ec02
        computeAvailabilityZone: nova-az1
        storageAvailabilityZone: cinder-az1
  replicas: 3
metadata:
  name: mycluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 192.168.10.0/24
  - cidr: 192.168.20.0/24
  - cidr: 192.168.30.0/24
  - cidr: 192.168.72.0/24
  - cidr: 192.168.100.0/24
platform:
  openstack:
    cloud: foch_openshift
    machinesSubnet: b02175dd-95c6-4025-8ff3-6cf6797e5f86
    apiVIPs:
    - 192.168.100.240
    ingressVIPs:
    - 192.168.100.250
    loadBalancer:
      type: UserManaged
featureSet: TechPreviewNoUpgrade
Machine spec:
Provider Spec:
Value:
API Version: machine.openshift.io/v1alpha1
Cloud Name: openstack
Clouds Secret:
Name: openstack-cloud-credentials
Namespace: openshift-machine-api
Flavor: m1.xlarge
Image: foch-bgp-2fnjz-rhcos
Kind: OpenstackProviderSpec
Metadata:
Creation Timestamp: <nil>
Networks:
Filter:
Subnets:
Filter:
Id: 5fe5b54a-217c-439d-b8eb-441a03f7636d
Uuid: 9a5452a8-41d9-474c-813f-59b6c34194b6
Primary Subnet: b02175dd-95c6-4025-8ff3-6cf6797e5f86
Security Groups:
Filter:
Name: foch-bgp-2fnjz-master
Filter:
Uuid: 1b142123-c085-4e14-b03a-cdf5ef028d91
Server Group Name: foch-bgp-2fnjz-master
Server Metadata:
Name: foch-bgp-2fnjz-master
Openshift Cluster ID: foch-bgp-2fnjz
Tags:
openshiftClusterID=foch-bgp-2fnjz
Trunk: true
User Data Secret:
Name: master-user-data
Status:
Addresses:
Address: 192.168.20.20
Type: InternalIP
Address: foch-bgp-2fnjz-master-1
Type: Hostname
Address: foch-bgp-2fnjz-master-1
Type: InternalDNS
The machine is connected to the right subnet, but has a wrong PrimarySubnet configured.
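A quick check of the field in question (a sketch; the jsonpath field name primarySubnet is inferred from the dump above and may differ by API version):
$ oc -n openshift-machine-api get machine foch-bgp-2fnjz-master-1 \
    -o jsonpath='{.spec.providerSpec.value.primarySubnet}{"\n"}'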
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
It is better for the pod-security admission config to use v1, like upstream, instead of still using v1beta1.
Version-Release number of selected component (if applicable):
4.12, 4.13
How reproducible:
Always
Steps to Reproduce:
1. In upstream, when it was 1.24, https://v1-24.docs.kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-admission-controller/#configure-the-admission-controller shows "pod-security.admission.config.k8s.io/v1beta1".
When it was 1.25 (OCP 4.12), https://v1-25.docs.kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-admission-controller/#configure-the-admission-controller no longer shows "pod-security.admission.config.k8s.io/v1beta1". At the bottom, it notes: pod-security.admission.config.k8s.io/v1 configuration requires v1.25+. For v1.23 and v1.24, use v1beta1.
In OCP 4.12 (1.25) and 4.13 (1.26), it is still v1beta1; we should align with upstream:
4.12:
$ oc version
..
Server Version: 4.12.9
Kubernetes Version: v1.25.7+eab9cc9
$ jq "" $(oc extract cm/config -n openshift-kube-apiserver --confirm) | jq '.admission.pluginConfig.PodSecurity'
{
  "configuration": {
    "apiVersion": "pod-security.admission.config.k8s.io/v1beta1",
    "defaults": {
      "audit": "restricted",
      "audit-version": "latest",
      "enforce": "privileged",
      "enforce-version": "latest",
      "warn": "restricted",
      "warn-version": "latest"
    },
    "exemptions": {
      "usernames": [
        "system:serviceaccount:openshift-infra:build-controller"
      ]
    },
    "kind": "PodSecurityConfiguration"
  }
}
4.13:
$ oc version
...
Server Version: 4.13.0-0.nightly-2023-03-23-204038
Kubernetes Version: v1.26.2+dc93b13
$ jq "" $(oc extract cm/config -n openshift-kube-apiserver --confirm) | jq '.admission.pluginConfig.PodSecurity'
{
  "configuration": {
    "apiVersion": "pod-security.admission.config.k8s.io/v1beta1",
    "defaults": {
      "audit": "restricted",
      "audit-version": "latest",
      "enforce": "privileged",
      "enforce-version": "latest",
      "warn": "restricted",
      "warn-version": "latest"
    },
    "exemptions": {
      "usernames": [
        "system:serviceaccount:openshift-infra:build-controller"
      ]
    },
    "kind": "PodSecurityConfiguration"
  }
}
Actual results:
See above.
Expected results:
The pod-security admission config should align with upstream and use v1 rather than v1beta1.
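After aligning with upstream, the same extraction used in the steps above would be expected to report the v1 API group (a sketch of the expected output):
$ jq "" $(oc extract cm/config -n openshift-kube-apiserver --confirm) | jq -r '.admission.pluginConfig.PodSecurity.configuration.apiVersion'
pod-security.admission.config.k8s.io/v1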
Additional info:
Description of problem:
InfraStructureRef* is dereferenced without checking for nil value
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Run TechPreview cluster
2. Try to create a Cluster object with an empty spec:
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example
  namespace: openshift-cluster-api
spec: {}
3. Observe panic in cluster-capi-operator
Actual results:
2023/03/10 14:13:31 http: panic serving 10.129.0.2:39614: runtime error: invalid memory address or nil pointer dereference goroutine 3619 [running]: net/http.(*conn).serve.func1() /usr/lib/golang/src/net/http/server.go:1850 +0xbf panic({0x16cada0, 0x2948bc0}) /usr/lib/golang/src/runtime/panic.go:890 +0x262 github.com/openshift/cluster-capi-operator/pkg/webhook.(*ClusterWebhook).ValidateCreate(0xc000ceac00?, {0x24?, 0xc00090fff0?}, {0x1b72d68?, 0xc0010831e0?}) /go/src/github.com/openshift/cluster-capi-operator/pkg/webhook/cluster.go:32 +0x39 sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*validatorForType).Handle(_, {_, _}, {{{0xc000ceac00, 0x24}, {{0xc00090fff0, 0x10}, {0xc000838000, 0x7}, {0xc000838007, ...}}, ...}}) /go/src/github.com/openshift/cluster-capi-operator/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/validator_custom.go:79 +0x2dd sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(_, {_, _}, {{{0xc000ceac00, 0x24}, {{0xc00090fff0, 0x10}, {0xc000838000, 0x7}, {0xc000838007, ...}}, ...}}) /go/src/github.com/openshift/cluster-capi-operator/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go:169 +0xfd sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc000630e80, {0x7f26f94b5580?, 0xc000f80280}, 0xc000750800) /go/src/github.com/openshift/cluster-capi-operator/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go:98 +0xeb5 github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1({0x7f26f94b5580, 0xc000f80280}, 0x1b7ff00?) /go/src/github.com/openshift/cluster-capi-operator/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:60 +0xd4 net/http.HandlerFunc.ServeHTTP(0x1b7ffb0?, {0x7f26f94b5580?, 0xc000f80280?}, 0x7afe60?) /usr/lib/golang/src/net/http/server.go:2109 +0x2f github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x1b7ffb0?, 0xc000a72000?}, 0xc000750800) /go/src/github.com/openshift/cluster-capi-operator/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:146 +0xb8 net/http.HandlerFunc.ServeHTTP(0x0?, {0x1b7ffb0?, 0xc000a72000?}, 0xc00056f0e1?) /usr/lib/golang/src/net/http/server.go:2109 +0x2f github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x1b7ffb0, 0xc000a72000}, 0xc000750800) /go/src/github.com/openshift/cluster-capi-operator/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:108 +0xbf net/http.HandlerFunc.ServeHTTP(0xc000a72000?, {0x1b7ffb0?, 0xc000a72000?}, 0x18e45d1?) /usr/lib/golang/src/net/http/server.go:2109 +0x2f net/http.(*ServeMux).ServeHTTP(0xc00056f0c0?, {0x1b7ffb0, 0xc000a72000}, 0xc000750800) /usr/lib/golang/src/net/http/server.go:2487 +0x149 net/http.serverHandler.ServeHTTP({0x1b71dc8?}, {0x1b7ffb0, 0xc000a72000}, 0xc000750800) /usr/lib/golang/src/net/http/server.go:2947 +0x30c net/http.(*conn).serve(0xc00039af00, {0x1b81198, 0xc000416c00}) /usr/lib/golang/src/net/http/server.go:1991 +0x607 created by net/http.(*Server).Serve /usr/lib/golang/src/net/http/server.go:3102 +0x4db
Expected results:
The webhook returns an error but does not panic.
Additional info:
Description of problem:
On August 24th, a bugfix was merged into the hypershift repo to address OCPBUGS-16813 (https://github.com/openshift/hypershift/pull/2942). This resulted in a change in the konnectivity server within the HCP namespace: we went from a single konnectivity server to multiple when HA HCPs are in use. The konnectivity agents within the HCP worker nodes connect to the server through a route. When connecting through this route, the agents on the workers are supposed to discover all the HA konnectivity servers through round-robin load balancing, meaning that if the agents try to connect to the route endpoint enough times, they should eventually discover all the servers. With the kubevirt platform, only a single konnectivity server is discovered by the agents in the worker nodes, which leads to the inability of the kas on the HCP to reliably contact kubelets within the worker nodes. The outcome of this issue is that webhooks (and other connections that require the kas (api server) in the HCP to contact worker nodes) fail the majority of the time.
Version-Release number of selected component (if applicable):
How reproducible:
create a kubevirt platform HCP using the `hcp` cli tool. This will default to HA mode, and the cluster will never fully roll out. The ingress, monitoring, and console clusteroperators will flap back and forth between failing and success. Usually we'll see an error about webhook connectivity failing. During this time, any `oc` command that attempts to tunnel a connection through the kas to the kubelets will fail the majority of the time. This means `oc logs`, `oc exec`, etc... will not work.
Actual results:
kas -> kubelet connections are unreliable
Expected results:
kas -> kubelet connections are reliable
Additional info:
Description of problem:
Updating the cpms vmSize on ASH got the error "The value 1024 of parameter 'osDisk.diskSizeGB' is out of range. The value must be between '1' and '1023', inclusive." Target="osDisk.diskSizeGB" when provisioning the new control plane node. After changing diskSizeGB to 1023, new nodes are provisioned. But for a fresh install, the default diskSizeGB is 1024 for masters.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-01-27-165107
How reproducible:
Always
Steps to Reproduce:
1. Update cpms vmSize to Standard_DS3_v2 2. Check new machine state $ oc get machine NAME PHASE TYPE REGION ZONE AGE jima28b-r9zht-master-h7g67-1 Running Standard_DS5_v2 mtcazs 11h jima28b-r9zht-master-hhfzl-0 Failed 24s jima28b-r9zht-master-qtb9j-0 Running Standard_DS5_v2 mtcazs 11h jima28b-r9zht-master-tprc7-2 Running Standard_DS5_v2 mtcazs 11h $ oc get machine jima28b-r9zht-master-hhfzl-0 -o yaml errorMessage: 'failed to reconcile machine "jima28b-r9zht-master-hhfzl-0": failed to create vm jima28b-r9zht-master-hhfzl-0: failure sending request for machine jima28b-r9zht-master-hhfzl-0: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="The value 1024 of parameter ''osDisk.diskSizeGB'' is out of range. The value must be between ''1'' and ''1023'', inclusive." Target="osDisk.diskSizeGB"' errorReason: InvalidConfiguration lastUpdated: "2023-01-29T02:35:13Z" phase: Failed providerStatus: conditions: - lastTransitionTime: "2023-01-29T02:35:13Z" message: 'failed to create vm jima28b-r9zht-master-hhfzl-0: failure sending request for machine jima28b-r9zht-master-hhfzl-0: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="The value 1024 of parameter ''osDisk.diskSizeGB'' is out of range. The value must be between ''1'' and ''1023'', inclusive." Target="osDisk.diskSizeGB"' reason: MachineCreationFailed status: "False" type: MachineCreated metadata: {} 3. Checke logs $ oc logs -f machine-api-controllers-84444d49f-mlldl -c machine-controller I0129 02:35:15.047784 1 recorder.go:103] events "msg"="InvalidConfiguration: failed to reconcile machine \"jima28b-r9zht-master-hhfzl-0\": failed to create vm jima28b-r9zht-master-hhfzl-0: failure sending request for machine jima28b-r9zht-master-hhfzl-0: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code=\"InvalidParameter\" Message=\"The value 1024 of parameter 'osDisk.diskSizeGB' is out of range. The value must be between '1' and '1023', inclusive.\" Target=\"osDisk.diskSizeGB\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"jima28b-r9zht-master-hhfzl-0","uid":"6cb07114-41a6-40bc-8e83-d9f27931bc8c","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"451889"} "reason"="FailedCreate" "type"="Warning" $ oc logs -f control-plane-machine-set-operator-69b756df4f-skv4x E0129 02:35:13.282358 1 controller.go:818] "msg"="Observed failed replacement control plane machines" "error"="found replacement control plane machines in an error state, the following machines(s) are currently reporting an error: jima28b-r9zht-master-hhfzl-0" "controller"="controlplanemachineset" "failedReplacements"="jima28b-r9zht-master-hhfzl-0" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="a988d699-8ddc-4880-9930-0db64ca51653" I0129 02:35:13.282380 1 controller.go:264] "msg"="Cluster state is degraded. The control plane machine set will not take any action until issues have been resolved." "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="a988d699-8ddc-4880-9930-0db64ca51653" 4. Change diskSizeGB to 1023, new machine Provisioned. 
osDisk: diskSettings: {} diskSizeGB: 1023 $ oc get machine NAME PHASE TYPE REGION ZONE AGE jima28b-r9zht-master-h7g67-1 Running Standard_DS5_v2 mtcazs 11h jima28b-r9zht-master-hhfzl-0 Deleting 7m1s jima28b-r9zht-master-qtb9j-0 Running Standard_DS5_v2 mtcazs 12h jima28b-r9zht-master-tprc7-2 Running Standard_DS5_v2 mtcazs 11h jima28b-r9zht-worker-mtcazs-p8d79 Running Standard_DS3_v2 mtcazs 18h jima28b-r9zht-worker-mtcazs-x5gvh Running Standard_DS3_v2 mtcazs 18h jima28b-r9zht-worker-mtcazs-xmdvw Running Standard_DS3_v2 mtcazs 18h $ oc get machine NAME PHASE TYPE REGION ZONE AGE jima28b-r9zht-master-h7g67-1 Running Standard_DS5_v2 mtcazs 11h jima28b-r9zht-master-qtb9j-0 Running Standard_DS5_v2 mtcazs 12h jima28b-r9zht-master-tprc7-2 Running Standard_DS5_v2 mtcazs 11h jima28b-r9zht-master-vqd7r-0 Provisioned Standard_DS3_v2 mtcazs 16s jima28b-r9zht-worker-mtcazs-p8d79 Running Standard_DS3_v2 mtcazs 18h jima28b-r9zht-worker-mtcazs-x5gvh Running Standard_DS3_v2 mtcazs 18h jima28b-r9zht-worker-mtcazs-xmdvw Running Standard_DS3_v2 mtcazs 18h
Actual results:
For a fresh install, the default diskSizeGB is 1024 for masters. But after updating the cpms vmSize, the new master failed to be created, reporting the error "The value 1024 of parameter 'osDisk.diskSizeGB' is out of range. The value must be between '1' and '1023', inclusive". When changing diskSizeGB to 1023, the new machine got Provisioned.
Expected results:
A new master should be created when the vmSize is changed, without needing to update diskSizeGB to 1023.
Additional info:
Minimum recommendation for control plane nodes is 1024 GB https://docs.openshift.com/container-platform/4.12/installing/installing_azure_stack_hub/installing-azure-stack-hub-network-customizations.html#installation-azure-stack-hub-config-yaml_installing-azure-stack-hub-network-customizations
Description of problem:
When the releaseImage is a digest, for example quay.io/openshift-release-dev/ocp-release@sha256:bbf1f27e5942a2f7a0f298606029d10600ba0462a09ab654f006ce14d314cb2c, a spurious warning is output when running openshift-install agent create image.
It's not calculating the releaseImage properly (see the '@sha' suffix below), so it causes this spurious message:
WARNING The ImageContentSources configuration in install-config.yaml should have at-least one source field matching the releaseImage value quay.io/openshift-release-dev/ocp-release@sha256
This can cause confusion for users.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Every time when using a release image with a digest is used
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Not able to convert a deployment to Serverless because the Make Serverless form in the console is broken.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Steps to Reproduce:
1. Create a deployment using a Container image flow
2. Select the Make Serverless option from the topology actions menu of the created deployment
Actual results:
After clicking on Create, it throws an error.
Expected results:
Should create a Serverless resource.
Additional info:
Description of problem:
OpenStack features SG rules opening traffic from `0.0.0.0/0` on NodePorts. This was required for the OVN loadbalancers to work properly, as they keep the source IP of the traffic when traffic reaches the LB members. This isn't needed anymore because in 4.14 OSASINFRA-3067 implemented and enabled the `manage-security-groups` option in cloud-provider-openstack, which creates and attaches the proper SG on its own to make sure only the necessary NodePorts are open.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Check for existence of rules opening traffic from 0.0.0.0/0 on the master and worker nodes.
Actual results:
Rules are still there.
Expected results:
Rules are not needed anymore.
Additional info:
Description of the problem:
According to swagger.yaml cpu_architecture in infra-envs can include 'multi', but that only makes sense in the cluster entity.
(Slack thread: https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1680095368006089)
How reproducible:
100%
Steps to reproduce:
1. Check out the swagger.yaml here
Actual results:
enum: ['x86_64', 'aarch64', 'arm64','ppc64le','s390x','multi']
Expected results:
enum: ['x86_64', 'aarch64', 'arm64','ppc64le','s390x']
Description of problem:
Running `openshift-install cluster destroy` defeats an OpenStack cloud with many Swift objects, if said cloud is low on resources. In particular, testing the teardown of an OCP cluster with 500,000 objects in the image registry caused RabbitMQ to crash on a standalone (single-host) OpenStack deployment backed with NVMe storage.
Version-Release number of selected component (if applicable):
How reproducible:
On a constrained (single-host) OpenStack cloud, with the default limit of 10000 objects per bulk-deletion request in Swift.
Steps to Reproduce:
1. Install OpenShift
2. Upload 500000 arbitrary objects into the image-registry container
3. Launch cluster teardown
4. Enjoy Swift responding with 504 errors, and the rest of the cluster becoming unstable
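To confirm the bulk-deletion limit the destroyer runs into on a given cloud, something like the following can be used (a sketch; requires the python-swiftclient CLI and cloud credentials):
$ swift capabilities | grep -A 3 bulk_delete   # max_deletes_per_request defaults to 10000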
Description of problem:
The ingress operator is constantly reverting internal Services when it detects a service change, even when the changed fields are just default values.
Version-Release number of selected component (if applicable):
4.13, 4.14
How reproducible:
100%
Steps to Reproduce:
1. Create an ingress controller
2. Watch ingress operator logs for excess "updated internal service" updates
[I'll provide a more specific reproducer if needed]
Actual results:
Excess: 2023-05-04T02:08:02.331Z INFO operator.ingress_controller ingress/internal_service.go:44 updated internal service ...
Expected results:
No updates
Additional info:
The diff looks like: 2023-05-05T15:12:06.668Z INFO operator.ingress_controller ingress/internal_service.go:44 updated internal service {"namespace": "openshift-ingress", "name": "router-internal-default", "diff": " &v1.Service{ TypeMeta: {}, ObjectMeta: {Name: \"router-internal-default\", Namespace: \"openshift-ingress\", UID: \"815f1499-a4d4-4cb8-9a5b-9905580e0ffd\", ResourceVersion: \"8031\", ...}, Spec: v1.ServiceSpec{ Ports: {{Name: \"http\", Protocol: \"TCP\", Port: 80, TargetPort: {Type: 1, StrVal: \"http\"}, ...}, {Name: \"https\", Protocol: \"TCP\", Port: 443, TargetPort: {Type: 1, StrVal: \"https\"}, ...}, {Name: \"metrics\", Protocol: \"TCP\", Port: 1936, TargetPort: {Type: 1, StrVal: \"metrics\"}, ...}}, Selector: {\"ingresscontroller.operator.openshift.io/deployment-ingresscontroller\": \"default\"}, ClusterIP: \"172.30.56.107\", - ClusterIPs: []string{\"172.30.56.107\"}, + ClusterIPs: nil, Type: \"ClusterIP\", ExternalIPs: nil, - SessionAffinity: \"None\", + SessionAffinity: \"\", LoadBalancerIP: \"\", LoadBalancerSourceRanges: nil, ... // 3 identical fields PublishNotReadyAddresses: false, SessionAffinityConfig: nil, - IPFamilies: []v1.IPFamily{\"IPv4\"}, + IPFamilies: nil, - IPFamilyPolicy: &\"SingleStack\", + IPFamilyPolicy: nil, AllocateLoadBalancerNodePorts: nil, LoadBalancerClass: nil, - InternalTrafficPolicy: &\"Cluster\", + InternalTrafficPolicy: nil, }, Status: {}, } "}
Messing around with unit testing, it looks like internalServiceChanged triggers true when spec.IPFamilies, spec.IPFamilyPolicy, and spec.InternalTrafficPolicy are set to the default values that you see in the diff above.
Ingress operator then resets back to nil, then the API server sets them to their defaults, and this process repeats.
internalServiceChanged should either ignore these fields or explicitly set them.
Description of the problem:
In the Create cluster wizard -> Networking page, an error is shown saying that the cluster is not ready yet. The warning message suggests to define the API or Ingress IP but they are already input in the form and in the YAML (see screenshots attached)
Also, the hosts are oscillating between "Pending input" and "Insufficient" states, with the errors shown in the images
Found this error while testing epic MGMT-9907
MCE image 2.3.0-DOWNANDBACK-2023-03-28-23-01-58
Now that https://issues.redhat.com//browse/OCPBUGS-13153 sets a default value of GOMAXPROCS before running node exporter (see https://github.com/openshift/cluster-monitoring-operator/pull/1996), the doc at https://github.com/openshift/cluster-monitoring-operator/blob/45bdf6f0148b771618d0dd89c432e7a1932e7a0a/pkg/manifests/types.go#L289-L295 should be adjusted.
Please review the following PR: https://github.com/openshift/machine-api-provider-aws/pull/62
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Should not need any special QA.
Description of problem:
The OpenShift DNS daemonset uses the rolling update strategy. The "maxSurge" parameter is set to a non-zero value, which means that the "maxUnavailable" parameter is set to zero. When the user replaces the toleration in the daemonset's template spec (via the OpenShift DNS config API) from the one which allows scheduling on the master nodes to any other toleration, the new pods still try to be scheduled on the master nodes. The old pods on the still-tolerated nodes may get recreated, but only if they are processed before any pod from a node that is no longer tolerated. The new pods are not expected to be scheduled on the nodes which are not tolerated by the new daemonset's template spec. The daemonset controller should just delete the old pods from the nodes which cannot be tolerated anymore. The old pods from the nodes which can still be tolerated should be recreated according to the rolling update parameters.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create the daemonset which tolerates "node-role.kubernetes.io/master" taint and has the following rolling update parameters:
$ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.updateStrategy
rollingUpdate:
  maxSurge: 10%
  maxUnavailable: 0
type: RollingUpdate
$ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.template.spec.tolerations
- key: node-role.kubernetes.io/master
  operator: Exists
2. Let the daemonset to be scheduled on all the target nodes (e.g. all masters and all workers)
$ oc -n openshift-dns get pods -o wide | grep dns-default
dns-default-6bfmf   2/2   Running   0   119m    10.129.0.40   ci-ln-sb5ply2-72292-qlhc8-master-2         <none>   <none>
dns-default-9cjdf   2/2   Running   0   2m35s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>   <none>
dns-default-c6j9x   2/2   Running   0   119m    10.128.0.13   ci-ln-sb5ply2-72292-qlhc8-master-0         <none>   <none>
dns-default-fhqrs   2/2   Running   0   2m12s   10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>   <none>
dns-default-lx2nf   2/2   Running   0   119m    10.130.0.15   ci-ln-sb5ply2-72292-qlhc8-master-1         <none>   <none>
dns-default-mmc78   2/2   Running   0   112m    10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>   <none>
3. Update the daemonset's tolerations by removing "node-role.kubernetes.io/master" and adding any other toleration (not existing works too):
$ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.template.spec.tolerations
- key: test-taint
  operator: Exists
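For reference, step 3 can be performed through the DNS operator config rather than by editing the DaemonSet directly (a sketch, assuming spec.nodePlacement.tolerations is the field the operator propagates to the template spec):
$ oc patch dnses.operator.openshift.io/default --type merge \
    -p '{"spec":{"nodePlacement":{"tolerations":[{"key":"test-taint","operator":"Exists"}]}}}'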
Actual results:
$ oc -n openshift-dns get pods -o wide | grep dns-default
dns-default-6bfmf   2/2   Running   0   124m    10.129.0.40   ci-ln-sb5ply2-72292-qlhc8-master-2         <none>   <none>
dns-default-76vjz   0/2   Pending   0   3m2s    <none>        <none>                                     <none>   <none>
dns-default-9cjdf   2/2   Running   0   7m24s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>   <none>
dns-default-c6j9x   2/2   Running   0   124m    10.128.0.13   ci-ln-sb5ply2-72292-qlhc8-master-0         <none>   <none>
dns-default-fhqrs   2/2   Running   0   7m1s    10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>   <none>
dns-default-lx2nf   2/2   Running   0   124m    10.130.0.15   ci-ln-sb5ply2-72292-qlhc8-master-1         <none>   <none>
dns-default-mmc78   2/2   Running   0   117m    10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>   <none>
Expected results:
$ oc -n openshift-dns get pods -o wide | grep dns-default
dns-default-9cjdf   2/2   Running   0   7m24s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>   <none>
dns-default-fhqrs   2/2   Running   0   7m1s    10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>   <none>
dns-default-mmc78   2/2   Running   0   7m54s   10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>   <none>
Additional info:
Upstream issue: https://github.com/kubernetes/kubernetes/issues/118823
Slack discussion: https://redhat-internal.slack.com/archives/CKJR6200N/p1687455135950439
Description of problem:
We shouldn't enforce PSa in 4.14, neither by label sync nor by global cluster config.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
As a cluster admin:
1. Create two new namespaces/projects: pokus, openshift-pokus
2. Attempt to create a privileged pod in both of the namespaces from step 1
Actual results:
pod creation is blocked by pod security admission
Expected results:
only a warning about pod violating the namespace pod security level should be emitted
Additional info:
Description of problem:
When you have an HCP running and it's creating the HostedCluster pods, it renders this IgnitionProxy config:
defaults
  mode http
  timeout connect 5s
  timeout client 30s
  timeout server 30s
frontend ignition-server
  bind *:8443 ssl crt /tmp/tls.pem
  default_backend ignition_servers
backend ignition_servers
  server ignition-server ignition-server:443 check ssl ca-file /etc/ssl/root-ca/ca.crt
This configuration is not supported on IPv6, causing the worker nodes to fail to download the ignition payload.
Version-Release number of selected component (if applicable):
MCE 2.4 OCP 4.14
How reproducible:
Always
Steps to Reproduce:
1. Create a HostedCluster with the networking parameters set to IPv6 networks.
2. Check the IgnitionProxy config using: oc rsh <pod> cat /tmp/haproxy.conf
Actual results:
Agent pod in the destination workers fails with: Jul 26 10:23:44 localhost.localdomain next_step_runne[4242]: time="26-07-2023 10:23:44" level=error msg="ignition file download failed: request failed: Get \"https://ignition-server-clusters-hosted.apps.ocp-edge-cluster-0.qe.lab.redhat.com/ignition\": EOF" file="apivip_check.go:160"
Expected results:
The worker should download the ignition payload properly
Additional info:
N/A
4.14 e2e-metal-ipi jobs are failing with
: [sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
This is the alert that is firing,
promQL query returned unexpected results:
ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards|KubeJobFailed|Watchdog|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|etcdMembersDown|etcdMembersDown|etcdGRPCRequestsSlow|etcdGRPCRequestsSlow|etcdHighNumberOfFailedGRPCRequests|etcdHighNumberOfFailedGRPCRequests|etcdMemberCommunicationSlow|etcdMemberCommunicationSlow|etcdNoLeader|etcdNoLeader|etcdHighFsyncDurations|etcdHighFsyncDurations|etcdHighCommitDurations|etcdHighCommitDurations|etcdInsufficientMembers|etcdInsufficientMembers|etcdHighNumberOfLeaderChanges|etcdHighNumberOfLeaderChanges|KubeAPIErrorBudgetBurn|KubeAPIErrorBudgetBurn|KubeClientErrors|KubeClientErrors|KubePersistentVolumeErrors|KubePersistentVolumeErrors|MCDDrainError|MCDDrainError|MCDPivotError|MCDPivotError|PrometheusOperatorWatchErrors|PrometheusOperatorWatchErrors|RedhatOperatorsCatalogError|RedhatOperatorsCatalogError|VSphereOpenshiftNodeHealthFail|VSphereOpenshiftNodeHealthFail|SamplesImagestreamImportFailing|SamplesImagestreamImportFailing",alertstate="firing",severity!="info"} >= 1
[
{
"metric":
,
"value": [
1680670057.374,
"1"
]
},
Please review the following PR: https://github.com/openshift/cloud-provider-vsphere/pull/37
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Currently, we unconditionally use an image mapping from the management cluster if a mapping exists for ocp-release-dev or ocp/release. When the individual images do not use those registries, the wrong mapping is used.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Create an ICSP on a management cluster:
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: image-policy-39
spec:
  repositoryDigestMirrors:
  - mirrors:
    - quay.io/openshift-release-dev/ocp-release
    - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
    source: quay.io/openshift-release-dev/ocp-release
2. Create a HostedCluster that uses a CI release
Actual results:
Nodes never join because ignition server is looking up the wrong image for the CCO and MCO.
Expected results:
Nodes can join the cluster.
Additional info:
Please review the following PR: https://github.com/openshift/coredns/pull/89
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-11788.
Description of problem:
TRT has identified a likely regression in Metal IPv6 installations. 4.14 installs are statistically worse than 4.13. We are working on a new tool called Component Readiness that does cross-release comparisons to ensure nothing gets worse. I think it has found something in metal. At GA, 4.13 metal installs for ipv6 upgrade-micro jobs were 100%. They are now around 89% in 4.14. All the failures seem to have the same mode where no workers come up, with PXE errors in the serial console.
!image-2023-06-06-10-13-13-310.png|thumbnail!
You can view the report here: https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2023-05-16%2023%3A59%3A59&baseRelease=4.13&baseStartTime=2023-04-18%2000%3A00%3A00&capability=Other&component=Installer%20%2F%20openshift-installer&confidence=95&environment=ovn%20upgrade-micro%20amd64%20metal-ipi%20standard&excludeArches=arm64&excludeClouds=alibaba%2Cibmcloud%2Clibvirt%2Covirt&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&pity=5&platform=metal-ipi&sampleEndTime=2023-06-06%2023%3A59%3A59&sampleRelease=4.14&sampleStartTime=2023-05-09%2000%3A00%3A00&testId=cluster%20install%3A0cb1bb27e418491b1ffdacab58c5c8c0&testName=install%20should%20succeed%3A%20overall&upgrade=upgrade-micro&variant=standard
The serial console on the workers shows PXE errors:
>>Start PXE over IPv4.
PXE-E18: Server response timeout.
BdsDxe: failed to load Boot0001 "UEFI PXEv4 (MAC:00962801D023)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(00962801D023,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0): Not Found
>>Start PXE over IPv6..
Station IP address is FD00:1101:0:0:2EE1:8456:96FB:68B1
Server IP address is FD00:1101:0:0:0:0:0:3
NBP filename is snponly.efi
NBP filesize is 0 Bytes
PXE-E18: Server response timeout.
BdsDxe: failed to load Boot0002 "UEFI PXEv6 (MAC:00962801D023)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(00962801D023,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000): Not Found
>>Start HTTP Boot over IPv4.
Error: Could not retrieve NBP file size from HTTP server.
Error: Server response timeout.
BdsDxe: failed to load Boot0003 "UEFI HTTPv4 (MAC:00962801D023)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(00962801D023,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri(): Not Found
>>Start HTTP Boot over IPv6..
Error: Could not retrieve NBP file size from HTTP server.
Error: Remote boot cancelled.
BdsDxe: failed to load Boot0004 "UEFI HTTPv6 (MAC:00962801D023)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(00962801D023,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)/Uri(): Not Found
BdsDxe: No bootable option or device was found.
BdsDxe: Press any key to enter the Boot Manager Menu.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
10%
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Example failures:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-upgrade-ovn-ipv6/1665428719952465920
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-upgrade-ovn-ipv6/1664711616538611712
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-upgrade-ovn-ipv6/1664645418744549376
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-upgrade-ovn-ipv6/1663915360878858240
Description of the problem:
Creating a host without any disks will cause the error log message below in the service, without any indicative error message being displayed to the user.
In this case the status remains Discovering and the user cannot know what the issue is.
Log from the service:
time="2023-06-07T12:36:09Z" level=error msg="failed to create new validation context for host e0b465cc-e91f-4ca6-9594-27052a9a6f28" func="github.com/openshift/assisted-service/internal/host.(*Manager).IsValidMasterCandidate" file="/assisted-service/internal/host/host.go:1280" error="Inventory is not valid" pkg=cluster-state
Example inventory:
{ "bmc_address": "0.0.0.0", "bmc_v6address": ":: /0", "boot": { "current_boot_mode": "uefi" }, "cpu": { "architecture": "x86_64", "count": 8, "flags": [ "fpu", "vme", "de", "pse", "tsc", "msr", "pae", "mce", "cx8", "apic", "sep", "mtrr", "pge", "mca", "cmov", "pat", "pse36", "clflush", "mmx", "fxsr", "sse", "sse2", "ht", "syscall", "nx", "mmxext", "fxsr_opt", "pdpe1gb", "rdtscp", "lm", "rep_good", "nopl", "cpuid", "extd_apicid", "tsc_known_freq", "pni", "pclmulqdq", "ssse3", "fma", "cx16", "pcid", "sse4_1", "sse4_2", "x2apic", "movbe", "popcnt", "tsc_deadline_timer", "aes", "xsave", "avx", "f16c", "rdrand", "hypervisor", "lahf_lm", "cmp_legacy", "cr8_legacy", "abm", "sse4a", "misalignsse", "3dnowprefetch", "osvw", "topoext", "perfctr_core", "ssbd", "ibrs", "ibpb", "stibp", "vmmcall", "fsgsbase", "tsc_adjust", "bmi1", "avx2", "smep", "bmi2", "rdseed", "adx", "smap", "clflushopt", "clwb", "sha_ni", "xsaveopt", "xsavec", "xgetbv1", "xsaves", "clzero", "xsaveerptr", "wbnoinvd", "arat", "umip", "vaes", "vpclmulqdq", "rdpid", "arch_capabilities" ], "frequency": 2545.214, "model_name": "AMD EPYC 7J13 64-Core Processor" }, "disks": [], "gpus": [ { "address": "0000: 00: 02.0" } ], "hostname": "02-00-17-01-2c-cf", "interfaces": [ { "flags": [ "up", "broadcast", "multicast" ], "has_carrier": true, "ipv4_addresses": [ "10.0.28.205/20" ], "ipv6_addresses": [], "mac_address": "02: 00: 17: 01: 2c: cf", "mtu": 9000, "name": "ens3", "product": "0x101e", "speed_mbps": 50000, "type": "physical", "vendor": "0x15b3" } ], "memory": { "physical_bytes": 17179869184, "physical_bytes_method": "dmidecode", "usable_bytes": 16765730816 }, "routes": [ { "destination": "0.0.0.0", "family": 2, "gateway": "10.0.16.1", "interface": "ens3", "metric": 100 }, { "destination": "10.0.16.0", "family": 2, "interface": "ens3", "metric": 100 }, { "destination": "10.88.0.0", "family": 2, "interface": "cni-podman0" }, { "destination": "169.254.0.0", "family": 2, "interface": "ens3", "metric": 100 }, { "destination": ":: 1", "family": 10, "interface": "lo", "metric": 256 }, { "destination": "fe80:: ", "family": 10, "interface": "cni-podman0", "metric": 256 }, { "destination": "fe80:: ", "family": 10, "interface": "ens3", "metric": 1024 } ], "system_vendor": { "manufacturer": "QEMU", "product_name": "Standard PC (i440FX + PIIX, 1996)", "virtual": true }, "tpm_version": "none" }
Steps to reproduce:
1. Register a new cluster
2. Generate image and deploy nodes without disks
Actual results:
Expected results:
Fail validation if the inventory is invalid.
Description of problem:
`cluster-reader` ClusterRole should have ["get", "list", "watch"] permissions for a number of privileged CRs, but lacks them for the API Group "k8s.ovn.org", which includes CRs such as EgressFirewalls, EgressIPs, etc.
Version-Release number of selected component (if applicable):
OCP 4.10 - 4.12 OVN
How reproducible:
Always
Steps to Reproduce:
1. Create a cluster with OVN components, e.g. EgressFirewall
2. Check permissions of ClusterRole `cluster-reader`
Actual results:
No permissions for OVN resources
Expected results:
Get, list, and watch verb permissions for OVN resources
Additional info:
Looks like a similar bug was opened for "network-attachment-definitions" in OCPBUGS-6959 (whose closure is being contested).
Description of problem:
The HostedCluster name is not currently validated against RFC1123.
Version-Release number of selected component (if applicable):
How reproducible:
Every time
Steps to Reproduce:
1. 2. 3.
Actual results:
Any HostedCluster name is allowed
Expected results:
Only HostedCluster names meeting RFC1123 validation should be allowed.
Additional info:
Hypershift needs to be able to specify a different release payload for control plane components without redeploying anything in the hosted cluster.
ovnkube-node DaemonSet pods in the hosted cluster and the ovnkube-master pods that run in the control plane both use the same ovn-kubernetes image passed to the CNO.
We need a way to specify these images separately for ovnkube-node and ovnkube-master.
Background:
https://docs.google.com/document/d/1a3tAS_K6lQ2iicjvuIvPIK5lervXFEVQBCAXopBAJ6o/edit
Description of problem:
The CoreDNS template implementation uses an incorrect regex for resolving the dot [.] character.
Version-Release number of selected component (if applicable):
NA
How reproducible:
100% when you use router sharding with domains including apps
Steps to Reproduce:
1. Create an additional IngressController with domain names including "apps", for example: example.test-apps.<clustername>.<clusterdomain>
2. Create and configure the external LB corresponding to the additional IngressController
3. Configure the corporate DNS server and create records for this additional IngressController resolving to the LB IP set up in step 2 above
4. Try resolving the additional domain routes from outside the cluster and within the cluster. The DNS resolution works fine from outside the cluster. However, within the cluster all additional domains containing "apps" in the domain name resolve to the default ingress VIP instead of their corresponding LB IPs configured on the corporate DNS server.
As an alternate and simple test, you can reproduce it by using the dig command on a cluster node with the additional domain, for example:
sh-4.4# dig test.apps-test..<clustername>.<clusterdomain>
Actual results:
DNS resolves all the domains containing "apps" to the default ingress VIP. For example, example.test-apps.<clustername>.<clusterdomain> resolves to the default ingress VIP instead of its actual corresponding LB IP.
Expected results:
DNS should resolve it to the corresponding LB IP configured on the DNS server.
Additional info:
The DNS resolution happens via the Corefile template used on the node, which treats dot (.) as the regex "any character" instead of a literal dot [.]. This is a regex configuration bug inside the Corefile used on vSphere IPI clusters.
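A small, self-contained Go illustration of the underlying regex issue (the domain names are made up; the real pattern lives in the CoreDNS Corefile template): an unescaped dot matches any character, so a host containing "-apps." is wrongly captured by a pattern intended only for ".apps." subdomains.

    package main

    import (
    	"fmt"
    	"regexp"
    )

    func main() {
    	// Hypothetical patterns for a cluster domain "mycluster.example.com":
    	// the unescaped dot before "apps" matches any character, including "-".
    	unescaped := regexp.MustCompile(`.apps.mycluster.example.com`)
    	escaped := regexp.MustCompile(`\.apps\.mycluster\.example\.com`)

    	host := "example.test-apps.mycluster.example.com"
    	fmt.Println(unescaped.MatchString(host)) // true  - "-apps." satisfies ".apps."
    	fmt.Println(escaped.MatchString(host))   // false - only real ".apps." subdomains match
    }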
Description of problem:
We currently do some frontend logic to list and search CatalogSources for the source associated with the CSV and Subscription on the CSV details page. If we can't find the CatalogSource, we show an error message and prevent updates from the Subscription tab.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Create an htpasswd idp with any user
2. Create a project admin role binding for this user
3. Install an operator in the namespace where the user has project admin permissions
4. Visit the CSV details page while logged in as the project admin user
5. View the subscriptions tab
Actual results:
An alert is shown indicating that the CatalogSource is missing, and updates to the operator are prevented.
Expected results:
If the Subscription shows the catalog source as healthy in its status stanza, we shouldn't show an alert or prevent updates.
Additional info:
Reproducing this bug is dependent on the fix for OCPBUGS-3036 which prevents project admin users from viewing the Subscription tab at all.
Description of problem:
While investigating issue [1] we noticed a few problems with CNO error reporting on the ClusterOperator status [2]: that's fine, but I think there are a couple of bugs to write up:
1. When a panic happens, the operator doesn't go Degraded. This can definitely be done.
2. When the status cannot be updated, the operator should go Degraded.
3. When the service network and/or cluster network in status is missing, the operator should go Available=false.
[1] https://github.com/openshift/cluster-network-operator/pull/1669 [2] https://coreos.slack.com/archives/CB48XQ4KZ/p1671207248527519?thread_ts=1671197854.825529&cid=CB48XQ4KZ
Version-Release number of selected component (if applicable):
4.13 and previous.
How reproducible:
Always
Steps to Reproduce:
1. Cause a deliberate panic e.g. in the bootstrap code.
Actual results:
Operator keeps getting restarted and is not Degraded.
Expected results:
Operator goes Degraded.
Additional info:
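A minimal sketch of item 1 above, assuming a placeholder setDegraded helper rather than the CNO's real status-setting code: recover from a panic in the reconcile path and surface it as Degraded instead of only crash-looping.

    package main

    import (
    	"fmt"
    	"runtime/debug"
    )

    // setDegraded stands in for whatever the operator uses to update the
    // ClusterOperator status; it is a placeholder, not the real CNO API.
    func setDegraded(reason, message string) {
    	fmt.Printf("Degraded=True reason=%s message=%s\n", reason, message)
    }

    // reconcileWithRecovery wraps a reconcile pass so that a panic is reported
    // as Degraded instead of only crashing and restarting the operator.
    func reconcileWithRecovery(reconcile func() error) (err error) {
    	defer func() {
    		if r := recover(); r != nil {
    			setDegraded("ReconcilePanic", fmt.Sprintf("panic: %v", r))
    			debug.PrintStack()
    			err = fmt.Errorf("reconcile panicked: %v", r)
    		}
    	}()
    	return reconcile()
    }

    func main() {
    	_ = reconcileWithRecovery(func() error {
    		panic("boom") // simulate a panic in the bootstrap code
    	})
    }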
Description of problem:
The advertise address configured for our hcp etcd clusters is not resolvable via DNS (ie. etcd-0.etcd-client.namespace.svc:2379). This impacts some of the etcd tooling that expects to access each member by their advertise address.
Version-Release number of selected component (if applicable):
4.14 (and earlier)
How reproducible:
Always
Steps to Reproduce:
1. Create a HostedCluster and wait for it to come up.
2. Exec into an etcd pod and query cluster endpoint health:
$ oc rsh etcd-0
$ etcdctl --cacert /etc/etcd/tls/etcd-ca/ca.crt \
    --cert /etc/etcd/tls/server/server.crt \
    --key /etc/etcd/tls/server/server.key \
    --endpoints https://localhost:2379 \
    endpoint health --cluster -w table
Actual results:
An error is returned similar to: {"level":"warn","ts":"2023-08-07T20:40:49.890254Z","logger":"client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000378fc0/etcd-0.etcd-client.clusters-test-cluster.svc:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-0.etcd-client.clusters-test-cluster.svc on 172.30.0.10:53: no such host\""}
Expected results:
Actual cluster health is returned: +--------------------------------------------------------------+--------+-------------+-------+ | ENDPOINT | HEALTH | TOOK | ERROR | +--------------------------------------------------------------+--------+-------------+-------+ | https://etcd-0.etcd-discovery.clusters-cewong-guest.svc:2379 | true | 9.372168ms | | | https://etcd-2.etcd-discovery.clusters-cewong-guest.svc:2379 | true | 12.269226ms | | | https://etcd-1.etcd-discovery.clusters-cewong-guest.svc:2379 | true | 12.291392ms | | +--------------------------------------------------------------+--------+-------------+-------+
Additional info:
The etcd statefulset is created with spec.serviceName set to `etcd-discovery`. This means that pods in the statefulset get the subdomain `etcd-discovery`, and names like etcd-0.etcd-discovery.[ns].svc are resolvable. However, the same is not true for the etcd-client service: etcd-0.etcd-client.[ns].svc is not resolvable. The fix would be to change the advertise address of each member to a resolvable name (i.e. etcd-0.etcd-discovery.[ns].svc) and adjust the server certificate to allow those names as well.
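A minimal sketch of the proposed fix, building the advertise URL from the statefulset's headless `etcd-discovery` service so it is DNS-resolvable; the port and URL shape follow the examples above.

    package main

    import "fmt"

    // advertiseClientURL sketches building a DNS-resolvable advertise address
    // for an etcd member: pods in a StatefulSet with
    // spec.serviceName=etcd-discovery resolve as <pod>.etcd-discovery.<ns>.svc.
    func advertiseClientURL(podName, namespace string) string {
    	return fmt.Sprintf("https://%s.etcd-discovery.%s.svc:2379", podName, namespace)
    }

    func main() {
    	fmt.Println(advertiseClientURL("etcd-0", "clusters-test-cluster"))
    	// https://etcd-0.etcd-discovery.clusters-test-cluster.svc:2379
    }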
Description of problem:
While/after upgrading to OKD 4.11-2023-01-14, CoreDNS has a problem with UDP overflows, so DNS lookups are very slow and cause the ingress operator upgrade to stall. We needed to work around it with force_tcp following this: https://access.redhat.com/solutions/5984291
Version-Release number of selected component (if applicable):
How reproducible:
100%, but it seems to depend on the network environment (exact cause unknown)
Steps to Reproduce:
1. install cluster with OKD 4.11-2022-12-02 or earlier 2. initiate upgrade to OKD 4.11-2023-01-14 3. upgrade will stall after upgrading CoreDNS
Actual results:
CoreDNS logs: [ERROR] plugin/errors: 2 oauth-openshift.apps.okd-admin.muc.lv1871.de. AAAA: dns: overflowing header size
Expected results:
Additional info:
Needed for FIPS compliance
Description of problem:
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-17-161027
How reproducible:
Always
Steps to Reproduce:
1. Create a GCP XPN cluster with flexy job template ipi-on-gcp/versioned-installer-xpn-ci, then run 'oc describe node'
2. Check logs for cloud-network-config-controller pods
Actual results:
% oc get nodes NAME STATUS ROLES AGE VERSION huirwang-0309d-r85mj-master-0.c.openshift-qe.internal Ready control-plane,master 173m v1.26.2+06e8c46 huirwang-0309d-r85mj-master-1.c.openshift-qe.internal Ready control-plane,master 173m v1.26.2+06e8c46 huirwang-0309d-r85mj-master-2.c.openshift-qe.internal Ready control-plane,master 173m v1.26.2+06e8c46 huirwang-0309d-r85mj-worker-a-wsrls.c.openshift-qe.internal Ready worker 162m v1.26.2+06e8c46 huirwang-0309d-r85mj-worker-b-5txgq.c.openshift-qe.internal Ready worker 162m v1.26.2+06e8c46 `oc describe node`, there is no related egressIP annotations % oc describe node huirwang-0309d-r85mj-worker-a-wsrls.c.openshift-qe.internal Name: huirwang-0309d-r85mj-worker-a-wsrls.c.openshift-qe.internal Roles: worker Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=n2-standard-4 beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=us-central1 failure-domain.beta.kubernetes.io/zone=us-central1-a kubernetes.io/arch=amd64 kubernetes.io/hostname=huirwang-0309d-r85mj-worker-a-wsrls.c.openshift-qe.internal kubernetes.io/os=linux machine.openshift.io/interruptible-instance= node-role.kubernetes.io/worker= node.kubernetes.io/instance-type=n2-standard-4 node.openshift.io/os_id=rhcos topology.gke.io/zone=us-central1-a topology.kubernetes.io/region=us-central1 topology.kubernetes.io/zone=us-central1-a Annotations: csi.volume.kubernetes.io/nodeid: {"pd.csi.storage.gke.io":"projects/openshift-qe/zones/us-central1-a/instances/huirwang-0309d-r85mj-worker-a-wsrls"} k8s.ovn.org/host-addresses: ["10.0.32.117"] k8s.ovn.org/l3-gateway-config: {"default":{"mode":"shared","interface-id":"br-ex_huirwang-0309d-r85mj-worker-a-wsrls.c.openshift-qe.internal","mac-address":"42:01:0a:00:... k8s.ovn.org/node-chassis-id: 7fb1870c-4315-4dcb-910c-0f45c71ad6d3 k8s.ovn.org/node-gateway-router-lrp-ifaddr: {"ipv4":"100.64.0.5/16"} k8s.ovn.org/node-mgmt-port-mac-address: 16:52:e3:8c:13:e2 k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.0.32.117/32"} k8s.ovn.org/node-subnets: {"default":["10.131.0.0/23"]} machine.openshift.io/machine: openshift-machine-api/huirwang-0309d-r85mj-worker-a-wsrls machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable machineconfiguration.openshift.io/currentConfig: rendered-worker-bec5065070ded51e002c566a9c5bd16a machineconfiguration.openshift.io/desiredConfig: rendered-worker-bec5065070ded51e002c566a9c5bd16a machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-bec5065070ded51e002c566a9c5bd16a machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-bec5065070ded51e002c566a9c5bd16a machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done volumes.kubernetes.io/controller-managed-attach-detach: true % oc logs cloud-network-config-controller-5cd96d477d-2kmc9 -n openshift-cloud-network-config-controller W0320 03:00:08.981493 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work. I0320 03:00:08.982280 1 leaderelection.go:248] attempting to acquire leader lease openshift-cloud-network-config-controller/cloud-network-config-controller-lock... 
E0320 03:00:38.982868 1 leaderelection.go:330] error retrieving resource lock openshift-cloud-network-config-controller/cloud-network-config-controller-lock: Get "https://api-int.huirwang-0309d.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-cloud-network-config-controller/configmaps/cloud-network-config-controller-lock": dial tcp: lookup api-int.huirwang-0309d.qe.gcp.devcluster.openshift.com: i/o timeout E0320 03:01:23.863454 1 leaderelection.go:330] error retrieving resource lock openshift-cloud-network-config-controller/cloud-network-config-controller-lock: Get "https://api-int.huirwang-0309d.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-cloud-network-config-controller/configmaps/cloud-network-config-controller-lock": dial tcp: lookup api-int.huirwang-0309d.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: read udp 10.129.0.14:52109->172.30.0.10:53: read: connection refused I0320 03:02:19.249359 1 leaderelection.go:258] successfully acquired lease openshift-cloud-network-config-controller/cloud-network-config-controller-lock I0320 03:02:19.250662 1 controller.go:88] Starting node controller I0320 03:02:19.250681 1 controller.go:91] Waiting for informer caches to sync for node workqueue I0320 03:02:19.250693 1 controller.go:88] Starting secret controller I0320 03:02:19.250703 1 controller.go:91] Waiting for informer caches to sync for secret workqueue I0320 03:02:19.250709 1 controller.go:88] Starting cloud-private-ip-config controller I0320 03:02:19.250715 1 controller.go:91] Waiting for informer caches to sync for cloud-private-ip-config workqueue I0320 03:02:19.258642 1 controller.go:182] Assigning key: huirwang-0309d-r85mj-master-2.c.openshift-qe.internal to node workqueue I0320 03:02:19.258671 1 controller.go:182] Assigning key: huirwang-0309d-r85mj-master-1.c.openshift-qe.internal to node workqueue I0320 03:02:19.258682 1 controller.go:182] Assigning key: huirwang-0309d-r85mj-master-0.c.openshift-qe.internal to node workqueue I0320 03:02:19.351258 1 controller.go:96] Starting node workers I0320 03:02:19.351303 1 controller.go:102] Started node workers I0320 03:02:19.351298 1 controller.go:96] Starting secret workers I0320 03:02:19.351331 1 controller.go:102] Started secret workers I0320 03:02:19.351265 1 controller.go:96] Starting cloud-private-ip-config workers I0320 03:02:19.351508 1 controller.go:102] Started cloud-private-ip-config workers E0320 03:02:19.589704 1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-1.c.openshift-qe.internal': error retrieving the private IP configuration for node: huirwang-0309d-r85mj-master-1.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue E0320 03:02:19.615551 1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-0.c.openshift-qe.internal': error retrieving the private IP configuration for node: huirwang-0309d-r85mj-master-0.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue E0320 03:02:19.644628 1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-2.c.openshift-qe.internal': error retrieving the private IP configuration for node: 
huirwang-0309d-r85mj-master-2.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue E0320 03:02:19.774047 1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-0.c.openshift-qe.internal': error retrieving the private IP configuration for node: huirwang-0309d-r85mj-master-0.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue E0320 03:02:19.783309 1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-1.c.openshift-qe.internal': error retrieving the private IP configuration for node: huirwang-0309d-r85mj-master-1.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue E0320 03:02:19.816430 1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-2.c.openshift-qe.internal': error retrieving the private IP configuration for node: huirwang-0309d-r85mj-master-2.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue
Expected results:
EgressIP should work
Additional info:
It can be reproduced in 4.12 as well; it is not a regression issue.
Description of problem:
documentationBaseURL is still linking to 4.13
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-04-05-183601
How reproducible:
Always
Steps to Reproduce:
1. Get documentationBaseURL in cm/console-config:
$ oc get cm console-config -n openshift-console -o yaml | grep documentationBaseURL
      documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.13/
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-04-05-183601   True        False         68m     Cluster version is 4.14.0-0.nightly-2023-04-05-183601
Actual results:
documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.13/
Expected results:
documentationBaseURL should be https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/
Additional info:
We should adjust CSI RPC call timeout from sidecars to CSI driver. We seem to be using default values which are just too short and hence can cause unintended side-effects.
I am using a BuildConfig with git source and the Docker strategy. The git repo contains a large zip file via LFS, and that zip file is not getting downloaded; instead, just the ASCII metadata is downloaded. I've created a simple reproducer (https://github.com/selrahal/buildconfig-git-lfs) on my personal GitHub. If you clone the repo
git clone git@github.com:selrahal/buildconfig-git-lfs.git
and apply the bc.yaml file with
oc apply -f bc.yaml
Then start the build with
oc start-build test-git-lfs
You will see the build fails at the unzip step in the docker file
STEP 3/7: RUN unzip migrationtoolkit-mta-cli-5.3.0-offline.zip End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.
I've attached the full build logs to this issue.
Description of problem:
Pages should have unique page titles, so that we can gather accurate user telemetry data via segment. The page title should differ based on the selected tab.
In order to do proper analysis, branding should not be included in the page title.
Currently the following pages have this title "Red Hat OpenShift Dedicated" (or the respective brand name):
Dev perspective:
The following tabs all have the same page title Observe · Red Hat OpenShift Dedicated:
Dev perspective:
The following tabs all have the same page title Project Details · Red Hat OpenShift Dedicated:
Dev perspective:
All the user preferences tabs have the same page title : User Preferences · Red Hat OpenShift Dedicated
The Topology page in the Dev Perspective and the workloads tab of the Project Details/Workloads tab both share the same title: Topology · Red Hat OpenShift Dedicated
The following tabs on the Admin Project page all share the same title. Unsure if we can handle this since it is including the namespace name: sdoyle-dev · Details · Red Hat OpenShift Dedicated. If not, we can drop til 4.14.
Description of the problem:
As discussed on the Github PR, we want to align the severities filter with the previous implementation. Therefore the severity counts in the response headers should be:
In addition to that, we need a new response header with a total number of events with all current filters (severities included) applied.
Please review the following PR: https://github.com/openshift/cluster-api-provider-vsphere/pull/12
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Updated Description:
The MCD, during a node's lifespan, can go through multiple iterations of RHEL8 and RHEL9. This was not a problem until we turned on FIPS-enabled golang with dynamic linking, which requires the running MCD binary (either in a container or on the host) to always match the host's build version. As an additional complication, we have an early boot process (machine-config-daemon-pull/firstboot.service) that can be on a different version from the rest of the cluster nodes (the bootimage version is not updated), and we also chroot (dynamically go from rhel8 to rhel9) in the container, so we need a better process to ensure the right binary is always used.
Current testing of this flow in https://github.com/openshift/machine-config-operator/pull/3799
Description of problem:
MCO CI started failing this week, and the failures have also made it into 4.14 nightlies. See also: https://issues.redhat.com/browse/TRT-1143. The failure manifests as a warning in the MCO. Looking at an MCD log, you will see a failure like: W0712 08:52:15.475268 7971 daemon.go:1089] Got an error from auxiliary tools: kubelet health check has failed 3 times: Get "http://localhost:10248/healthz": dial tcp: lookup localhost: device or resource busy The root cause so far seems to be that 4.14 switched from a regular 1.20.3 golang to 1.20.5 with FIPS and dynamic linking in the builder, causing the failures to begin. Most functionality is not broken, but the daemon subroutine that does the kubelet health check appears to be unable to reach the localhost endpoint. One possibility is that the rhel8-daemon chroot'ing into the rhel9-host and running these commands is causing the issue. Regardless, there are a bunch of issues with rhel8/rhel9 duality in the MCD that we would need to address in 4.13/4.14. Also tangentially related: https://issues.redhat.com/browse/MCO-663
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
When using oc image mirror, oc creates a new manifest list when filtering platforms. When this happens, oc still tries to push and tag the original manifest list.
Version-Release number of selected component (if applicable):
4.8
How reproducible:
Consistent
Steps to Reproduce:
1. Run oc image mirror --filter-by-os 'linux/arm' docker.io/library/busybox@sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c yourregistry.io/busybox:target
2. Check the plan, see that the original manifest digest is being used for the tag
Actual results:
jammy:Downloads$ oc image mirror --filter-by-os 'linux/arm' docker.io/library/busybox@sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c sparse-registry1.fyre.ibm.com/jammy/busybox:target sparse-registry1.fyre.ibm.com/ jammy/busybox blobs: docker.io/library/busybox sha256:1d57ab16f681953c15d7485bf3ee79a49c2838e5f9394c43e20e9accbb1a2b20 1.436KiB docker.io/library/busybox sha256:99ee43e96ff50e90c5753954d7ce2dfdbd7eb9711c1cd96de56d429cb628e343 1.436KiB docker.io/library/busybox sha256:a22ab831b2b2565a624635af04e5f76b4554d9c84727bf7e6bc83306b3b339a9 1.436KiB docker.io/library/busybox sha256:abaa813f94fdeebd3b8e6aeea861ab474a5c4724d16f1158755ff1e3a4fde8b0 1.438KiB docker.io/library/busybox sha256:b203a35cab50f0416dfdb1b2260f83761cb82197544b9b7a2111eaa9c755dbe7 937.1KiB docker.io/library/busybox sha256:46758452d3eef8cacb188405495d52d265f0c3a7580dfec51cb627c04c7bafc4 1.604MiB docker.io/library/busybox sha256:4c45e4bb3be9dbdfb27c09ac23c050b9e6eb4c16868287c8c31d34814008df80 1.847MiB docker.io/library/busybox sha256:f78e6840ded1aafb6c9f265f52c2fc7c0a990813ccf96702df84a7dcdbe48bea 1.908MiB manifests: sha256:4ff685e2bcafdab0d2a9b15cbfd9d28f5dfe69af97e3bb1987ed483b0abf5a99 sha256:5e42fbc46b177f10319e8937dd39702e7891ce6d8a42d60c1b4f433f94200bd2 sha256:7128d7c7704fb628f1cedf161c01d929d3d831f2a012780b8191dae49f79a5fc sha256:77ed5ebc3d9d48581e8afcb75b4974978321bd74f018613483570fcd61a15de8 sha256:dde8e930c7b6a490f728e66292bc9bce42efc9bbb5278bae40e4f30f6e00fe8c sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c -> target
Expected results:
jammy:~$ oc-devel image mirror --filter-by-os 'linux/arm' docker.io/library/busybox@sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c sparse-registry1.fyre.ibm.com/jammy/busybox:target sparse-registry1.fyre.ibm.com/ jammy/busybox blobs: docker.io/library/busybox sha256:1d57ab16f681953c15d7485bf3ee79a49c2838e5f9394c43e20e9accbb1a2b20 1.436KiB docker.io/library/busybox sha256:99ee43e96ff50e90c5753954d7ce2dfdbd7eb9711c1cd96de56d429cb628e343 1.436KiB docker.io/library/busybox sha256:a22ab831b2b2565a624635af04e5f76b4554d9c84727bf7e6bc83306b3b339a9 1.436KiB docker.io/library/busybox sha256:abaa813f94fdeebd3b8e6aeea861ab474a5c4724d16f1158755ff1e3a4fde8b0 1.438KiB docker.io/library/busybox sha256:b203a35cab50f0416dfdb1b2260f83761cb82197544b9b7a2111eaa9c755dbe7 937.1KiB docker.io/library/busybox sha256:46758452d3eef8cacb188405495d52d265f0c3a7580dfec51cb627c04c7bafc4 1.604MiB docker.io/library/busybox sha256:4c45e4bb3be9dbdfb27c09ac23c050b9e6eb4c16868287c8c31d34814008df80 1.847MiB docker.io/library/busybox sha256:f78e6840ded1aafb6c9f265f52c2fc7c0a990813ccf96702df84a7dcdbe48bea 1.908MiB manifests: sha256:4ff685e2bcafdab0d2a9b15cbfd9d28f5dfe69af97e3bb1987ed483b0abf5a99 sha256:5e42fbc46b177f10319e8937dd39702e7891ce6d8a42d60c1b4f433f94200bd2 sha256:7128d7c7704fb628f1cedf161c01d929d3d831f2a012780b8191dae49f79a5fc sha256:77ed5ebc3d9d48581e8afcb75b4974978321bd74f018613483570fcd61a15de8 sha256:dde8e930c7b6a490f728e66292bc9bce42efc9bbb5278bae40e4f30f6e00fe8c sha256:7128d7c7704fb628f1cedf161c01d929d3d831f2a012780b8191dae49f79a5fc -> target
Additional info:
Description of problem:
The IPI installation in some regions fails at bootstrap, with no nodes available/ready.
Version-Release number of selected component (if applicable):
12-22 16:22:27.970 ./openshift-install 4.12.0-0.nightly-2022-12-21-202045 12-22 16:22:27.970 built from commit 3f9c38a5717c638f952df82349c45c7d6964fcd9 12-22 16:22:27.970 release image registry.ci.openshift.org/ocp/release@sha256:2d910488f25e2638b6d61cda2fb2ca5de06eee5882c0b77e6ed08aa7fe680270 12-22 16:22:27.971 release architecture amd64
How reproducible:
Always
Steps to Reproduce:
1. try the IPI installation in the problem regions (so far tried and failed with ap-southeast-2, ap-south-1, eu-west-1, ap-southeast-6, ap-southeast-3, ap-southeast-5, eu-central-1, cn-shanghai, cn-hangzhou and cn-beijing)
Actual results:
Bootstrap failed to complete
Expected results:
Installation in those regions should succeed.
Additional info:
FYI the QE flexy-install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/166672/ No any node available/ready, and no any operator available. $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version False True 30m Unable to apply 4.12.0-0.nightly-2022-12-21-202045: an unknown error has occurred: MultipleErrors $ oc get nodes No resources found $ oc get machines -n openshift-machine-api -o wide NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE jiwei-1222f-v729x-master-0 30m jiwei-1222f-v729x-master-1 30m jiwei-1222f-v729x-master-2 30m $ oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication baremetal cloud-controller-manager cloud-credential cluster-autoscaler config-operator console control-plane-machine-set csi-snapshot-controller dns etcd image-registry ingress insights kube-apiserver kube-controller-manager kube-scheduler kube-storage-version-migrator machine-api machine-approver machine-config marketplace monitoring network node-tuning openshift-apiserver openshift-controller-manager openshift-samples operator-lifecycle-manager operator-lifecycle-manager-catalog operator-lifecycle-manager-packageserver service-ca storage $ Mater nodes don't run for example kubelet and crio services. [core@jiwei-1222f-v729x-master-0 ~]$ sudo crictl ps FATA[0000] unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory" [core@jiwei-1222f-v729x-master-0 ~]$ The machine-config-daemon firstboot tells "failed to update OS". [jiwei@jiwei log-bundle-20221222085846]$ grep -Ei 'error|failed' control-plane/10.0.187.123/journals/journal.log Dec 22 16:24:16 localhost kernel: GPT: Use GNU Parted to correct GPT errors. Dec 22 16:24:16 localhost kernel: GPT: Use GNU Parted to correct GPT errors. Dec 22 16:24:18 localhost ignition[867]: failed to fetch config: resource requires networking Dec 22 16:24:18 localhost ignition[891]: GET error: Get "http://100.100.100.200/latest/user-data": dial tcp 100.100.100.200:80: connect: network is unreachable Dec 22 16:24:18 localhost ignition[891]: GET error: Get "http://100.100.100.200/latest/user-data": dial tcp 100.100.100.200:80: connect: network is unreachable Dec 22 16:24:19 localhost.localdomain NetworkManager[919]: <info> [1671726259.0329] hostname: hostname: hostnamed not used as proxy creation failed with: Could not connect: No such file or directory Dec 22 16:24:19 localhost.localdomain NetworkManager[919]: <warn> [1671726259.0464] sleep-monitor-sd: failed to acquire D-Bus proxy: Could not connect: No such file or directory Dec 22 16:24:19 localhost.localdomain ignition[891]: GET error: Get "https://api-int.jiwei-1222f.alicloud-qe.devcluster.openshift.com:22623/config/master": dial tcp 10.0.187.120:22623: connect: connection refused ...repeated logs omitted... Dec 22 16:27:46 jiwei-1222f-v729x-master-0 ovs-ctl[1888]: 2022-12-22T16:27:46Z|00001|dns_resolve|WARN|Failed to read /etc/resolv.conf: No such file or directory Dec 22 16:27:46 jiwei-1222f-v729x-master-0 ovs-vswitchd[1888]: ovs|00001|dns_resolve|WARN|Failed to read /etc/resolv.conf: No such file or directory Dec 22 16:27:46 jiwei-1222f-v729x-master-0 dbus-daemon[1669]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.resolve1.service': Unit dbus-org.freedesktop.resolve1.service not found. 
Dec 22 16:27:46 jiwei-1222f-v729x-master-0 nm-dispatcher[1924]: Error: Device '' not found. Dec 22 16:27:46 jiwei-1222f-v729x-master-0 nm-dispatcher[1937]: Error: Device '' not found. Dec 22 16:27:46 jiwei-1222f-v729x-master-0 nm-dispatcher[2037]: Error: Device '' not found. Dec 22 08:35:32 jiwei-1222f-v729x-master-0 machine-config-daemon[2181]: Warning: failed, retrying in 1s ... (1/2)I1222 08:35:32.477770 2181 run.go:19] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-extensions/os-extensions-content-910221290 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:259d8c6b9ec714d53f0275db9f2962769f703d4d395afb9d902e22cfe96021b0 Dec 22 08:56:06 jiwei-1222f-v729x-master-0 rpm-ostree[2288]: Txn Rebase on /org/projectatomic/rpmostree1/rhcos failed: remote error: Get "https://quay.io/v2/openshift-release-dev/ocp-v4.0-art-dev/blobs/sha256:27f262e70d98996165748f4ab50248671d4a4f97eb67465cd46e1de2d6bd24d0": net/http: TLS handshake timeout Dec 22 08:56:06 jiwei-1222f-v729x-master-0 machine-config-daemon[2181]: W1222 08:56:06.785425 2181 firstboot_complete_machineconfig.go:46] error: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:411e6e3be017538859cfbd7b5cd57fc87e5fee58f15df19ed3ec11044ebca511 : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:411e6e3be017538859cfbd7b5cd57fc87e5fee58f15df19ed3ec11044ebca511: Warning: The unit file, source configuration file or drop-ins of rpm-ostreed.service changed on disk. Run 'systemctl daemon-reload' to reload units. Dec 22 08:56:06 jiwei-1222f-v729x-master-0 machine-config-daemon[2181]: error: remote error: Get "https://quay.io/v2/openshift-release-dev/ocp-v4.0-art-dev/blobs/sha256:27f262e70d98996165748f4ab50248671d4a4f97eb67465cd46e1de2d6bd24d0": net/http: TLS handshake timeout Dec 22 08:57:31 jiwei-1222f-v729x-master-0 machine-config-daemon[2181]: Warning: failed, retrying in 1s ... (1/2)I1222 08:57:31.244684 2181 run.go:19] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-extensions/os-extensions-content-4021566291 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:259d8c6b9ec714d53f0275db9f2962769f703d4d395afb9d902e22cfe96021b0 Dec 22 08:59:20 jiwei-1222f-v729x-master-0 systemd[2353]: /usr/lib/systemd/user/podman-kube@.service:10: Failed to parse service restart specifier, ignoring: never Dec 22 08:59:21 jiwei-1222f-v729x-master-0 podman[2437]: Error: open default: no such file or directory Dec 22 08:59:21 jiwei-1222f-v729x-master-0 podman[2450]: Error: failed to start API service: accept unixgram @00026: accept4: operation not supported Dec 22 08:59:21 jiwei-1222f-v729x-master-0 systemd[2353]: podman-kube@default.service: Failed with result 'exit-code'. Dec 22 08:59:21 jiwei-1222f-v729x-master-0 systemd[2353]: Failed to start A template for running K8s workloads via podman-play-kube. Dec 22 08:59:21 jiwei-1222f-v729x-master-0 systemd[2353]: podman.service: Failed with result 'exit-code'. [jiwei@jiwei log-bundle-20221222085846]$
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The Route checkbox gets checked again even if it was unchecked while editing the Serverless Function form.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Install Serverless Operator and create a KN Serving instance
2. Create a Serverless Function and open the Edit form of the KSVC
3. Uncheck the Create Route option and save.
4. Reopen the Edit form again.
Actual results:
The checkbox still shows checked.
Expected results:
It should retain the previous state.
Additional info:
Description of problem:
Opened the web-console and navigated to Dashboards; with the default API performance V2 option selected, each sub-page shows "No datapoints found".
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-27-000502
How reproducible:
always
Steps to Reproduce:
1. Open the web-console and navigate to Dashboards; with the default API performance V2 option selected, each sub-page shows "No datapoints found".
Actual results:
"No datapoints found" is shown for the Dashboards default API performance V2 option, and the page is blank.
Expected results:
Should show diagrams for Dashboards default API performance V2 option
Additional info:
This blocks bug https://issues.redhat.com/browse/OCPBUGS-14940; when I filed OCPBUGS-14940, this issue was not seen.
Description of problem:
OVN image pre-puller blocks upgrades in environments where the images have already been pulled but the registry server is not available.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Always
Steps to Reproduce:
1. Create a cluster in a disconnected environment.
2. Manually pre-pull all the images required for the upgrade. For example, get the list of images needed:
# oc adm release info quay.io/openshift-release-dev/ocp-release:4.12.10-x86_64 -o json > release-info.json
And then pull them in all the nodes of the cluster:
# crio pull $(cat release-info.json | jq -r '.references.spec.tags[].from.name')
3. Stop or somehow make the registry unreachable, then trigger the upgrade.
Actual results:
The upgrade blocks with the following error reported by the cluster version operator:
# oc get clusterversion; oc get co network NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.12.10 True True 62m Working towards 4.12.11: 483 of 830 done (58% complete), waiting on network NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE network 4.12.10 True True False 133m DaemonSet "/openshift-ovn-kubernetes/ovnkube-upgrades-prepuller" is not available (awaiting 1 nodes)
The reason for that is that the `ovnkube-upgrades-prepuller-...` pod uses `imagePullPolicy: Always` and that fails if there is no registry, even if the image has already been pulled:
# oc get pods -n openshift-ovn-kubernetes ovnkube-upgrades-prepuller-5s2cn NAME READY STATUS RESTARTS AGE ovnkube-upgrades-prepuller-5s2cn 0/1 ImagePullBackOff 0 44m # oc get events -n openshift-ovn-kubernetes --field-selector involvedObject.kind=Pod,involvedObject.name=ovnkube-upgrades-prepuller-5s2cn,reason=Failed LAST SEEN TYPE REASON OBJECT MESSAGE 43m Warning Failed pod/ovnkube-upgrades-prepuller-5s2cn Failed to pull image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52f189797a83cae8769f1a4dc6dfd46d586914575ee99de6566fc23c77282071": rpc error: code = Unknown desc = (Mirrors also failed: [server.home.arpa:8443/openshift/release@sha256:52f189797a83cae8769f1a4dc6dfd46d586914575ee99de6566fc23c77282071: pinging container registry server.home.arpa:8443: Get "https://server.home.arpa:8443/v2/": dial tcp 192.168.100.1:8443: connect: connection refused]): quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52f189797a83cae8769f1a4dc6dfd46d586914575ee99de6566fc23c77282071: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 192.168.100.1:53: server misbehaving 43m Warning Failed pod/ovnkube-upgrades-prepuller-5s2cn Error: ErrImagePull 43m Warning Failed pod/ovnkube-upgrades-prepuller-5s2cn Error: ImagePullBackOff # oc get pod -n openshift-ovn-kubernetes ovnkube-upgrades-prepuller-5s2cn -o json | jq -r '.spec.containers[] | .imagePullPolicy + " " + .image' Always quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52f189797a83cae8769f1a4dc6dfd46d586914575ee99de6566fc23c77282071
Expected results:
The upgrade should not block.
Additional info:
We detected this in a situation where we want to be able to perform upgrades in a disconnected environment and without the registry server running. See MGMT-13733 for details.
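One possible direction, sketched with the core/v1 Go types rather than the CNO's actual manifest generation: since the pre-puller image is pinned by digest, a pull policy of IfNotPresent would let an already-present image satisfy the pod without reaching the registry.

    package main

    import (
    	"fmt"

    	corev1 "k8s.io/api/core/v1"
    )

    // prepullerContainer sketches the relevant part of the pre-puller pod spec:
    // with PullIfNotPresent, an image that is already on the node does not
    // require the registry to be reachable. The image value is illustrative.
    func prepullerContainer(image string) corev1.Container {
    	return corev1.Container{
    		Name:            "ovnkube-upgrades-prepuller",
    		Image:           image,
    		ImagePullPolicy: corev1.PullIfNotPresent,
    	}
    }

    func main() {
    	c := prepullerContainer("quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<digest>")
    	fmt.Println(c.Name, c.ImagePullPolicy)
    }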
When using the --oci-registries-config flag explicitly or getting registries.conf from the environment, execution time when processing related images via the addRelatedImageToMapping function serially can drastically impact performance depending on the number of images involved. In my testing of a large catalog, there were approximately 470 images and this took approximately 13 minutes. This processing occurs prior to letting the underlying oc mirror code plan out the images that should be mirrored. Actual planning time is consistent at around 1 min 30 seconds.
The cause of this is due to the need to determine mirrors for each one of the related images based on the configuration provided in registries.conf, and this action is done serially in a loop. If I introduce parallel execution, the processing time for addRelatedImageToMapping is reduced from ~13 min to ~14 seconds.
Note: the catalog used here is publicly available, but the related images are not so this may be difficult to reproduce.
mkdir -p /tmp/oci/registriesconf/performance
skopeo --override-os linux copy docker://quay.io/jhunkins/ocp13762:v1 oci:///tmp/oci/registriesconf/performance --format v2s2
[[registry]]
location = "icr.io/cpopen"
insecure = false
blocked = false
mirror-by-digest-only = true
prefix = ""
[[registry.mirror]]
location = "quay.io/jhunkins"
insecure = false
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  operators:
  - catalog: oci:///tmp/oci/registriesconf/performance
    full: true
    targetTag: latest
    targetCatalog: ibm-catalog
storageConfig:
  local:
    path: /tmp/oc-mirror-temp
oc mirror --config [path to isc]/isc-registriesconf-performance.yaml --include-local-oci-catalogs --oci-insecure-signature-policy --dest-use-http docker://localhost:5000/oci --skip-cleanup --dry-run
roughly 13 minutes elapses before the planning phase begins
much faster execution before the planning phase begins
I intend to create a PR which adds parallel execution around the addRelatedImageToMapping function
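A sketch of the intended fan-out, with a placeholder signature for addRelatedImageToMapping and a bounded worker count; this shows the shape of the change, not the actual oc-mirror code:

    package main

    import (
    	"context"
    	"fmt"

    	"golang.org/x/sync/errgroup"
    )

    // relatedImage and addRelatedImageToMapping are placeholders for the
    // oc-mirror types/function mentioned above; only the fan-out pattern is the
    // point here.
    type relatedImage struct{ Name string }

    func addRelatedImageToMapping(ctx context.Context, img relatedImage) error {
    	// ... resolve mirrors from registries.conf for this image ...
    	return nil
    }

    // addRelatedImagesParallel fans the per-image work out over a bounded number
    // of goroutines instead of processing the slice serially.
    func addRelatedImagesParallel(ctx context.Context, images []relatedImage, workers int) error {
    	g, ctx := errgroup.WithContext(ctx)
    	g.SetLimit(workers)
    	for _, img := range images {
    		img := img // capture loop variable
    		g.Go(func() error {
    			return addRelatedImageToMapping(ctx, img)
    		})
    	}
    	return g.Wait()
    }

    func main() {
    	images := []relatedImage{{Name: "registry.example.com/a"}, {Name: "registry.example.com/b"}}
    	fmt.Println(addRelatedImagesParallel(context.Background(), images, 8))
    }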
Description of problem:
A cluster was installed via ACM and its nodes are showing as Unmanaged. When trying to set the BMH credentials via the console, the Apply button is not clickable (greyed out).
Version-Release number of selected component (if applicable): 4.11
How reproducible: Always
Steps to Reproduce:
1. Install a cluster via ACM
2. Setting a BMH credential on console
3.
Actual results:
The Apply button on the console screen is greyed out, unclickable.
Expected results:
Should be able to configure the BMH credentials
Additional info:
Based on a suggestion from Omer
"Now that we can tell apart user manifests from our own service manifests, I think it's best that this function deletes the service manifests.
https://github.com/openshift/assisted-service/blob/master/internal/cluster/cluster.go#L1418
The original motivation for this skip was that we didn't want to destroy user uploaded manifests when the user resets their installation, but preserving the service generated ones is useless, and was just an unfortunate side-effect of protecting the user manifests. The service ones would anyway get regenerated when the user hits install again, there's no point in protecting them. If anything, clearing those manifests I think this might solve some edge case bugs I can think of"
We will need to wait for https://github.com/openshift/assisted-service/pull/5278/files to be merged before starting this as this depends on changes made in this PR
Instead of creating a new MC 97-{master/worker}-generated-kubelet to set the default cgroups version, it is better to set it via a template.
Description of problem:
The alerts table displays incorrect values (Prometheus) in the source column
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Install LokiOperator and the Cluster Logging operator, and enable the logging view plugin with the alerts feature toggle enabled
2. Add a log-based alert
3. Check the alerts table source in the Observe -> Alerts section
Actual results:
Incorrect "Prometheus" value is displayed for non log-based alerts
Expected results:
"Platform" or "User" value is displayed for non log-based alerts
Additional info:
Description of problem:
When HyperShift HostedClusters are created with "OLMCatalogPlacement" set to "guest" and if the desired release is pre-GA, the CatalogSource pods cannot pull their images due to using unreleased images.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Common
Steps to Reproduce:
1. Create a HyperShift 4.13 HostedCluster with spec.OLMCatalogPlacement = "guest" 2. See the openshift-marketplace/community-operator-* pods in the guest cluster in ImagePullBackoff
Actual results:
openshift-marketplace/community-operator-* pods in the guest cluster in ImagePullBackoff
Expected results:
All CatalogSource pods to be running and to use n-1 images if pre-GA
Additional info:
This is a clone of issue OCPBUGS-18800. The following is the description of the original issue:
—
Description of problem:
currently the mco updates its image registry certificate configmap by deleting and re-creating it on each MCO sync. Instead, we should be patching it
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
When processing an install-config containing either BMC passwords in the baremetal platform config, or a vSphere password in the vsphere platform config, we log a warning message to say that the value is ignored.
This warning currently includes the value in the password field, which may be inconvenient for users reusing IPI configs who don't want their password values to appear in logs.
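A minimal sketch of the intended behavior, not the installer's actual logging code: warn about the ignored field by its path only, without echoing the secret value.

    package main

    import "fmt"

    // warnIgnoredField sketches emitting the "ignored field" warning without
    // echoing the secret back: log the field path, never the value.
    func warnIgnoredField(fieldPath string) {
    	fmt.Printf("WARNING %s is ignored\n", fieldPath)
    }

    func main() {
    	// Field path below is illustrative of a BMC password in the baremetal
    	// platform config; the value itself is never printed.
    	warnIgnoredField("Platform.BareMetal.Hosts[0].BMC.Password")
    }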
Description of problem:
Once https://issues.redhat.com/browse/OCPBUGS-14783 was fixed, we found another issue which prevents the kube-apiserver's init container from finishing successfully. The init container tries to reach the kube-apiserver on an IPv4-based URL, which is not up; it should use the IPv6 one.
Description of problem:
Hypershift kubevirt provider hosted cluster cannot start up after activating ovn-k interconnect at hosted cluster. The issue is that ovn-k configurations missmatch: The cluster manager config in the hosted cluster namespace: ovnkube.conf: |- [default] mtu="8801" cluster-subnets="10.132.0.0/14/23" encap-port="9880" enable-lflow-cache=true lflow-cache-limit-kb=1048576 [kubernetes] service-cidrs="172.31.0.0/16" ovn-config-namespace="openshift-ovn-kubernetes" cacert="/hosted-ca/ca.crt" apiserver="https://kube-apiserver:6443" host-network-namespace="openshift-host-network" platform-type="KubeVirt" dns-service-namespace="openshift-dns" dns-service-name="dns-default" [ovnkubernetesfeature] enable-egress-ip=true enable-egress-firewall=true enable-egress-qos=true enable-egress-service=true egressip-node-healthcheck-port=9107 [gateway] mode=shared nodeport=true v4-join-subnet="100.65.0.0/16" [masterha] election-lease-duration=137 election-renew-deadline=107 election-retry-period=26 The controller config in the hosted cluster ovnkube.conf: |- [default] mtu="8801" cluster-subnets="10.132.0.0/14/23" encap-port="9880" enable-lflow-cache=true lflow-cache-limit-kb=1048576 enable-udp-aggregation=true [kubernetes] service-cidrs="172.31.0.0/16" ovn-config-namespace="openshift-ovn-kubernetes" apiserver="https://a392ee248c42a4ffca67f2909823466e-18e866c0f5fb5880.elb.us-west-2.amazonaws.com:6443" host-network-namespace="openshift-host-network" platform-type="KubeVirt" healthz-bind-address="0.0.0.0:10256" dns-service-namespace="openshift-dns" dns-service-name="dns-default" [ovnkubernetesfeature] enable-egress-ip=true enable-egress-firewall=true enable-egress-qos=true enable-egress-service=true egressip-node-healthcheck-port=9107 enable-multi-network=true [gateway] mode=shared nodeport=true [masterha] election-lease-duration=137 election-renew-deadline=107 election-retry-period=26
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Deploy the latest 4.14 OCP cluster
2. Install the latest hypershift operator
3. Deploy a hosted cluster with the latest 4.14 OCP release image
Actual results:
Hosted cluster gets stuck at: network 4.14.0-0.ci-2023-08-20-221659 True True False 3h53m DaemonSet "/openshift-multus/network-metrics-daemon" is waiting for other operators to become ready...
Expected results:
All the hosted clusters operators should be ok
Additional info:
This is a clone of issue OCPBUGS-19699. The following is the description of the original issue:
—
Description of problem:
When CPUPartitioning is not set in install-config.yaml, a warning message is still generated: WARNING CPUPartitioning: is ignored. This warning is both incorrect, since the check is against "None" and the value is an empty string when not set, and also no longer relevant now that https://issues.redhat.com/browse/OCPBUGS-18876 has been fixed.
Version-Release number of selected component (if applicable):
How reproducible:
Every time
Steps to Reproduce:
1. Create an install config with CPUPartitioning not set 2. Run "openshift-install agent create image --dir cluster-manifests/ --log-level debug"
Actual results:
See the output "WARNING CPUPartitioning: is ignored"
Expected results:
No warning
Additional info:
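A sketch of the tightened condition (type and constants mirrored locally for illustration); the alternative fix, per the description above, is simply dropping the warning now that CPUPartitioning is supported.

    package main

    import "fmt"

    // CPUPartitioningMode mirrors the install-config field type; the values here
    // are illustrative of the two interesting cases.
    type CPUPartitioningMode string

    const (
    	CPUPartitioningNone     CPUPartitioningMode = "None"
    	CPUPartitioningAllNodes CPUPartitioningMode = "AllNodes"
    )

    // shouldWarnIgnored sketches the corrected condition: only warn when the user
    // explicitly set a value other than the default, not when the field is unset
    // (empty string) or "None".
    func shouldWarnIgnored(mode CPUPartitioningMode) bool {
    	return mode != "" && mode != CPUPartitioningNone
    }

    func main() {
    	fmt.Println(shouldWarnIgnored(""))                      // false - unset, no warning
    	fmt.Println(shouldWarnIgnored(CPUPartitioningNone))     // false
    	fmt.Println(shouldWarnIgnored(CPUPartitioningAllNodes)) // true
    }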
Description of problem:
Since the `registry.centos.org` is closed, all the unit tests in oc relying on this registry started failing.
Version-Release number of selected component (if applicable):
all versions
How reproducible:
trigger CI jobs and see unit tests are failing
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
We use the state machine design pattern to have explicit clear rules for how hosts can move in and out of states depending on the things that are happening.
This makes it relatively easy to follow / understand host behavior.
We should ensure our code doesn't contain places where we force a host into a state without going through the state machine 🍝; otherwise it defeats the purpose of having a state machine.
One example that personally confused me is this switch statement, which contains updates like this one, this one and this one, and also this one.
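A minimal sketch, not the assisted-service implementation, of why routing every update through a transition table matters: callers cannot force an arbitrary state, they can only request transitions the table allows. State names are illustrative.

    package main

    import "fmt"

    type hostState string

    const (
    	stateDiscovering hostState = "discovering"
    	stateKnown       hostState = "known"
    	stateInstalling  hostState = "installing"
    )

    // allowedTransitions declares which moves are legal; anything else is refused.
    var allowedTransitions = map[hostState][]hostState{
    	stateDiscovering: {stateKnown},
    	stateKnown:       {stateInstalling, stateDiscovering},
    	stateInstalling:  {},
    }

    // transition refuses any move not declared in the table, instead of letting
    // callers assign the host status directly.
    func transition(current, next hostState) (hostState, error) {
    	for _, s := range allowedTransitions[current] {
    		if s == next {
    			return next, nil
    		}
    	}
    	return current, fmt.Errorf("transition %s -> %s is not allowed", current, next)
    }

    func main() {
    	s, err := transition(stateDiscovering, stateKnown)
    	fmt.Println(s, err) // known <nil>
    	s, err = transition(stateKnown, stateKnown)
    	fmt.Println(s, err) // known transition known -> known is not allowed
    }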
Description of problem:
After the changes for OCPBUGS-3036 and OCPBUGS-11596, a user who has project admin permission should be able to check all the subscription information on the operator details page. But currently the InstallPlan information is shown as "None" on the page, which is incorrect.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-05-03-163151
How reproducible:
Always
Steps to Reproduce:
1. Configure IDP and add a user.
2. Install any operator in a specific namespace.
3. Assign project admin permission to the user for the same namespace:
$ oc adm policy add-role-to-user admin <username> -n <projectname>
4. Check that the user has enough permission to check InstallPlans via CLI:
$ oc get clusterrole admin -o yaml | grep -C10 installplan
- apiGroups:
  - operators.coreos.com
  resources:
  - clusterserviceversions
  - catalogsources
  - installplans
  - subscriptions
  verbs:
  - delete
- apiGroups:
  - operators.coreos.com
  resources:
  - clusterserviceversions
  - catalogsources
  - installplans
  - subscriptions
  - operatorgroups
  verbs:
  - get
  - list
  - watch
5. Log in to OCP with the user and go to the InstallPlan page; the user is able to check the InstallPlan list without any error: /k8s/ns/<projectname>/operators.coreos.com~v1alpha1~InstallPlan
6. Navigate to Operator Details -> Subscription tab and check whether the 'InstallPlan' name is shown on the page.
Actual results:
Only 'None' is shown on the InstallPlan section
Expected results:
The InstallPlan name should be shown on the Subscription page.
Additional info:
Description of problem:
Since registry.centos.org is closed, tests relying on this registry in e2e-agnostic-ovn-cmd job are failing.
Version-Release number of selected component (if applicable):
all
How reproducible:
Trigger e2e-agnostic-ovn-cmd job
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
As part of https://issues.redhat.com/browse/OCPBUGS-14352, pipeline e2e tests were disabled. Enable pipeline e2e tests again.
Description of problem:
Cluster does not finish rolling out on a 4.13 management cluster because of pod security constraints.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Install the 4.14 hypershift operator on a recent 4.13 mgmt cluster
2. Create an AWS PublicAndPrivate hosted cluster on that hypershift cluster
Actual results:
Hosted cluster stalls rollout because the private router never gets created
Expected results:
Hosted cluster comes up successfully
Additional info:
Pod security enforcement is preventing the private router from getting created.
PRs were previously merged to add SC2S support via AWS SDK here:
However, further updates to add support for SC2S region (us-isob-east-1) and new TC2S region (us-iso-west-1) are still required.
There are still hard-coded references to the old regions in the following locations.
Description of problem:
Altering the ImageURL or ExtraKernelParams values in a PreprovisioningImage CR should cause the host to boot using the new image or parameters, but currently the host doesn't respond at all to changes in those fields.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-01-11-225449
How reproducible:
Always
Steps to Reproduce:
1. Create a BMH
2. Set the preprovisioning image URL
3. Allow the host to boot
4. Change the image URL or extra kernel params
Actual results:
Host does not reboot
Expected results:
Host reboots using the newly provided image or parameters
Additional info:
BMH:
- apiVersion: metal3.io/v1alpha1 kind: BareMetalHost metadata: annotations: inspect.metal3.io: disabled creationTimestamp: "2023-01-13T16:06:12Z" finalizers: - baremetalhost.metal3.io generation: 4 labels: infraenvs.agent-install.openshift.io: myinfraenv name: ostest-extraworker-0 namespace: assisted-installer resourceVersion: "61077" uid: 444d7246-3d0a-4188-a8c4-f407ee4f741f spec: automatedCleaningMode: disabled bmc: address: redfish+http://192.168.111.1:8000/redfish/v1/Systems/6f45ba9f-251a-46f7-a7a8-10c6ca9231dd credentialsName: ostest-extraworker-0-bmc-secret bootMACAddress: 00:b2:71:b8:14:4f customDeploy: method: start_assisted_install online: true status: errorCount: 0 errorMessage: "" goodCredentials: credentials: name: ostest-extraworker-0-bmc-secret namespace: assisted-installer credentialsVersion: "44478" hardwareProfile: unknown lastUpdated: "2023-01-13T16:06:22Z" operationHistory: deprovision: end: null start: null inspect: end: null start: null provision: end: null start: "2023-01-13T16:06:22Z" register: end: "2023-01-13T16:06:22Z" start: "2023-01-13T16:06:12Z" operationalStatus: OK poweredOn: false provisioning: ID: b5e8c1a9-8061-420b-8c32-bb29a8b35a0b bootMode: UEFI image: url: "" raid: hardwareRAIDVolumes: null softwareRAIDVolumes: [] rootDeviceHints: deviceName: /dev/sda state: provisioning triedCredentials: credentials: name: ostest-extraworker-0-bmc-secret namespace: assisted-installer credentialsVersion: "44478"
Preprovisioning Image (with changes)
- apiVersion: metal3.io/v1alpha1 kind: PreprovisioningImage metadata: creationTimestamp: "2023-01-13T16:06:22Z" generation: 1 labels: infraenvs.agent-install.openshift.io: myinfraenv name: ostest-extraworker-0 namespace: assisted-installer ownerReferences: - apiVersion: metal3.io/v1alpha1 blockOwnerDeletion: true controller: true kind: BareMetalHost name: ostest-extraworker-0 uid: 444d7246-3d0a-4188-a8c4-f407ee4f741f resourceVersion: "56838" uid: 37f4da76-0d1c-4e05-b618-2f0ab9d5c974 spec: acceptFormats: - initrd architecture: x86_64 status: architecture: x86_64 conditions: - lastTransitionTime: "2023-01-13T16:34:26Z" message: Image has been created observedGeneration: 1 reason: ImageCreated status: "True" type: Ready - lastTransitionTime: "2023-01-13T16:06:24Z" message: Image has been created observedGeneration: 1 reason: ImageCreated status: "False" type: Error extraKernelParams: coreos.live.rootfs_url=https://assisted-image-service-assisted-installer.apps.ostest.test.metalkube.org/boot-artifacts/rootfs?arch=x86_64&version=4.12 rd.break=initqueue format: initrd imageUrl: https://assisted-image-service-assisted-installer.apps.ostest.test.metalkube.org/images/79ef3924-ee94-42c6-96c3-2d784283120d/pxe-initrd?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI3OWVmMzkyNC1lZTk0LTQyYzYtOTZjMy0yZDc4NDI4MzEyMGQifQ.YazOZS01NoI7g_eVhLmRNmM6wKVVaZJdWbxuePia46Fo0GMLYtSOp1JTvtcStoT51g7VkSnTf8LBJ0zmbGu3HQ&arch=x86_64&version=4.12 kernelUrl: https://assisted-image-service-assisted-installer.apps.ostest.test.metalkube.org/boot-artifacts/kernel?arch=x86_64&version=4.12 networkData: {}
This was found while testing ZTP so in this case the assisted-service controllers are altering the preprovisioning image in response to changes made in the assisted-specific CRs, but I don't think this issue is ZTP specific.
Please review the following PR: https://github.com/openshift/k8s-prometheus-adapter/pull/68
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Add the snyk-secret parameter to the push & pull tekton files so that a snyk scan will be performed on HO RHTAP builds.
The 3.0.1 version seems to have some important fixes for vSphere CSI driver crashes. We should backport those fixes to 4.13 and 4.14.
Porting rhbz#2057740 to Jira. Pods without a controller: true entry in ownerReferences are not gracefully drained by the autoscaler (and potentially other drain-library drainers). Checking a recent 4.13 CI run:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.13-e2e-aws-ovn/1625150492994703360/artifacts/e2e-aws-ovn/gather-extra/artifacts/pods.json | jq -r '.items[].metadata | select([(.ownerReferences // [])[] | select(.controller)] | length == 0) | .namespace + " " + .name + " " + (.ownerReferences | tostring)' | grep -v '^\(openshift-etcd\|openshift-kube-apiserver\|openshift-kube-controller-manager\|openshift-kube-scheduler\) ' openshift-marketplace certified-operators-fnm5z [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"certified-operators","uid":"4eb36072-7c56-4663-9b5a-fd23cee85432"}] openshift-marketplace community-operators-nrfl6 [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"community-operators","uid":"0e164593-5656-4592-9915-1a5367a6a548"}] openshift-marketplace redhat-marketplace-7j7k9 [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"redhat-marketplace","uid":"14b910c4-0e45-4188-ab57-671070b6a9f1"}] openshift-marketplace redhat-operators-hxhxw [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"redhat-operators","uid":"ca9028e5-affb-4537-81f1-15e3a5129c6e"}]
At least 4.11 and 4.13 (above). Likely all OpenShift 4.y which have had these openshift-marketplace pods.
100%
1. Launch a cluster.
2. Inspect the openshift-marketplace pods with: oc -n openshift-marketplace get -o json pods | jq -r '.items[].metadata | select(.namespace == "openshift-marketplace" and (([.ownerReferences[] | select(.controller == true)]) | length) == 0) | .name + " " + (.ownerReferences | tostring)'
certified-operators-fnm5z [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"certified-operators","uid":"4eb36072-7c56-4663-9b5a-fd23cee85432"}] community-operators-nrfl6 [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"community-operators","uid":"0e164593-5656-4592-9915-1a5367a6a548"}] redhat-marketplace-7j7k9 [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"redhat-marketplace","uid":"14b910c4-0e45-4188-ab57-671070b6a9f1"}] redhat-operators-hxhxw [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"redhat-operators","uid":"ca9028e5-affb-4537-81f1-15e3a5129c6e"}]
No output.
Figuring out which resource to list as the controller is tricky, but there are workarounds, including pointing at the triggering resource or a ClusterOperator as the controller.
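For illustration only (a hedged sketch, not the marketplace operator's actual code): the workaround amounts to marking one of the existing owner references as the controller so drain libraries treat the pod as managed. The UID below is a placeholder.

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// boolPtr returns a pointer to b; OwnerReference uses *bool fields.
func boolPtr(b bool) *bool { return &b }

func main() {
	// Owner reference with controller: true, so drain libraries treat the
	// CatalogSource as the pod's managing controller.
	ref := metav1.OwnerReference{
		APIVersion:         "operators.coreos.com/v1alpha1",
		Kind:               "CatalogSource",
		Name:               "certified-operators",
		UID:                types.UID("placeholder-uid"),
		Controller:         boolPtr(true),
		BlockOwnerDeletion: boolPtr(false),
	}
	fmt.Printf("%+v\n", ref)
}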
Description of problem:
The chk_default_ingress.sh script for keepalived is no longer correctly matching the default ingress pod name. The pod name in a recently deployed dev-scripts cluster is router-default-97fb6b94c-wfxfk, which does not match our grep pattern of router-default-[[:xdigit:]]\{10\}-[[:alnum:]]\{5\}. The main issue seems to be that the first hash is only 9 characters, not 10.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Unsure, but has been seen at least twice
Steps to Reproduce:
1. Deploy recent nightly build 2. Look at chk_default_ingress status 3.
Actual results:
Always failing, even on nodes with the default ingress pod
Expected results:
Passes on nodes with default ingress pod
Additional info:
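For illustration only (not necessarily the fix that was adopted): a Go regexp sketch of a pattern that tolerates the shorter pod-template hash observed above instead of requiring exactly 10 hex digits.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Accept a pod-template hash of any length (the observed one was 9
	// characters) followed by the usual 5-character pod suffix.
	re := regexp.MustCompile(`^router-default-[[:alnum:]]+-[[:alnum:]]{5}$`)
	fmt.Println(re.MatchString("router-default-97fb6b94c-wfxfk"))  // true (9-char hash)
	fmt.Println(re.MatchString("router-default-6b94c97fb6-abcde")) // true (10-char hash)
}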
DoD:
e2e to run NodePool without setting SG in spec
Description of problem:
ci job "amd64-nightly-4.13-upgrade-from-stable-4.12-vsphere-ipi-proxy-workers-rhel8" failed at rhel node upgrade stage with following error: TASK [openshift_node : Apply machine config] ***********************************3583task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/apply_machine_config.yml:683584Using module file /opt/python-env/ansible-core/lib64/python3.8/site-packages/ansible/modules/command.py3585Pipelining is enabled.3586<192.168.233.236> ESTABLISH SSH CONNECTION FOR USER: test3587<192.168.233.236> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="test"' -o ConnectTimeout=30 -o IdentityFile=/var/run/secrets/ci.openshift.io/cluster-profile/ssh-privatekey -o StrictHostKeyChecking=no -o 'ControlPath="/alabama/.ansible/cp/%h-%r"' 192.168.233.236 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-vwugynewkogzaosazvikpnplnmjoluxs ; http_proxy=http://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX@192.168.221.228:3128 https_proxy=http://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX@192.168.221.228:3128 no_proxy=.cluster.local,.svc,10.128.0.0/14,127.0.0.1,172.30.0.0/16,192.168.233.0/25,api-int.ci-op-ssnlf4qb-1dacf.vmc-ci.devcluster.openshift.com,localhost /usr/libexec/platform-python'"'"'"'"'"'"'"'"' && sleep 0'"'"''3588Escalation succeeded3589<192.168.233.236> (1, b'\n{"changed": XXXX, "stdout": "I0726 23:36:56.436283 27240 start.go:61] Version: v4.13.0-202307242035.p0.g7b54f1d.assembly.stream-dirty (7b54f1dcce4ea9f69f300d0e1cf2316def45bf72)\\r\\nI0726 23:36:56.437075 27240 daemon.go:478] not chrooting for source=rhel-8 target=rhel-8\\r\\nF0726 23:36:56.437240 27240 start.go:75] failed to re-exec: writing /rootfs/run/bin/machine-config-daemon: open /rootfs/run/bin/machine-config-daemon: text file busy", "stderr": "time=\\"2023-07-26T19:36:55-04:00\\" level=warning msg=\\"The input device is not a TTY. The --tty and --interactive flags might not work properly\\"", "rc": 255, "cmd": ["podman", "run", "-v", "/:/rootfs", "--pid=host", "--privileged", "--rm", "--entrypoint=/usr/bin/machine-config-daemon", "-ti", "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0110276ce82958a105cdd59028043bcdb1e5c33a77e550a13a1dc51aee08b032", "start", "--node-name", "ci-op-ssnlf4qb-1dacf-bbmqt-rhel-1", "--once-from", "/tmp/ansible.mlldlsm5/worker_ignition_config.json", "--skip-reboot"], "start": "2023-07-26 19:36:55.852527", "end": "2023-07-26 19:36:56.827081", "delta": "0:00:00.974554", "failed": XXXX, "msg": "non-zero return code", "invocation": {"module_args": {"_raw_params": "podman run -v /:/rootfs --pid=host --privileged --rm --entrypoint=/usr/bin/machine-config-daemon -ti quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0110276ce82958a105cdd59028043bcdb1e5c33a77e550a13a1dc51aee08b032 start --node-name ci-op-ssnlf4qb-1dacf-bbmqt-rhel-1 --once-from /tmp/ansible.mlldlsm5/worker_ignition_config.json --skip-reboot", "_uses_shell": false, "warn": false, "stdin_add_newline": XXXX, "strip_empty_ends": XXXX, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}}\n', b'')3590<192.168.233.236> Failed to connect to the host via ssh: 3591fatal: [192.168.233.236]: FAILED! 
=> {3592 "changed": XXXX,3593 "cmd": [3594 "podman",3595 "run",3596 "-v",3597 "/:/rootfs",3598 "--pid=host",3599 "--privileged",3600 "--rm",3601 "--entrypoint=/usr/bin/machine-config-daemon",3602 "-ti",3603 "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0110276ce82958a105cdd59028043bcdb1e5c33a77e550a13a1dc51aee08b032",3604 "start",3605 "--node-name",3606 "ci-op-ssnlf4qb-1dacf-bbmqt-rhel-1",3607 "--once-from",3608 "/tmp/ansible.mlldlsm5/worker_ignition_config.json",3609 "--skip-reboot"3610 ],3611 "delta": "0:00:00.974554",3612 "end": "2023-07-26 19:36:56.827081",3613 "invocation": {3614 "module_args": {3615 "_raw_params": "podman run -v /:/rootfs --pid=host --privileged --rm --entrypoint=/usr/bin/machine-config-daemon -ti quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0110276ce82958a105cdd59028043bcdb1e5c33a77e550a13a1dc51aee08b032 start --node-name ci-op-ssnlf4qb-1dacf-bbmqt-rhel-1 --once-from /tmp/ansible.mlldlsm5/worker_ignition_config.json --skip-reboot",3616 "_uses_shell": false,3617 "argv": null,3618 "chdir": null,3619 "creates": null,3620 "executable": null,3621 "removes": null,3622 "stdin": null,3623 "stdin_add_newline": XXXX,3624 "strip_empty_ends": XXXX,3625 "warn": false3626 }3627 },3628 "msg": "non-zero return code",3629 "rc": 255,3630 "start": "2023-07-26 19:36:55.852527",3631 "stderr": "time=\"2023-07-26T19:36:55-04:00\" level=warning msg=\"The input device is not a TTY. The --tty and --interactive flags might not work properly\"",3632 "stderr_lines": [3633 "time=\"2023-07-26T19:36:55-04:00\" level=warning msg=\"The input device is not a TTY. The --tty and --interactive flags might not work properly\""3634 ],3635 "stdout": "I0726 23:36:56.436283 27240 start.go:61] Version: v4.13.0-202307242035.p0.g7b54f1d.assembly.stream-dirty (7b54f1dcce4ea9f69f300d0e1cf2316def45bf72)\r\nI0726 23:36:56.437075 27240 daemon.go:478] not chrooting for source=rhel-8 target=rhel-8\r\nF0726 23:36:56.437240 27240 start.go:75] failed to re-exec: writing /rootfs/run/bin/machine-config-daemon: open /rootfs/run/bin/machine-config-daemon: text file busy",3636 "stdout_lines": [3637 "I0726 23:36:56.436283 27240 start.go:61] Version: v4.13.0-202307242035.p0.g7b54f1d.assembly.stream-dirty (7b54f1dcce4ea9f69f300d0e1cf2316def45bf72)",3638 "I0726 23:36:56.437075 27240 daemon.go:478] not chrooting for source=rhel-8 target=rhel-8",3639 "F0726 23:36:56.437240 27240 start.go:75] failed to re-exec: writing /rootfs/run/bin/machine-config-daemon: open /rootfs/run/bin/machine-config-daemon: text file busy"3640 ]3641}3642
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-07-26-101700
How reproducible:
always
Steps to Reproduce:
Found in CI: 1. Install a v4.13.6 cluster with a RHEL 8 node 2. Upgrade OCP successfully 3. Upgrade the RHEL node
Actual results:
The RHEL node upgrade fails.
Expected results:
The RHEL node upgrade succeeds.
Additional info:
job link: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-4.13-upgrade-from-stable-4.12-vsphere-ipi-proxy-workers-rhel8-p2-f28/1684288836412116992
Description of problem:
A customer is raising security concerns about using port 80 for bootstrap
Version-Release number of selected component (if applicable):
4.13
We should include HostedClusterDegraded in hypershift_hostedclusters_failure_conditions metric so it's obvious when there's an issue across the fleet.
Description of problem:
This issue is triggered by the lack of the file "/etc/kubernetes/kubeconfig" in the node, but what i found interesting is the aesthetic error that follows: 2023-01-04T10:56:50.807982171Z I0104 10:56:50.807918 18013 start.go:112] Version: v4.11.0-202212070335.p0.g60746a8.assembly.stream-dirty (60746a843e7ef8855ae00f2ffcb655c53e0e8296) 2023-01-04T10:56:50.810326376Z I0104 10:56:50.810190 18013 start.go:125] Calling chroot("/rootfs") 2023-01-04T10:56:50.810326376Z I0104 10:56:50.810274 18013 update.go:1972] Running: systemctl start rpm-ostreed 2023-01-04T10:56:50.855151883Z I0104 10:56:50.854666 18013 rpm-ostree.go:353] Running captured: rpm-ostree status --json 2023-01-04T10:56:50.899635929Z I0104 10:56:50.899574 18013 rpm-ostree.go:353] Running captured: rpm-ostree status --json 2023-01-04T10:56:50.941236704Z I0104 10:56:50.941179 18013 daemon.go:236] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:318187717bd19ef265000570d5580ea680dfbe99c3bece6dd180537a6f268f e1 (410.84.202210061459-0) 2023-01-04T10:56:50.973206073Z I0104 10:56:50.973131 18013 start.go:101] Copied self to /run/bin/machine-config-daemon on host 2023-01-04T10:56:50.973259966Z E0104 10:56:50.973196 18013 start.go:177] failed to load kubelet kubeconfig: open /etc/kubernetes/kubeconfig: no such file or directory 2023-01-04T10:56:50.975399571Z panic: runtime error: invalid memory address or nil pointer dereference 2023-01-04T10:56:50.975399571Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x173d84f] 2023-01-04T10:56:50.975399571Z 2023-01-04T10:56:50.975399571Z goroutine 1 [running]: 2023-01-04T10:56:50.975399571Z main.runStartCmd(2023-01-04T10:56:50.975436752Z 0x2c3da80?, {0x1bc0b3b?, 0x0?, 0x0?}) 2023-01-04T10:56:50.975436752Z /go/src/github.com/openshift/machine-config-operator/cmd/machine-config-daemon/start.go:179 +0x70f 2023-01-04T10:56:50.975436752Z github.com/spf13/cobra.(*Command).execute(0x2c3da80, {0x2c89310, 0x0, 0x0}) 2023-01-04T10:56:50.975436752Z /go/src/github.com/openshift/machine-config-operator/vendor/github.com/spf13/cobra/command.go:860 +0x663 2023-01-04T10:56:50.975448580Z github.com/spf13/cobra.(*Command).ExecuteC(0x2c3d580) 2023-01-04T10:56:50.975448580Z /go/src/github.com/openshift/machine-config-operator/vendor/github.com/spf13/cobra/command.go:974 +0x3b4 2023-01-04T10:56:50.975456464Z github.com/spf13/cobra.(*Command).Execute(...) 2023-01-04T10:56:50.975456464Z 2023-01-04T10:56:50.975464649Z /go/src/github.com/openshift/machine-config-operator/vendor/github.com/spf13/cobra/command.go:902 2023-01-04T10:56:50.975464649Z k8s.io/component-base/cli.Run(2023-01-04T10:56:50.975472575Z 0x2c3d580) 2023-01-04T10:56:50.975472575Z /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/component-base/cli/run.go:105 +0x385 2023-01-04T10:56:50.975485076Z main.main() 2023-01-04T10:56:50.975485076Z /go/src/github.com/openshift/machine-config-operator/cmd/machine-config-daemon/main.go:28 +0x25
Version-Release number of selected component (if applicable):
4.11.20
How reproducible:
Always
Steps to Reproduce:
1. Remove / change the name of the file "/etc/kubernetes/kubeconfig" 2. Delete machine-config-daemon pod 3.
Actual results:
2023-01-04T10:56:50.973259966Z E0104 10:56:50.973196 18013 start.go:177] failed to load kubelet kubeconfig: open /etc/kubernetes/kubeconfig: no such file or directory 2023-01-04T10:56:50.975399571Z panic: runtime error: invalid memory address or nil pointer dereference
Expected results:
A fatal error ("failed to load kubelet kubeconfig: open /etc/kubernetes/kubeconfig: no such file or directory") but no runtime panic.
Additional info:
https://github.com/openshift/machine-config-operator/blob/92012a837e2ed0ed3c9e61c715579ac82ad0a464/cmd/machine-config-daemon/start.go#L179
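A hedged sketch of the expected behavior (this is not the MCO's actual code): fail fast with a clear fatal error when the kubeconfig cannot be loaded, instead of continuing and hitting the nil pointer dereference shown above.

package main

import (
	"log"

	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Path taken from the bug report.
	const kubeconfigPath = "/etc/kubernetes/kubeconfig"

	config, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		// Exit with a clear message instead of falling through to a nil
		// pointer dereference later on.
		log.Fatalf("failed to load kubelet kubeconfig: %v", err)
	}
	_ = config
}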
Description of problem:
The installer gets stuck at the beginning of installation if a BYO private hosted zone is configured in the install-config; from the CI logs, the installer takes no action for 2 hours. Errors: level=info msg=Credentials loaded from the "default" profile in file "/var/run/secrets/ci.openshift.io/cluster-profile/.awscred" 185 {"component":"entrypoint","file":"k8s.io/test-infra/prow/entrypoint/run.go:164","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 2h0m0s timeout","severity":"error","time":"2023-03-05T16:44:27Z"}
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-23-000343
How reproducible:
Always
Steps to Reproduce:
1. Create an install-config.yaml, and config byo private hosted zone 2. Create the cluster
Actual results:
The installer showed the following message and then got stuck; the cluster cannot be created. level=info msg=Credentials loaded from the "default" profile in file "/var/run/secrets/ci.openshift.io/cluster-profile/.awscred"
Expected results:
The cluster is created successfully.
Additional info:
Description of problem:
It's not currently possible to override the base image selected by the command: $ openshift-install agent create image. Also, defining the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE variable does not have any effect.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
By defining the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE when creating the image
Steps to Reproduce:
1. $ OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE=<valid url to rhcos image> 2. $ openshift-install agent create image 3.
Actual results:
The agent ISO is built using the embedded rhcos.json metadata instead of the RHCOS image specified in the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE variable.
Expected results:
Defining OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE should allow overriding the base image selected for creating the agent ISO
Additional info:
Description of the problem:
In staging (UI 2.18.2, BE 2.18.0), Day 2 add hosts: getting the following error when assigning the auto-assign role:
Failed to set role:
Requested role (auto-assign) is invalid for host c746e34f-f44a-4291-9064-402ab95b5831 from infraEnv 2b4ee2bf-ee45-4f57-b64e-715bc955f92e
How reproducible:
100%
Steps to reproduce:
1. install day1 cluster
2. In OCM, go to add host and discover new host
3. Assign the auto-assign role to this host
Actual results:
Expected results:
Description of the problem:
Please see Screening
Once installation of a cluster with a valid custom manifest has started, the manifest is no longer listable: it is not shown in the UI, not mentioned in the cluster logs, and not returned when calling api/assisted-install/v2/clusters/{}/manifests.
Before installation the manifest is listed; however, after installation starts the HTTP API returns an error.
How reproducible:
100%
Steps to reproduce:
1. created cluster with custom manifest
2. was able to see manifest in cluster details in installation page (before installation started)
3.also able to retrieve it via http get request
4. started installation
Actual results:
The custom manifest is no longer visible and is not mentioned in the logs.
The HTTP GET request returns the above-mentioned error (500).
It seems the custom manifest was not added.
Expected results:
manifest should still be visible and applied
Description of problem:
For https://issues.redhat.com//browse/OCPBUGS-4998, additional logging was added to the wait-for command when the state is in pending-user-action in order to show the particular host errors preventing installation. This additional host info should be added at the WARNING level.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Test this in the same way as bug https://issues.redhat.com//browse/OCPBUGS-4998, i.e. by swapping the boot order of the disks.
2. When the log message with additional info is logged, it is logged at DEBUG level, for example: DEBUG Host master-2 Expected the host to boot from disk, but it booted the installation image - please reboot and fix boot order to boot from disk Virtual_disk 6000c295b246decdbb4f4e691c185fcf (sda, /dev/disk/by-id/wwn-0x6000c295b246decdbb4f4e691c185fcf) INFO cluster has stopped installing... working to recover installation
3. This has now been changed to log at WARNING level.
4. In addition, multiple messages are logged: "level=info msg=cluster has stopped installing... working to recover installation". This will change to only log it one time.
Actual results:
Expected results:
1. The message is now logged at WARNING level 2. Only one message for "cluster has stopped installing... working to recover installation" will appear
Additional info:
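A minimal illustrative sketch of the intended logging behavior (not the installer's actual code): per-host detail at Warning level, and the recovery message emitted only once. The message text is taken from the example above.

package main

import (
	"github.com/sirupsen/logrus"
)

func main() {
	// Per-host detail at Warning level (previously logged at Debug).
	logrus.Warn("Host master-2: expected the host to boot from disk, but it booted the installation image")

	// Emit the recovery message only once, even if recovery is attempted repeatedly.
	recoveryLogged := false
	for attempt := 0; attempt < 3; attempt++ {
		if !recoveryLogged {
			logrus.Info("cluster has stopped installing... working to recover installation")
			recoveryLogged = true
		}
	}
}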
From a recent PR run of the recovery suite:
> event happened 49 times, something is wrong: ns/openshift-etcd-operator deployment/etcd-operator hmsg/593a6eb603 - pathological/true reason/UnstartedEtcdMember unstarted members: NAME-PENDING-10.0.167.169 From: 10:39:53Z To: 10:39:54Z result=reject
Since the remainder of the test has passed, the event might not be reconciled correctly when a member is coming back in CEO. We should fix this event.
This is a clone of issue OCPBUGS-19492. The following is the description of the original issue:
—
Description of problem:
Keepalived constantly fails on bootstrap causing installation failure
It seems the bootstrap node doesn't have a keepalived.conf file, and the keepalived monitor fails on
Version-Release number of selected component (if applicable):
4.13.12
How reproducible:
Regular installation through assisted installer
Steps to Reproduce:
1. 2. 3.
Actual results:
keepalived fails to start
Expected results:
Success
Additional info:
*
Extend multus resource collection so that we gather all resources on a per namespace basis with oc adm inspect.
This way, users can create a combined must-gather with all resources in one place.
We might have to revisit this once the reconciler and other changes land in a more recent version of Multus, but for the time being this is a good change to make, and one we can also backport to older versions.
Due to the removal of the in-tree AWS provider (https://github.com/kubernetes/kubernetes/pull/115838), we need to ensure that KCM sets the --external-cloud-volume-plugin flag accordingly, especially since CSI migration was GA'd in 4.12/1.25.
The original PR that fixed this (https://github.com/openshift/cluster-kube-controller-manager-operator/pull/721) got reverted by mistake. We need to bring it back to unblock the kube rebase.
Description of problem:
When there is no public DNS zone, the lookup fails during installation. During the installation of a private cluster, there is no need for a public zone.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
FATAL failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": failed to get GCP public zone: no matching public DNS Zone found
Expected results:
Installation complete
Additional info:
Description of problem:
cluster-dns-operator startup has an error message: [controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Start cluster-dns-operator 2. oc edit dnses.operator.openshift.io default -> Change operatorLogLevel to "Trace" or "Debug" (it doesn't matter which, we just want to trigger an update) 3. Observe backtrace in logs
Actual results:
[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed: goroutine 201 [running]: runtime/debug.Stack() /usr/lib/golang/src/runtime/debug/stack.go:24 +0x65 sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot() /dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:59 +0xbd sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithValues(0xc0000bae40, {0xc000768ae0, 0x6, 0x6}) /dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:168 +0x54 github.com/go-logr/logr.Logger.WithValues(...) /dns-operator/vendor/github.com/go-logr/logr/logr.go:323 sigs.k8s.io/controller-runtime/pkg/controller.NewUnmanaged.func1(0xc000991980) /dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/controller/controller.go:121 +0x1f6 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0003265a0, {0x1bddf28, 0xc00049d7c0}, {0x17b6120?, 0xc000991960?}) /dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:305 +0x18b sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0003265a0, {0x1bddf28, 0xc00049d7c0}) /dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265 +0x1d9 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2() /dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226 +0x85 created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 /dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222 +0x587
Expected results:
No error message
Additional info:
This is due to 1.27 rebase: https://github.com/openshift/cluster-dns-operator/pull/368
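For context, a minimal sketch of the kind of change controller-runtime expects (not necessarily the exact fix in the operator): set a logger early during startup so the "log.SetLogger(...) was never called" backtrace is not printed.

package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	// Register a logger before any controller-runtime code runs.
	ctrl.SetLogger(zap.New())

	// ... the rest of the operator setup would follow here ...
}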
Will require following
https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html
Background
Please review the following PR: https://github.com/openshift/cluster-api-provider-gcp/pull/193
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-dns-operator/pull/363
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
When running the local bridge with auth disabled, we see the error GET http://localhost:9000/api/request-token 404 (Not Found).
Version-Release number of selected component (if applicable):
latest master
How reproducible:
Always
Steps to Reproduce:
1. fetch latest openshift/console code and build 2. run local bridge './bin/bridge' 3.
Actual results:
Visiting localhost:9000, we see the error GET http://localhost:9000/api/request-token 404 (Not Found).
Expected results:
We should probably skip the /api/request-token request when auth is disabled, as suggested in https://github.com/openshift/console/pull/12553#discussion_r1103151813
Additional info:
Nodes in Ironic are created following pattern <namespace>~<host name>.
However, when creating nodes in Ironic, baremetal-operator first creates them without a namespace and only prepends the namespace prefix later. This opens up the possibility of node name clashes, especially in the ACM context.
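A minimal sketch of the idea (hypothetical helper, not baremetal-operator's actual code): construct the Ironic node name with the namespace prefix from the start, so hosts with the same name in different namespaces cannot clash.

package main

import "fmt"

// ironicNodeName builds the <namespace>~<host name> form up front instead of
// renaming the node after creation.
func ironicNodeName(namespace, hostName string) string {
	return fmt.Sprintf("%s~%s", namespace, hostName)
}

func main() {
	fmt.Println(ironicNodeName("openshift-machine-api", "ostest-extraworker-0"))
}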
This is a clone of issue OCPBUGS-19313. The following is the description of the original issue:
—
As a user, I don't want to see the option of "DeploymentConfigs" in any form I am filling in when I have not installed them in the cluster.
Description of problem:
The issue concerns the Add Pipeline checkbox. When there are two pipelines displayed in the dropdown menu, selecting one unchecks the Add Pipeline checkbox.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always, when two pipelines exist in the namespace
Steps to Reproduce:
1. Go to the Git Import Page. Create the application with Add Pipelines checked and a pipeline selected. 2. Go to the Serverless Function Page. Select Add Pipelines checkbox and try to select a pipeline from the drop-down.
Actual results:
The Add Pipelines checkbox automatically gets unchecked on selecting a Pipeline from the drop-down (in case of multiple pipelines in the dropdown)
Expected results:
The Add Pipelines checkbox must not get unchecked.
Additional info:
Video Link: https://drive.google.com/file/d/1OPRXbMw-EiihO3LAlDiOsh8qvhhiJK5H/view?usp=sharing
Description of problem:
In the agent TUI, setting IPv6 Configuration to Automatic and enabling "Require IPv6 addressing for this connection" generates a message saying that the feature is not supported. The user is still allowed to quit the TUI (formally correct, given that we select 'Quit' from the menu; perhaps the 'Quit' option should remain greyed out until a valid config is applied?) and the boot process proceeds using an unsupported/non-working network configuration.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-07-131556
How reproducible:
Steps to Reproduce:
1. Feed the agent ISO with an agent-config.yaml file that defines an ipv6 only, static network configuration 2. Boot from the generated agent ISO, wait for the agent TUI to appear, select 'Edit a connection', than change Ipv6 configuration from Manual to Automatic, contextually enable the 'Require IPV6 addressing for this connection' option. Accept the changes. 3. (Not sure if this step is necessary) Once back in the main agent TUI screen, select 'Activate a connection'. Select the currently active connection, de-activate and re-activate it. 4. Go back to main agent TUI screen, select Quit
Actual results:
The agent TUI displays the following message, then quits: "Failed to generate network state view: support for multiple default routes not yet implemented in agent-tui". Once the TUI quits, the boot process proceeds.
Expected results:
The TUI should block the possibility of enabling unsupported configurations. The agent TUI should inform the user about the unsupported configuration the moment it is applied (instead of informing the user the moment they select 'Quit') and stay open until a valid network configuration is applied. The TUI should put the boot process on hold until a valid network config is applied.
Additional info:
OCP Version: 4.13.0-0.nightly-2023-03-07-131556 agent-config.yaml snippet networkConfig: interfaces: - name: eno1 type: ethernet state: up mac-address: 34:73:5A:9E:59:10 ipv6: enabled: true address: - ip: 2620:52:0:1eb:3673:5aff:fe9e:5910 prefix-length: 64 dhcp: false
Description of problem:
I found an old shell error while checking logs. We don't quote the variable in the [ -z ] test.
if [ -z $DHCP6_IP6_ADDRESS ]
then
    >&2 echo "Not a DHCP6 address. Ignoring."
    exit 0
fi
Dec 05 12:05:02 master-0-2 nm-dispatcher[1365]: time="2022-12-05T12:05:02Z" level=debug msg="Ignoring filtered route {Ifindex: 10 Dst: fd2e:6f44:5dd8::59/128 Src: <nil> Gw: <nil> Flags: [] Table: 254}" Dec 05 12:05:02 master-0-2 nm-dispatcher[1365]: time="2022-12-05T12:05:02Z" level=debug msg="Ignoring filtered route {Ifindex: 10 Dst: fd2e:6f44:5dd8::5a/128 Src: <nil> Gw: <nil> Flags: [] Table: 254}" Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: req:19 'up' [br-ex], "/etc/NetworkManager/dispatcher.d/30-static-dhcpv6": run script Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + '[' -z fd2e:6f44:5dd8::5a fd2e:6f44:5dd8::59 ']' Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: /etc/NetworkManager/dispatcher.d/30-static-dhcpv6: line 4: [: fd2e:6f44:5dd8::5a: binary operator expected Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: ++ ip -j -6 a show br-ex Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: ++ jq -r '.[].addr_info[] | select(.scope=="global") | select(.deprecated!=true) | select(.local=="fd2e:6f44:5dd8::5a fd2e:6f44:5dd8::59") | .preferred_life_time' Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + LEASE_TIME= Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: ++ ip -j -6 a show br-ex Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: ++ jq -r '.[].addr_info[] | select(.scope=="global") | select(.deprecated!=true) | select(.local=="fd2e:6f44:5dd8::5a fd2e:6f44:5dd8::59") | .prefixlen' Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + PREFIX_LEN= Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + '[' 0 -lt 4294967295 ']' Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + echo 'Not an infinite DHCP6 lease. Ignoring.' Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: Not an infinite DHCP6 lease. Ignoring. Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + exit 0 Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: req:19 'up' [
Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-11-30-111136
How reproducible:
Twice
Steps to Reproduce:
1. Somehow DHCPv6 provides two IPv6 leases 2. NetworkManager sets $DHCP6_IP6_ADDRESS to all IPv6 addresses with spaces in between 3. Bash error
Actual results:
/etc/NetworkManager/dispatcher.d/30-static-dhcpv6: line 4: [: fd2e:6f44:5dd8::5a: binary operator expected
Expected results:
shell inputs are sanitized or properly quoted.
Additional info:
This is a clone of issue OCPBUGS-19868. The following is the description of the original issue:
—
The cluster-version operator should not crash while trying to evaluate a bogus condition.
4.10 and later are exposed to the bug. It's possible that the OCPBUGS-19512 series increases exposure.
Unclear.
1. Create a cluster.
2. Point it at https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge.json (you may need to adjust version strings and digests for your test-cluster's release).
3. Wait around 30 minutes.
4. Point it at https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid-promql.json (again, may need some customization).
$ grep -B1 -A15 'too fresh' previous.log I0927 12:07:55.594222 1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid-promql.json?arch=amd64&channel=stable-4.15&id=dc628f75-7778-457a-bb69-6a31a243c3a9&version=4.15.0-0.test-2023-09-27-091926-ci-ln-01zw7kk-latest I0927 12:07:55.726463 1 cache.go:118] {"type":"PromQL","promql":{"promql":"0 * group(cluster_version)"}} is the most stale cached cluster-condition match entry, but it is too fresh (last evaluated on 2023-09-27 11:37:25.876804482 +0000 UTC m=+175.082381015). However, we don't have a cached evaluation for {"type":"PromQL","promql":{"promql":"group(cluster_version_available_updates{channel=buggy})"}}, so attempt to evaluate that now. I0927 12:07:55.726602 1 cache.go:129] {"type":"PromQL","promql":{"promql":"0 * group(cluster_version)"}} is stealing this cluster-condition match call for {"type":"PromQL","promql":{"promql":"group(cluster_version_available_updates{channel=buggy})"}}, because its last evaluation completed 30m29.849594461s ago I0927 12:07:55.758573 1 cvo.go:703] Finished syncing available updates "openshift-cluster-version/version" (170.074319ms) E0927 12:07:55.758847 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 194 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1c4df00?, 0x32abc60}) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001489d40?}) /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x1c4df00, 0x32abc60}) /usr/lib/golang/src/runtime/panic.go:884 +0x213 github.com/openshift/cluster-version-operator/pkg/clusterconditions/promql.(*PromQL).Match(0xc0004860e0, {0x220ded8, 0xc00041e550}, 0x0) /go/src/github.com/openshift/cluster-version-operator/pkg/clusterconditions/promql/promql.go:134 +0x419 github.com/openshift/cluster-version-operator/pkg/clusterconditions/cache.(*Cache).Match(0xc0002d3ae0, {0x220ded8, 0xc00041e550}, 0xc0033948d0) /go/src/github.com/openshift/cluster-version-operator/pkg/clusterconditions/cache/cache.go:132 +0x982 github.com/openshift/cluster-version-operator/pkg/clusterconditions.(*conditionRegistry).Match(0xc000016760, {0x220ded8, 0xc00041e550}, {0xc0033948a0, 0x1, 0x0?})
No panics.
I'm still not entirely clear on how OCPBUGS-19512 would have increased exposure.
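A hedged sketch of the kind of guard that avoids this class of panic (the types below are placeholders, not the CVO's real ones): check for a nil PromQL result before dereferencing it and surface an error instead of crashing.

package main

import (
	"errors"
	"fmt"
)

// queryResult stands in for whatever a PromQL evaluation returns; the real
// type in the CVO differs.
type queryResult struct {
	value float64
}

// match reports whether the cluster condition holds, refusing to dereference
// a nil result.
func match(result *queryResult) (bool, error) {
	if result == nil {
		// The panic in the bug came from dereferencing a nil result; report
		// the bogus condition as an error instead.
		return false, errors.New("PromQL evaluation returned no result")
	}
	return result.value != 0, nil
}

func main() {
	ok, err := match(nil)
	fmt.Println(ok, err)
}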
There are Prometheus rules defined in the kube-state rules which trigger the `Kube*QuotaOvercommit` alerts.
These alerts are triggered when the sum of memory/CPU resource quotas for the default/kube-/openshift- namespaces exceeds the capacity of the cluster.
Since there are no quotas defined inside the default OCP projects, and the customer is not expected to create any quota for the default OCP projects, these alerts do not add any value; it would be good to have them removed.
This is a clone of issue OCPBUGS-18267. The following is the description of the original issue:
—
Description of problem:
'404: Not Found' will show on Knative-serving Details page
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-13-223353
How reproducible:
Always
Steps to Reproduce:
1. Installed 'Serveless' Operator, make sure the operator has been installed successfully, and the Knative Serving instance is created without any error 2. Navigate to Administration -> Cluster Settings -> Global Configuration 3. Go to Knative-serving Details page, check if 404 not found message is there 3.
Actual results:
Page will show 404 not found
Expected results:
the 404 not found page should not show
Additional info:
The dependency ticket is OCPBUGS-15008; more information can be found in the comments.
Description of problem:
When deploying KafkaMirrorMaker through the OLM form (in the AMQ Streams and Strimzi operators), we have to specify fields which already have defaults and are optional:
For all other components this works correctly.
Version-Release number of selected component (if applicable):
4.6
4.7
4.8
4.9
How reproducible:
Steps to Reproduce:
1. Deploy Strimzi 0.27.0 or AMQ Streams 1.8.4 via OLM
2. Try to deploy KafkaMirrorMaker via Form view without any changes
Actual results:
The CR cannot be created because several required fields (all in the Liveness probe, Readiness probe, and Tracing sections) are not filled in.
Expected results:
The CR is created, because all required fields are set (whitelist/include, Kafka bootstrap address, and replica count); nothing else is needed.
Additional info:
openshift-azure-routes.path has the following [Path] section:
[Path]
PathExistsGlob=/run/cloud-routes/*
PathChanged=/run/cloud-routes/
MakeDirectory=true
There was a change in systemd that re-checks the files watched with PathExistsGlob once the service finishes:
With this commit, systemd rechecks all paths specs whenever the triggered unit deactivates. If any PathExists=, PathExistsGlob= or DirectoryNotEmpty= predicate passes, the triggered unit is reactivated
This means that openshift-azure-routes will get triggered all the time as long as there are files in /run/cloud-routes.
Description of problem:
Backport https://github.com/kubernetes/kubernetes/pull/117371
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Web-terminal tests are constantly failing on CI. Disable them until they are fixed.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-console-master-e2e-gcp-console https://search.ci.openshift.org/?search=Web+Terminal+for+Admin+user&maxAge=336h&context=1&type=junit&name=pull-ci-openshift-console-master-e2e-gcp-console&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Expected results:
Additional info:
Description of problem:
kubevirt digest missing from RHCOS boot image
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Unable to create kubevirt cluster
Expected results:
Able to create kubevirt cluster
Additional info:
Description of problem:
aws-proxy jobs are failing with workers unable to come up. Example job run[1]. On the console, the workers report 500 errors trying to retrieve the worker ignition[2]. Is it possible https://github.com/openshift/machine-config-operator/pull/3662 broke things? See logs below. [1] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-proxy/1648560213655031808 [2] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-proxy/1648560213655031808/artifacts/e2e-aws-ovn-proxy/gather-aws-console/artifacts/i-071b5af3ddb12e55c
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Install with a proxy
Actual results:
No workers come up
Expected results:
Additional info:
Logs are reporting: 2023-04-19T12:29:38.244051716Z I0419 12:29:38.244006 1 container_runtime_config_controller.go:415] Error syncing image config openshift-config: could not get ControllerConfig controllerconfig.machineconfiguration.openshift .io "machine-config-controller" not found 2023-04-19T12:29:56.507515526Z I0419 12:29:56.507472 1 render_controller.go:377] Error syncing machineconfigpool worker: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found ./pods/machine-config-operator-6d7c6c8ccf-m7c57/machine-config-operator/machine-config-operator/logs/current.log:2023-04-19T12:38:15.240508503Z E0419 12:38:15.240437 1 operator.go:342] ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: [spec.proxy.apiVersion: Required value: must not be empty, spec.proxy.kind: Required value: must not be empty, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]
csi-snapshot-controller ServiceAccount does not include the HCP pull-secret in its imagePullSecrets. Thus, if a HostedCluster is created with a `pullSecret` that contains creds that the management cluster pull secret does not have, the image pull fails.
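A minimal sketch of the expected behavior (hypothetical helper, not HyperShift's actual reconciliation code): ensure the ServiceAccount lists the hosted control plane pull secret in its imagePullSecrets. The secret name below is a placeholder.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// ensurePullSecret adds secretName to the ServiceAccount's imagePullSecrets
// if it is not already present.
func ensurePullSecret(sa *corev1.ServiceAccount, secretName string) {
	for _, ref := range sa.ImagePullSecrets {
		if ref.Name == secretName {
			return // already present
		}
	}
	sa.ImagePullSecrets = append(sa.ImagePullSecrets, corev1.LocalObjectReference{Name: secretName})
}

func main() {
	sa := &corev1.ServiceAccount{}
	ensurePullSecret(sa, "pull-secret") // placeholder secret name
	fmt.Println(sa.ImagePullSecrets)
}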
Description of problem:
Configure diskEncryptionSet as below in install-config.yaml, and not set subscriptionID as it is optional parameter. install-config.yaml -------------------------------- compute: - architecture: amd64 hyperthreading: Enabled name: worker platform: azure: encryptionAtHost: true osDisk: diskEncryptionSet: resourceGroup: jima07a-rg name: jima07a-des replicas: 3 controlPlane: architecture: amd64 hyperthreading: Enabled name: master platform: azure: encryptionAtHost: true osDisk: diskEncryptionSet: resourceGroup: jima07a-rg name: jima07a-des replicas: 3 platform: azure: baseDomainResourceGroupName: os4-common cloudName: AzurePublicCloud outboundType: Loadbalancer region: centralus defaultMachinePlatform: osDisk: diskEncryptionSet: resourceGroup: jima07a-rg name: jima07a-des Then create manifests file and create cluster, installer failed with error: $ ./openshift-install create cluster --dir ipi --log-level debug ... INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json" FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": platform.azure.defaultMachinePlatform.osDisk.diskEncryptionSet: Invalid value: azure.DiskEncryptionSet{SubscriptionID:"", ResourceGroup:"jima07a-rg", Name:"jima07a-des"}: failed to get disk encryption set: compute.DiskEncryptionSetsClient#Get: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidSubscriptionId" Message="The provided subscription identifier 'resourceGroups' is malformed or invalid." Checked manifest file cluster-config.yaml, and found that subscriptionId is not filled out automatically under defaultMachinePlatform $ cat cluster-config.yaml apiVersion: v1 data: install-config: | additionalTrustBundlePolicy: Proxyonly apiVersion: v1 baseDomain: qe.azure.devcluster.openshift.com compute: - architecture: amd64 hyperthreading: Enabled name: worker platform: azure: encryptionAtHost: true osDisk: diskEncryptionSet: name: jima07a-des resourceGroup: jima07a-rg subscriptionId: 53b8f551-f0fc-4bea-8cba-6d1fefd54c8a diskSizeGB: 0 diskType: "" osImage: offer: "" publisher: "" sku: "" version: "" type: "" replicas: 3 controlPlane: architecture: amd64 hyperthreading: Enabled name: master platform: azure: encryptionAtHost: true osDisk: diskEncryptionSet: name: jima07a-des resourceGroup: jima07a-rg subscriptionId: 53b8f551-f0fc-4bea-8cba-6d1fefd54c8a diskSizeGB: 0 diskType: "" osImage: offer: "" publisher: "" sku: "" version: "" type: "" replicas: 3 metadata: creationTimestamp: null name: jimadesa networking: clusterNetwork: - cidr: 10.128.0.0/14 hostPrefix: 23 machineNetwork: - cidr: 10.0.0.0/16 networkType: OVNKubernetes serviceNetwork: - 172.30.0.0/16 platform: azure: baseDomainResourceGroupName: os4-common cloudName: AzurePublicCloud defaultMachinePlatform: osDisk: diskEncryptionSet: name: jima07a-des resourceGroup: jima07a-rg diskSizeGB: 0 diskType: "" osImage: offer: "" publisher: "" sku: "" version: "" type: "" outboundType: Loadbalancer region: centralus publish: External It works well when setting disk encryption set without subscriptionId under defalutMachinePlatform or controlPlane/compute.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-05-104719
How reproducible:
Always on 4.11, 4.12, 4.13
Steps to Reproduce:
1. Prepare install-config, configure diskEncrpytionSet under defaultMchinePlatform, controlPlane and compute without subscriptionId 2. Install cluster 3.
Actual results:
The installer fails with the error above; the subscriptionId is not filled in automatically under defaultMachinePlatform.
Expected results:
The cluster is installed successfully.
Additional info:
Description of problem:
The OCP installer's OpenStack Ironic iRMC driver doesn't work with FIPS mode enabled, as it requires the SNMP version to be set to v3. However, there is no way to set the SNMP version parameter in the RHOCP installer YAML file, so it falls back to the default v2, and it fails 100% of the time.
Version-Release number of selected component (if applicable):
Release Number: 14.0-ec.0 Drivers or hardware or architecture dependency: Deploy baremetal node with BMC using iRMC protocol(When RHOCP installer uses OpenStack Ironic iRMC driver) Hardware configuration: Model/Hypervisor: PRIMERGY RX2540 M6 CPU Info: Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz Memory Info: 125G Hardware Component Information: None Configuration Info: None Guest Configuration Info: None
How reproducible:
Always
Steps to Reproduce:
1. Enable FIPS mode of RHOCP nodes through setting "fips" to "true" at install-config.yaml. 2. In install-config.yaml, set platform.baremetal.hosts.bmc.address to start with 'irmc://' 3. Run OpenShift Container Platform installer.
Actual results:
The OpenStack Ironic iRMC driver used in the OpenShift Container Platform installer doesn't work and installation fails. The log message suggests setting the SNMP version parameter of the Ironic iRMC driver to v3 (a non-default value) when FIPS mode is enabled.
Expected results:
When FIPS mode is enabled on RHOCP, OpenStack Ironic iRMC driver used in RHOCP installer checks whether iRMC driver is configured to use SNMP (current OCP installer configures iRMC driver not to use SNMP) and if iRMC driver is configured not to use SNMP, driver doesn't require setting SNMP version parameter to v3 and installation proceeds. If iRMC driver is configured to use SNMP, driver requires setting SNMP version parameter to v3.
Additional info:
When FIPS mode is enabled, installation of RHOCP into Fujitsu server fails because OpenStack Ironic iRMC driver, which is used in RHOCP installer, requires iRMC driver's SNMP version parameter to be set to v3 even though iRMC driver isn't configured to use SNMP and there is no way to set it to v3. Installing RHOCP with IPI to baremetal node uses install-config.yaml. User sets configuration related to RHOCP in install-config.yaml. This installation uses OpenStack Ironic internally and values in install-config.yaml affect behavior of Ironic. During installation, Ironic connects to BMC(Baseboard management controller) and does operation related to RHOCP installation (e.g. power management). Ironic uses iRMC driver to operate on Fujitsu server's BMC. And iRMC driver checks iRMC-driver-specific Ironic parameters stored at Ironic component. When FIPS is enabled (i.e. "fips" is set to "true" in install-config.yaml), iRMC driver checks whether SNMP version specified in Ironic parameter to be set to v3 even though iRMC driver isn't configured to use SNMP internally. Currently, default value of SNMP version parameter of Ironic, which is iRMC driver specific parameter, is v2c and not v3. And iRMC driver fails with error if SNMP version is set to other than v3 when FIPS enabled. However, there is no way to set SNMP version parameter in RHOCP and that parameter is set to v2c by default. So when FIPS is enabled, deployment of OpenShift to Fujitsu server always fails. Cause of problem is, when FIPS is enabled, iRMC driver always requires SNMP version parameter to be set to v3 even though iRMC driver is not configured to use SNMP (current RHOCP installer configures iRMC driver not to use SNMP). To solve this problem, iRMC driver should be modified to check whether iRMC driver is configured to use SNMP internally and, if iRMC driver is configured to use SNMP and FIPS is enabled, requires SNMP version parameter to be set to v3. Such modification patch is already submitted to OpenStack Ironic community[1]. Summary of actions taken to resolve issue: Use OpenStack Ironic iRMC driver which incorporates bug fix patch[1] submitted on OpenStack Ironic community. [1] https://review.opendev.org/c/openstack/ironic/+/881358
Description of problem:
Currently PowerVS uses a DefaultMachineCIDR of 192.168.0.0/24. This will create network conflicts if another cluster is created in the same zone.
Version-Release number of selected component (if applicable):
current master branch
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The fix is to use a random number for DefaultMachineCIDR: 192.168.%d.0/24. This should significantly reduce the chance of collisions.
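A minimal sketch of the proposed approach (the function name is hypothetical): pick a random third octet so each cluster gets a different /24 by default.

package main

import (
	"fmt"
	"math/rand"
)

// defaultMachineCIDR returns a random /24 inside 192.168.0.0/16 instead of
// always returning 192.168.0.0/24.
func defaultMachineCIDR() string {
	octet := rand.Intn(256) // 0-255
	return fmt.Sprintf("192.168.%d.0/24", octet)
}

func main() {
	fmt.Println(defaultMachineCIDR())
}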
This is a clone of issue OCPBUGS-13829. The following is the description of the original issue:
—
Description of problem:
The configured accessTokenInactivityTimeout under tokenConfig in HostedCluster doesn't have any effect: 1. The value is not updated in the oauth-openshift configmap. 2. HostedCluster allows the user to set an accessTokenInactivityTimeout value < 300s, whereas in a master cluster the value should be > 300s.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Install a fresh 4.13 hypershift cluster 2. Configure accessTokenInactivityTimeout as below: $ oc edit hc -n clusters ... spec: configuration: oauth: identityProviders: ... tokenConfig: accessTokenInactivityTimeout: 100s ... 3. Check the hcp: $ oc get hcp -oyaml ... tokenConfig: accessTokenInactivityTimeout: 1m40s ... 4. Login to guest cluster with testuser-1 and get the token $ oc login https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443 -u testuser-1 -p xxxxxxx $ TOKEN=`oc whoami -t` $ oc login --token="$TOKEN" WARNING: Using insecure TLS client config. Setting this option is not supported! Logged into "https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443" as "testuser-1" using the token provided. You don't have any projects. You can try to create a new project, by running oc new-project <projectname>
Actual results:
1. hostedcluster will allow user to set the value < 300s for accessTokenInactivityTimeout which is not possible on master cluster. 2. The value is not updated in oauth-openshift configmap: $ oc get cm oauth-openshift -oyaml -n clusters-hypershift-ci-25785 ... tokenConfig: accessTokenMaxAgeSeconds: 86400 authorizeTokenMaxAgeSeconds: 300 ... 3. Login doesn't fail even if the user is not active for more than the set accessTokenInactivityTimeout seconds.
Expected results:
Login fails if the user is not active within the accessTokenInactivityTimeout seconds.
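A hedged sketch of the missing validation (not HyperShift's actual code): reject timeout values below the 300-second minimum that standalone clusters enforce.

package main

import (
	"fmt"
	"time"
)

// minInactivityTimeout mirrors the 300s minimum described in the bug report.
const minInactivityTimeout = 300 * time.Second

func validateInactivityTimeout(d time.Duration) error {
	if d < minInactivityTimeout {
		return fmt.Errorf("accessTokenInactivityTimeout %s is below the minimum of %s", d, minInactivityTimeout)
	}
	return nil
}

func main() {
	fmt.Println(validateInactivityTimeout(100 * time.Second)) // rejected, as in the bug
	fmt.Println(validateInactivityTimeout(600 * time.Second)) // accepted
}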
Description of problem:
In the administrator console UI, an admin user goes to Workloads -> Pods, selects one project (for example, openshift-console), selects one pod, goes to the Pod details page, clicks the "Metrics" tab, and then clicks on the "Network in" or "Network out" graph. The displayed Prometheus expression has spaces before and after "pod_network_name_info", i.e. "( pod_network_name_info )"; "pod_network_name_info" is enough.
"Network in" expression
(sum(irate(container_network_receive_bytes_total{pod='console-5f4978747c-vmxqf', namespace='openshift-console'}[5m])) by (pod, namespace, interface)) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )
"Network out" expression
(sum(irate(container_network_transmit_bytes_total{pod='console-5f4978747c-vmxqf', namespace='openshift-console'}[5m])) by (pod, namespace, interface)) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-05-19-234822
How reproducible:
always
Steps to Reproduce:
1. see the description 2. 3.
Actual results:
there are spaces before and after pod_network_name_info
Expected results:
no additional spaces
Additional info:
The bug has no functional impact.
Description of problem:
Using an agent-config.yaml in DHCP network mode (i.e., without the 'hosts' property) throws this error when loading the config image: load-config-iso.sh[1656]: Expected file /etc/assisted/manifests/nmstateconfig.yaml is not in archive
Version-Release number of selected component (if applicable):
4.14 (master)
How reproducible:
100%
Steps to Reproduce:
1. Create an agent-config.yaml without 'hosts' property. 2. Generate a config-image. 3. Boot the machine and mount the ISO.
Actual results:
Installation can't continue due to an error on config-iso load: load-config-iso.sh[1656]: Expected file /etc/assisted/manifests/nmstateconfig.yaml is not in archive
Expected results:
The installation should continue as normal.
Additional info:
The issue is probably due to a fix introduced for static networking: https://issues.redhat.com/browse/OCPBUGS-15637 I.e. since '/etc/assisted/manifests/nmstateconfig.yaml' was added to GetConfigImageFiles, it's now mandatory on load-config.iso.sh (see 'copy_archive_contents' func). The failure was missed on dev-scripts tests probably due to this issue: https://github.com/openshift-metal3/dev-scripts/pull/1551
Description of problem:
https://github.com/kubernetes/kubernetes/issues/118916
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. compare memory usage from v1 and v2 and notice differences with the same workloads 2. 3.
Actual results:
they slightly differ because of accounting differences
Expected results:
they should be largely the same
Additional info:
Description of problem:
Since the operator watches plugins to enable dynamic plugins, it should list that resource under `status.relatedObjects` in its ClusterOperator.
Additional info:
Migrated from bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044588
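A minimal sketch of what the issue asks for (the resource entries are illustrative, not the operator's exact list): include the ConsolePlugin resource in the ClusterOperator's status.relatedObjects so it is collected by oc adm inspect and must-gather.

package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
)

func main() {
	related := []configv1.ObjectReference{
		{Group: "operator.openshift.io", Resource: "consoles", Name: "cluster"},
		// The entry below is the addition this issue asks for: the watched
		// ConsolePlugin resources.
		{Group: "console.openshift.io", Resource: "consoleplugins"},
	}
	fmt.Println(related)
}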
Description of problem:
In a freshly installed cluster, we can see hot-looping on the Service openshift-monitoring/cluster-monitoring-operator.
Looking at the CronJob hot-looping
# grep -A60 'Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff' cvo2.log | tail -n61 I0110 06:32:44.489277 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ Object: map[string]interface{}{ "apiVersion": string("batch/v1"), "kind": string("CronJob"), "metadata": map[string]interface{}{"annotations": map[string]interface{}{"include.release.openshift.io/ibm-cloud-managed": string("true"), "include.release.openshift.io/self-managed-high-availability": string("true")}, "creationTimestamp": string("2022-01-10T04:35:19Z"), "generation": int64(1), "managedFields": []interface{}{map[string]interface{}{"apiVersion": string("batch/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:metadata": map[string]interface{}{"f:annotations": map[string]interface{}{".": map[string]interface{}{}, "f:include.release.openshift.io/ibm-cloud-managed": map[string]interface{}{}, "f:include.release.openshift.io/self-managed-high-availability": map[string]interface{}{}}, "f:ownerReferences": map[string]interface{}{".": map[string]interface{}{}, `k:{"uid":"334d6c04-126d-4271-96ec-d303e93b7d1c"}`: map[string]interface{}{}}}, "f:spec": map[string]interface{}{"f:concurrencyPolicy": map[string]interface{}{}, "f:failedJobsHistoryLimit": map[string]interface{}{}, "f:jobTemplate": map[string]interface{}{"f:spec": map[string]interface{}{"f:template": map[string]interface{}{"f:spec": map[string]interface{}{"f:containers": map[string]interface{}{`k:{"name":"collect-profiles"}`: map[string]interface{}{".": map[string]interface{}{}, "f:args": map[string]interface{}{}, "f:command": map[string]interface{}{}, "f:image": map[string]interface{}{}, ...}}, "f:dnsPolicy": map[string]interface{}{}, "f:priorityClassName": map[string]interface{}{}, "f:restartPolicy": map[string]interface{}{}, ...}}}}, "f:schedule": map[string]interface{}{}, ...}}, "manager": string("cluster-version-operator"), ...}, map[string]interface{}{"apiVersion": string("batch/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:status": map[string]interface{}{"f:lastScheduleTime": map[string]interface{}{}, "f:lastSuccessfulTime": map[string]interface{}{}}}, "manager": string("kube-controller-manager"), ...}}, ...}, "spec": map[string]interface{}{ + "concurrencyPolicy": string("Allow"), + "failedJobsHistoryLimit": int64(1), "jobTemplate": map[string]interface{}{ + "metadata": map[string]interface{}{"creationTimestamp": nil}, "spec": map[string]interface{}{ "template": map[string]interface{}{ + "metadata": map[string]interface{}{"creationTimestamp": nil}, "spec": map[string]interface{}{ "containers": []interface{}{ map[string]interface{}{ ... 
// 4 identical entries "name": string("collect-profiles"), "resources": map[string]interface{}{"requests": map[string]interface{}{"cpu": string("10m"), "memory": string("80Mi")}}, + "terminationMessagePath": string("/dev/termination-log"), + "terminationMessagePolicy": string("File"), "volumeMounts": []interface{}{map[string]interface{}{"mountPath": string("/etc/config"), "name": string("config-volume")}, map[string]interface{}{"mountPath": string("/var/run/secrets/serving-cert"), "name": string("secret-volume")}}, }, }, + "dnsPolicy": string("ClusterFirst"), "priorityClassName": string("openshift-user-critical"), "restartPolicy": string("Never"), + "schedulerName": string("default-scheduler"), + "securityContext": map[string]interface{}{}, + "serviceAccount": string("collect-profiles"), "serviceAccountName": string("collect-profiles"), + "terminationGracePeriodSeconds": int64(30), "volumes": []interface{}{ map[string]interface{}{ "configMap": map[string]interface{}{ + "defaultMode": int64(420), "name": string("collect-profiles-config"), }, "name": string("config-volume"), }, map[string]interface{}{ "name": string("secret-volume"), "secret": map[string]interface{}{ + "defaultMode": int64(420), "secretName": string("pprof-cert"), }, }, }, }, }, }, }, "schedule": string("*/15 * * * *"), + "successfulJobsHistoryLimit": int64(3), + "suspend": bool(false), }, "status": map[string]interface{}{"lastScheduleTime": string("2022-01-10T06:30:00Z"), "lastSuccessfulTime": string("2022-01-10T06:30:11Z")}, }, } I0110 06:32:44.499764 1 sync_worker.go:771] Done syncing for cronjob "openshift-operator-lifecycle-manager/collect-profiles" (574 of 765) I0110 06:32:44.499814 1 sync_worker.go:759] Running sync for deployment "openshift-operator-lifecycle-manager/olm-operator" (575 of 765)
Extract the manifest:
# cat 0000_50_olm_07-collect-profiles.cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
  name: collect-profiles
  namespace: openshift-operator-lifecycle-manager
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: collect-profiles
          priorityClassName: openshift-user-critical
          containers:
          - name: collect-profiles
            image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2a8d116943a7c1eb32cd161a0de5cb173713724ff419a03abe0382a2d5d9c9a7
            imagePullPolicy: IfNotPresent
            command:
            - bin/collect-profiles
            args:
            - -n
            - openshift-operator-lifecycle-manager
            - --config-mount-path
            - /etc/config
            - --cert-mount-path
            - /var/run/secrets/serving-cert
            - olm-operator-heap-:https://olm-operator-metrics:8443/debug/pprof/heap
            - catalog-operator-heap-:https://catalog-operator-metrics:8443/debug/pprof/heap
            volumeMounts:
            - mountPath: /etc/config
              name: config-volume
            - mountPath: /var/run/secrets/serving-cert
              name: secret-volume
            resources:
              requests:
                cpu: 10m
                memory: 80Mi
          volumes:
          - name: config-volume
            configMap:
              name: collect-profiles-config
          - name: secret-volume
            secret:
              secretName: pprof-cert
          restartPolicy: Never
Looking at the in-cluster object:
# oc get cronjob.batch/collect-profiles -oyaml -n openshift-operator-lifecycle-manager
apiVersion: batch/v1
kind: CronJob
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
  creationTimestamp: "2022-01-10T04:35:19Z"
  generation: 1
  name: collect-profiles
  namespace: openshift-operator-lifecycle-manager
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 334d6c04-126d-4271-96ec-d303e93b7d1c
  resourceVersion: "450801"
  uid: d0b92cd3-3213-466c-921c-d4c4c77f7a6b
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - args:
            - -n
            - openshift-operator-lifecycle-manager
            - --config-mount-path
            - /etc/config
            - --cert-mount-path
            - /var/run/secrets/serving-cert
            - olm-operator-heap-:https://olm-operator-metrics:8443/debug/pprof/heap
            - catalog-operator-heap-:https://catalog-operator-metrics:8443/debug/pprof/heap
            command:
            - bin/collect-profiles
            image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2a8d116943a7c1eb32cd161a0de5cb173713724ff419a03abe0382a2d5d9c9a7
            imagePullPolicy: IfNotPresent
            name: collect-profiles
            resources:
              requests:
                cpu: 10m
                memory: 80Mi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/config
              name: config-volume
            - mountPath: /var/run/secrets/serving-cert
              name: secret-volume
          dnsPolicy: ClusterFirst
          priorityClassName: openshift-user-critical
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          serviceAccount: collect-profiles
          serviceAccountName: collect-profiles
          terminationGracePeriodSeconds: 30
          volumes:
          - configMap:
              defaultMode: 420
              name: collect-profiles-config
            name: config-volume
          - name: secret-volume
            secret:
              defaultMode: 420
              secretName: pprof-cert
  schedule: '*/15 * * * *'
  successfulJobsHistoryLimit: 3
  suspend: false
status:
  lastScheduleTime: "2022-01-11T03:00:00Z"
  lastSuccessfulTime: "2022-01-11T03:00:07Z"
Version-Release number of the following components:
4.10.0-0.nightly-2022-01-09-195852
How reproducible:
1/1
Steps to Reproduce:
1.Install a 4.10 cluster
2. Grep 'Updating .*due to diff' in the cvo log to check hot-loopings
3.
Actual results:
CVO hotloops on CronJob openshift-operator-lifecycle-manager/collect-profiles
Expected results:
CVO should not hotloop on it in a fresh installed cluster
Additional info:
attachment 1850058 CVO log file
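For reference, a minimal sketch of how this kind of hotloop can be spotted on a live cluster; the grep pattern matches the log line shown above, and a steadily growing count on an otherwise idle cluster is what hotlooping looks like in practice:

```bash
# Count how often the CVO reports a diff for the collect-profiles CronJob.
oc -n openshift-cluster-version logs deployment/cluster-version-operator \
  | grep -c 'Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff'
```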
Reproduced locally, the failure is:
level=error msg=Attempted to gather debug logs after installation failure: must provide bootstrap host address level=info msg=Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected level=info msg=Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected level=info msg=Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected level=info msg=Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected level=error msg=Cluster operator network Degraded is True with ApplyOperatorConfig: Error while updating operator configuration: could not apply (rbac.authorization.k8s.io/v1, Kind=RoleBindi ng) openshift-config-managed/openshift-network-public-role-binding: failed to apply / update (rbac.authorization.k8s.io/v1, Kind=RoleBinding) openshift-config-managed/openshift-network-publi c-role-binding: Patch "https://api-int.ostest.test.metalkube.org:6443/apis/rbac.authorization.k8s.io/v1/namespaces/openshift-config-managed/rolebindings/openshift-network-public-role-binding ?fieldManager=cluster-network-operator%2Foperconfig&force=true": dial tcp 192.168.111.5:6443: connect: connection refused
I haven't gone back to pin down all affected versions, but I wouldn't be surprised if we've had this exposure for a while. On a 4.12.0-ec.2 cluster, we have:
cluster:usage:resources:sum{resource="podnetworkconnectivitychecks.controlplane.operator.openshift.io"}
currently clocking in around 67983. I've gathered a dump with:
$ oc --as system:admin -n openshift-network-diagnostics get podnetworkconnectivitychecks.controlplane.operator.openshift.io | gzip >checks.gz
And many, many of these reference nodes which no longer exist (the cluster is aggressively autoscaled, with nodes coming and going all the time). We should fix garbage collection on this resource, to avoid consuming excessive amounts of memory in the Kube API server and etcd as they attempt to list the large resource set.
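A rough heuristic for gauging the leak (a sketch only; it simply compares object count against node count, without tying individual checks to specific nodes):

```bash
# Count PodNetworkConnectivityCheck objects vs. current nodes; a count that is
# orders of magnitude larger than the node count suggests checks for deleted
# nodes are never garbage collected.
oc --as system:admin -n openshift-network-diagnostics \
  get podnetworkconnectivitychecks.controlplane.operator.openshift.io --no-headers | wc -l
oc get nodes --no-headers | wc -l
```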
Description of problem:
Machine config pool selection fails when a single node has master+custom roles. The controller logs the error, but the node is not marked as degraded, so the end user does not see the error and no config can be applied on the node.
Version-Release number of selected component (if applicable):
4.12, 4.11.z
Steps to Reproduce:
1. Set up an SNO cluster
2. Create a custom MCP (a sketch is shown after these steps)
3. Add the custom MCP label on the node
4. Check the MCC pod log to see the error message about pool selection
5. Create an MC to apply config
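For illustration, a minimal sketch of steps 2 and 3; the pool name `infra` is an arbitrary example and the selector keys follow the usual MCO conventions:

```bash
# Step 2: create a custom MachineConfigPool that selects both the master and custom roles.
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values: [master, infra]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
EOF

# Step 3: add the custom role label to the single node.
oc label node <sno-node-name> node-role.kubernetes.io/infra=""
```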
Actual results:
The node state appears good, but the single node cannot be assigned to any MCP.
Expected results:
The node should be marked as degraded with an error message.
Additional info:
Description of problem:
Azure MAG install fails with the Terraform error 'Error ensuring Resource Providers are registered'.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-27-172239
How reproducible:
Always
Steps to Reproduce:
1. Create MAG Azure cluster with IPI
Actual results:
Fail to create the installer when ‘Creating infrastructure resources…’ In terraform.log: 2023-07-29T11:33:02.938Z [ERROR] provider.terraform-provider-azurerm: Response contains error diagnostic: @module=sdk.proto tf_proto_version=5.3 tf_provider_addr=provider tf_req_id=45c10824-360b-b211-1ba1-9c3a722014af @caller=/go/src/github.com/openshift/installer/terraform/providers/azurerm/vendor/github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/diag/diagnostics.go:55 diagnostic_detail= diagnostic_severity=ERROR diagnostic_summary="Error ensuring Resource Providers are registered.Terraform automatically attempts to register the Resource Providers it supports to ensure it's able to provision resources.If you don't have permission to register Resource Providers you may wish to use the "skip_provider_registration" flag in the Provider block to disable this functionality.Please note that if you opt out of Resource Provider Registration and Terraform tries to provision a resource from a Resource Provider which is unregistered, then the errors may appear misleading - for example:> API version 2019-XX-XX was not found for Microsoft.FooCould indicate either that the Resource Provider "Microsoft.Foo" requires registration, but this could also indicate that this Azure Region doesn't support this API version.More information on the "skip_provider_registration" flag can be found here: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs#skip_provider_registrationOriginal Error: determining which Required Resource Providers require registration: the required Resource Provider "Microsoft.CustomProviders" wasn't returned from the Azure API" tf_rpc=Configure timestamp=2023-07-29T11:33:02.937Z 2023-07-29T11:33:02.938Z [ERROR] vertex "provider[\"openshift/local/azurerm\"]" error: Error ensuring Resource Providers are registered.Terraform automatically attempts to register the Resource Providers it supports to ensure it's able to provision resources.If you don't have permission to register Resource Providers you may wish to use the "skip_provider_registration" flag in the Provider block to disable this functionality.Please note that if you opt out of Resource Provider Registration and Terraform tries to provision a resource from a Resource Provider which is unregistered, then the errors may appear misleading - for example:> API version 2019-XX-XX was not found for Microsoft.FooCould indicate either that the Resource Provider "Microsoft.Foo" requires registration, but this could also indicate that this Azure Region doesn't support this API version.More information on the "skip_provider_registration" flag can be found here: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs#skip_provider_registrationOriginal Error: determining which Required Resource Providers require registration: the required Resource Provider "Microsoft.CustomProviders" wasn't returned from the Azure API
Expected results:
Creating the cluster should succeed.
Additional info:
Suspect that issue with https://github.com/openshift/installer/pull/7205/, IPI install on Azure MAG with 4.14.0-0.nightly-2023-07-27-051258 is OK
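As a possible diagnostic (not part of the original report), the registration state of the resource provider named in the error can be checked with the Azure CLI after pointing it at the MAG cloud:

```bash
# Target Azure US Government, then inspect the provider the error complains about.
az cloud set --name AzureUSGovernment
az provider show --namespace Microsoft.CustomProviders --query registrationState -o tsv
# If it reports "NotRegistered" and the account has permission, it can be registered:
az provider register --namespace Microsoft.CustomProviders
```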
In many cases, the /dev/disk/by-path symlink is the only way to stably identify a disk without having prior knowledge of the hardware from some external source (e.g. a spreadsheet of disk serial numbers). It should be possible to specify this path in the root device hints.
Metal³ is planning to allow these paths in the `name` hint (see OCPBUGS-13080), and assisted's implementation of root device hints (which is used in ZTP and the agent-based installer) should be changed to match.
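A sketch of how such a hint could look in an agent-based installer agent-config.yaml once by-path values are accepted; the field name (`deviceName`) mirrors the existing root device hint schema and the example path is hypothetical:

```bash
# Hypothetical agent-config.yaml host fragment using a by-path identifier as the hint.
cat <<'EOF' > agent-config-host-fragment.yaml
hosts:
- hostname: master-0
  rootDeviceHints:
    deviceName: /dev/disk/by-path/pci-0000:00:1f.2-ata-1
EOF
```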
Description of problem:
console-operator may panic when IncludeNamesFilter receives an object from a shared informer event of type cache.DeletedFinalStateUnknown. Example job with panic: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1687876857824808960 Specific log that shows the full stack trace: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1687876857824808960/artifacts/e2e-aws-sdn-serial/gather-extra/artifacts/pods/openshift-console-operator_console-operator-748d7c6cdd-vwxmx_console-operator.log
Version-Release number of selected component (if applicable):
How reproducible:
Sporadically
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of the problem:
Assisted installer namespace `assisted-installer` is not compliant with the `ocp4-cis-configure-network-policies-namespaces` Compliance Operator scan.
How reproducible:
Every time
Steps to reproduce:
1. Install a cluster with Assisted Installer
2. Confirm the `assisted-installer` Namespace is present and not removed
3. Install the Red Hat Compliance Operator
4. Run a compliance scan using the `ocp4-cis` profile
Actual results:
Cluster fails the scan with the following warning
```
Ensure that application Namespaces have Network Policies defined high
fail
```
Expected results:
Cluster does not fail the scan
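The failing rule essentially checks that application namespaces define at least one NetworkPolicy. A minimal sketch of a policy that would satisfy the check for this namespace (whether a default-deny posture is actually appropriate for the assisted-installer pods needs separate review):

```bash
cat <<'EOF' | oc apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: assisted-installer
spec:
  podSelector: {}    # applies to all pods in the namespace
  policyTypes:
  - Ingress          # no ingress rules listed, so all ingress is denied
EOF
```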
Please review the following PR: https://github.com/openshift/ibm-powervs-block-csi-driver/pull/28
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Cluster Monitoring Operator (CMO) lacks golangci-lint checking and has several violations for linters. The ones we'd be specifically interested in are the staticcheck ones, as they are tied to deprecated libraries in Go.
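A minimal sketch of a .golangci.yaml that would enable the linter we care about here; the extra linters and the timeout are assumptions:

```bash
cat <<'EOF' > .golangci.yaml
run:
  timeout: 5m
linters:
  disable-all: true
  enable:
    - staticcheck   # flags deprecated libraries/APIs, the main concern here
    - govet
    - gofmt
EOF
golangci-lint run ./...
```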
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Links for both markdown documents in console-dynamic-plugin-sdk/docs are not working. Check https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Clicking on a link in any markdown doc is not taking user to the appropriate section.
Expected results:
Clicking on a link in any markdown doc should take user to the appropriate section.
Additional info:
Description of problem:
We have observed a situation where:
- A workload mounting multiple EBS volumes gets stuck in a Terminating state when it finishes.
- The node that the workload ran on eventually gets stuck draining, because it gets stuck on unmounting one of the volumes from that workload, despite no containers from the workload now running on the node.
What we observe via the node logs is that the volume seems to unmount successfully. Then it attempts to unmount a second time, unsuccessfully. This unmount attempt then repeats and holds up the node. Specific examples from the node's logs to illustrate this will be included in a private comment.
Version-Release number of selected component (if applicable):
4.11.5
How reproducible:
Has occurred on four separate nodes on one specific cluster, but the mechanism to reproduce it is not known.
Steps to Reproduce:
1. 2. 3.
Actual results:
A volume gets stuck unmounting, holding up removal of the node and completed deletion of the pod.
Expected results:
The volume should not get stuck unmounting.
Additional info:
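A diagnostic sketch (the node name is a placeholder): the lingering attachment and the repeating unmount attempts described above can typically be observed with:

```bash
# VolumeAttachment objects still referencing the draining node
oc get volumeattachments -o wide | grep <node-name>
# kubelet logs on the node show the second, failing unmount attempt repeating
oc adm node-logs <node-name> -u kubelet | grep -i 'unmount'
```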
CI is flaky because the TestAWSELBConnectionIdleTimeout test fails. Example failures:
I have seen these failures in 4.14 and 4.13 CI jobs.
Presently, search.ci reports the following stats for the past 14 days:
Found in 1.24% of runs (3.52% of failures) across 404 total runs and 34 jobs (35.15% failed)
This includes two jobs:
1. Post a PR and have bad luck.
2. Check https://search.ci.openshift.org/?search=FAIL%3A+TestAll%2Fparallel%2FTestAWSELBConnectionIdleTimeout&maxAge=336h&context=1&type=all&name=cluster-ingress-operator&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job.
The test fails because it times out waiting for DNS to resolve:
=== RUN TestAll/parallel/TestAWSELBConnectionIdleTimeout operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup 
idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 
172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2650: lookup 
idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host operator_test.go:2656: failed to observe expected condition: timed out waiting for the condition panic.go:522: deleted ingresscontroller test-idle-timeout
The above output comes from build-log.txt from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/917/pull-ci-openshift-cluster-ingress-operator-release-4.13-e2e-aws-operator/1658840125502656512.
CI passes, or it fails on a different test.
Description of problem:
'hostedcluster.spec.configuration.ingress.loadBalancer.platform.aws.type' is ignored
Version-Release number of selected component (if applicable):
How reproducible:
set field to 'NLB'
Steps to Reproduce:
1. Set hostedcluster.spec.configuration.ingress.loadBalancer.platform.aws.type to 'NLB' (a patch sketch is shown below) 2. 3.
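A sketch of the patch implied by step 1, reconstructed from the field path above; the namespace and HostedCluster name are placeholders:

```bash
oc -n clusters patch hostedcluster <name> --type=merge \
  -p '{"spec":{"configuration":{"ingress":{"loadBalancer":{"platform":{"aws":{"type":"NLB"}}}}}}}'
```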
Actual results:
a classic load balancer is created
Expected results:
Should create a Network load balancer
Additional info:
Since the change we made in https://github.com/openshift/assisted-test-infra/pull/1989, whenever deploying assisted installer services using "make run" or "make deploy_assisted_service" we are deploying with only a single image - the default one (e.g. OPENSHIFT_VERSION=4.13).
Description of problem:
The EgressIP was NOT migrated to a correct worker after deleting the machine it was assigned to in a GCP XPN cluster.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-29-235439
How reproducible:
Always
Steps to Reproduce:
1. Set up GCP XPN cluster. 2. Scale two new worker nodes % oc scale --replicas=2 machineset huirwang-0331a-m4mws-worker-c -n openshift-machine-api machineset.machine.openshift.io/huirwang-0331a-m4mws-worker-c scaled 3. Wait the two new workers node ready. % oc get machineset -n openshift-machine-api NAME DESIRED CURRENT READY AVAILABLE AGE huirwang-0331a-m4mws-worker-a 1 1 1 1 86m huirwang-0331a-m4mws-worker-b 1 1 1 1 86m huirwang-0331a-m4mws-worker-c 2 2 2 2 86m huirwang-0331a-m4mws-worker-f 0 0 86m % oc get nodes NAME STATUS ROLES AGE VERSION huirwang-0331a-m4mws-master-0.c.openshift-qe.internal Ready control-plane,master 82m v1.26.2+dc93b13 huirwang-0331a-m4mws-master-1.c.openshift-qe.internal Ready control-plane,master 82m v1.26.2+dc93b13 huirwang-0331a-m4mws-master-2.c.openshift-qe.internal Ready control-plane,master 82m v1.26.2+dc93b13 huirwang-0331a-m4mws-worker-a-hfqsn.c.openshift-qe.internal Ready worker 71m v1.26.2+dc93b13 huirwang-0331a-m4mws-worker-b-vbqf2.c.openshift-qe.internal Ready worker 71m v1.26.2+dc93b13 huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal Ready worker 8m22s v1.26.2+dc93b13 huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal Ready worker 8m22s v1.26.2+dc93b13 3. Label one new worker node as egress node % oc label node huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal k8s.ovn.org/egress-assignable="" node/huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal labeled 4. Create egressIP object oc get egressIP NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip-1 10.0.32.100 huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal 10.0.32.100 5. Label second new worker node as egress node % oc label node huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal k8s.ovn.org/egress-assignable="" node/huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal labeled 6. 
Delete the assigned egress node % oc delete machines.machine.openshift.io huirwang-0331a-m4mws-worker-c-rhbkr -n openshift-machine-api machine.machine.openshift.io "huirwang-0331a-m4mws-worker-c-rhbkr" deleted % oc get nodes NAME STATUS ROLES AGE VERSION huirwang-0331a-m4mws-master-0.c.openshift-qe.internal Ready control-plane,master 87m v1.26.2+dc93b13 huirwang-0331a-m4mws-master-1.c.openshift-qe.internal Ready control-plane,master 86m v1.26.2+dc93b13 huirwang-0331a-m4mws-master-2.c.openshift-qe.internal Ready control-plane,master 87m v1.26.2+dc93b13 huirwang-0331a-m4mws-worker-a-hfqsn.c.openshift-qe.internal Ready worker 76m v1.26.2+dc93b13 huirwang-0331a-m4mws-worker-b-vbqf2.c.openshift-qe.internal Ready worker 76m v1.26.2+dc93b13 huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal Ready worker 13m v1.26.2+dc93b13 29468 W0331 02:48:34.917391 1 egressip_healthcheck.go:162] Could not connect to huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal (10.129.4.2:9107): context deadline exceeded 29469 W0331 02:48:34.917417 1 default_network_controller.go:903] Node: huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal is not ready, deleting it from egre ss assignment 29470 I0331 02:48:34.917590 1 client.go:783] "msg"="transacting operations" "database"="OVN_Northbound" "operations"="[{Op:update Table:Logical_Switch_Port Row:map[o ptions:{GoMap:map[router-port:rtoe-GR_huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {6efd3c58-9458-44a2-a43b-e70e669efa72}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]" 29471 E0331 02:48:34.920766 1 egressip.go:993] Allocator error: EgressIP: egressip-1 assigned to node: huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal whi ch is not reachable, will attempt rebalancing 29472 E0331 02:48:34.920789 1 egressip.go:997] Allocator error: EgressIP: egressip-1 assigned to node: huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal whi ch is not ready, will attempt rebalancing 29473 I0331 02:48:34.920808 1 egressip.go:1212] Deleting pod egress IP status: {huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal 10.0.32.100} for EgressIP: egressip-1
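For step 4 above, a minimal sketch of the EgressIP object used; the IP matches the output shown, while the namespaceSelector label is an assumption about how the test namespace is selected:

```bash
cat <<'EOF' | oc apply -f -
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-1
spec:
  egressIPs:
  - 10.0.32.100
  namespaceSelector:
    matchLabels:
      env: qe    # assumption: any label that selects the test namespace
EOF
```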
Actual results:
The egressIP was not migrated to a correct worker: oc get egressIP NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip-1 10.0.32.100 huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal 10.0.32.100
Expected results:
The egressIP should be migrated to a correct worker away from the deleted node.
Additional info:
Description of problem:
In order to test proxy installations, the CI base image for OpenShift on OpenStack needs netcat.
Description of problem:
Installation failed when setting featureSet: LatencySensitive or featureSet: CustomNoUpgrade. When setting featureSet: CustomNoUpgrade in install-config and create cluster.See below error info: [core@bootstrap ~]$ journalctl -b -f -u release-image.service -u bootkube.service Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: github.com/spf13/cobra@v1.6.0/command.go:968 Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: k8s.io/component-base/cli.run(0xc00025c300) Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: k8s.io/component-base@v0.26.1/cli/run.go:146 +0x317 Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: k8s.io/component-base/cli.Run(0x2ce59e8?) Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: k8s.io/component-base@v0.26.1/cli/run.go:46 +0x1d Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: main.main() Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: github.com/openshift/cluster-kube-controller-manager-operator/cmd/cluster-kube-controller-manager-operator/main.go:24 +0x2c Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Main process exited, code=exited, status=2/INVALIDARGUMENT Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Failed with result 'exit-code'. Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Consumed 1.935s CPU time. Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 343. Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: Stopped Bootstrap a Kubernetes cluster. Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Consumed 1.935s CPU time. Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: Started Bootstrap a Kubernetes cluster. Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670489]: Rendering Kubernetes Controller Manager core manifests... Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: panic: interface conversion: interface {} is nil, not []interface {} Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: goroutine 1 [running]: Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/targetconfigcontroller.GetKubeControllerManagerArgs(0xc000746100?) 
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/targetconfigcontroller/targetconfigcontroller.go:696 +0x379 Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render.(*renderOpts).Run(0xc0008d22c0) Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render/render.go:269 +0x85c Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render.NewRenderCommand.func1.1(0x0?) Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render/render.go:48 +0x32 Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render.NewRenderCommand.func1(0xc000bee600?, {0x285dffa?, 0x8?, 0x8?}) Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render/render.go:58 +0xc8 Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/spf13/cobra.(*Command).execute(0xc000bee600, {0xc00071cb00, 0x8, 0x8}) Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/spf13/cobra@v1.6.0/command.go:920 +0x847 Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000bee000) Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/spf13/cobra@v1.6.0/command.go:1040 +0x3bd Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/spf13/cobra.(*Command).Execute(...) When setting featureSet: LatencySensitive in install-config and create cluster.See below error info: [core@bootstrap ~]$ journalctl -b -f -u release-image.service -u bootkube.service Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: "cluster-infrastructure-02-config.yml": failed to create infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: Failed to create "cluster-infrastructure-02-config.yml" infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: [#1105] failed to create some manifests: Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: "cluster-infrastructure-02-config.yml": failed to create infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: Failed to create "cluster-infrastructure-02-config.yml" infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource
Version-Release number of selected component (if applicable):
OCP version: 4.13.0-0.nightly-2023-04-21-084440
How reproducible:
always
Steps to Reproduce:
1. Create install-config.yaml like below (LatencySensitive):
apiVersion: v1
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  replicas: 3
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  replicas: 2
metadata:
  name: wwei-426h
platform:
  none: {}
pullSecret: xxxxx
featureSet: LatencySensitive
networking:
  clusterNetwork:
  - cidr: xxxxx
    hostPrefix: 23
  serviceNetwork:
  - xxxxx
  networkType: OpenShiftSDN
publish: External
baseDomain: xxxxxx
sshKey: xxxxxxx
2. Then continue to install the cluster: openshift-install create cluster --dir <install_folder> --log-level debug
3. Create install-config.yaml like below (CustomNoUpgrade); identical to the config above except for:
featureSet: CustomNoUpgrade
4. Then continue to install the cluster: openshift-install create cluster --dir <install_folder> --log-level debug
Actual results:
Installation failed.
Expected results:
Installation succeeded.
Additional info:
log-bundle can get from below link : https://drive.google.com/drive/folders/1kg1EeYR6ApWXbeRZTiM4DV205nwMfSQv?usp=sharing
Description of the problem:
Some validations are only related to agents that are bound to clusters. We had a case where an agent couldn't be bound due to failing validations, and the irrelevant validations added unnecessary noise. I attached the relevant agent CR to the ticket. You can see in the Conditions:
- lastTransitionTime: "2023-01-26T21:00:29Z"
  message: 'The agent''s validations are failing: Validation pending - no cluster,Host couldn''t synchronize with any NTP server,Missing inventory, or missing cluster'
  reason: ValidationsFailing
  status: "False"
  type: Validated
The only relevant validation is that there is no NTP server. "no cluster" and "Missing inventory, or missing cluster" are misleading.
How reproducible:
100%
Steps to reproduce:
1. Boot an unbound agent
2. Look at the CR
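A sketch for step 2, pulling only the Validated condition message from the Agent CR; the namespace and agent name are placeholders:

```bash
oc -n <agents-namespace> get agents.agent-install.openshift.io <agent-name> \
  -o jsonpath='{.status.conditions[?(@.type=="Validated")].message}{"\n"}'
```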
Actual results:
All validations are shown in the CR
Expected results:
Only relevant validations are shown in the CR
Description of problem:
Most recent nightly https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-04-18-152947 has a lot of OAuth test failures Example runs: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-ovn/1648348911074545664 https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-sdn-bm/1648348885556400128 Error looks like: fail [github.com/openshift/origin/test/extended/oauth/expiration.go:105]: Unexpected error: <*tls.CertificateVerificationError | 0xc0023b6330>: { UnverifiedCertificates: [ {... Looking at changes in the last day or so, nothing sticks out to me. Although I believed ART bumped everything to be built with go1.20 and this error is new to go1.20: "For a handshake failure due to a certificate verification failure, the TLS client and server now return an error of the new type CertificateVerificationError, which includes the presented certificates." - https://go.dev/doc/go1.20
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-04-18-152947
How reproducible:
Looks repeatable
Steps to Reproduce:
1. Build oauth, origin, and related containers with go1.20 (not clear which is causing the test failure) 2. 3.
Actual results:
Tests fail
Expected results:
Additional info:
Description of problem:
https://github.com/openshift/hypershift/pull/2437 added the ability to override image registries with CR ImageDigestMirrorSet; however, ImageDigestMirrorSet is only valid for 4.13+.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Install HO on Mgmt Cluster 4.12
Steps to Reproduce:
1. 2. 3.
Actual results:
failed to populate image registry overrides: no matches for kind "ImageDigestMirrorSet" in version "config.openshift.io/v1"
Expected results:
No errors and HyperShift doesn't try to use ImageDigestMirrorSet prior to 4.13.
Additional info:
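An illustrative guard (not from the original report): before relying on ImageDigestMirrorSet, check whether the management cluster actually serves that API; on 4.12 only the older ImageContentSourcePolicy API is present:

```bash
# Returns nothing on 4.12, where the config.openshift.io ImageDigestMirrorSet API does not exist yet.
oc api-resources --api-group=config.openshift.io | grep -i imagedigestmirrorset
# The pre-4.13 equivalent that does exist on 4.12:
oc api-resources --api-group=operator.openshift.io | grep -i imagecontentsourcepolicies
```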
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The ACLs are disabled for all newly created s3 buckets, this causes all OCP installs to fail: the bootstrap ignition can not be uploaded: level=info msg=Creating infrastructure resources... level=error level=error msg=Error: error creating S3 bucket ACL for yunjiang-acl413-4dnhx-bootstrap: AccessControlListNotSupported: The bucket does not allow ACLs level=error msg= status code: 400, request id: HTB2HSH6XDG0Q3ZA, host id: V6CrEgbc6eyfJkUbLXLxuK4/0IC5hWCVKEc1RVonSbGpKAP1RWB8gcl5dfyKjbrLctVlY5MG2E4= level=error level=error msg= with aws_s3_bucket_acl.ignition, level=error msg= on main.tf line 62, in resource "aws_s3_bucket_acl" "ignition": level=error msg= 62: resource "aws_s3_bucket_acl" ignition { level=error level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "bootstrap" stage: failed to create cluster: failed to apply Terraform: exit status 1 level=error level=error msg=Error: error creating S3 bucket ACL for yunjiang-acl413-4dnhx-bootstrap: AccessControlListNotSupported: The bucket does not allow ACLs level=error msg= status code: 400, request id: HTB2HSH6XDG0Q3ZA, host id: V6CrEgbc6eyfJkUbLXLxuK4/0IC5hWCVKEc1RVonSbGpKAP1RWB8gcl5dfyKjbrLctVlY5MG2E4= level=error level=error msg= with aws_s3_bucket_acl.ignition, level=error msg= on main.tf line 62, in resource "aws_s3_bucket_acl" "ignition": level=error msg= 62: resource "aws_s3_bucket_acl" ignition {
Version-Release number of selected component (if applicable):
4.11+
How reproducible:
Always
Steps to Reproduce:
1.Create a cluster via IPI
Actual results:
install fail
Expected results:
install succeed
Additional info:
Heads-Up: Amazon S3 Security Changes Are Coming in April of 2023 - https://aws.amazon.com/blogs/aws/heads-up-amazon-s3-security-changes-are-coming-in-april-of-2023/ https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-ownership-error-responses.html - After you apply the bucket owner enforced setting for Object Ownership, ACLs are disabled.
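For context, the linked change means new buckets default to the 'bucket owner enforced' Object Ownership setting, which disables ACLs. A quick way to confirm what a freshly created bucket reports (the bucket name is a placeholder):

```bash
aws s3api get-bucket-ownership-controls --bucket <bootstrap-bucket-name>
# An ObjectOwnership of "BucketOwnerEnforced" means the ACL request terraform
# issues for the ignition object is rejected with AccessControlListNotSupported.
```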
As IBM, I would like to replace the --use-oci-feature flag with --include-oci-local-catalogs
--use-oci-feature implies to users that it is about using the OCI format for images rather than docker-v2, which is hard to understand and generates questions, bugs, and new misunderstood requests. For clarity, and before this feature goes GA, this flag will be replaced by --include-local-oci-catalog in 4.14. The --use-oci-feature flag will be marked deprecated in 4.13 and completely removed in 4.14.
As an oc-mirror user, I want a well-documented and intuitive process
so that I can effectively and efficiently deliver image artifacts in both connected and disconnected installs with no impact on my current workflow
Glossary:
References:
Acceptance criteria:
Description of problem:
When using the agent-based installer to provision OCP on baremetal, some of the machines fail to use the static nmconnection files and get IP addresses via DHCP instead. This may cause the network validation to fail.
Version-Release number of selected component (if applicable):
4.13.3
How reproducible:
100%
Steps to Reproduce:
1. Generate the agent ISO
2. Mount it to the BMC and reboot from the live CD
3. Use 'openshift-install agent wait-for' to monitor the progress
Actual results:
network validation fails due to an overlay IP address
Expected results:
validation succeeds
Additional info:
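A diagnostic sketch for an affected host booted from the agent ISO, comparing what NetworkManager actually activated against the generated static profiles (standard NetworkManager paths and commands):

```bash
# Profiles that were baked into the ISO vs. what NetworkManager activated
ls /etc/NetworkManager/system-connections/
nmcli connection show
nmcli device status
# Addresses actually configured; a DHCP lease here instead of the static IP
# indicates the static profile was not applied
ip -4 addr show
```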
Description of problem:
The dev console shows a list of samples. The user can create a sample based on a git repository. But some of these samples don't include a git repository reference and cannot be created.
Version-Release number of selected component (if applicable):
Tested different frontend versions against a 4.11 cluster and all (oldest tested frontend was 4.8) show the sample without git repository.
But the result also depends on the installed samples operator and installed ImageStreams.
How reproducible:
Always
Steps to Reproduce:
Actual results:
The git repository is not filled and the create button is disabled.
Expected results:
Samples without git repositories should not be displayed in the list.
Additional info:
The Git repository is saved as "sampleRepo" in the ImageStream tag section.
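A sketch of how to list which sample ImageStream tags carry a sampleRepo annotation; the `openshift` namespace is where the samples operator installs ImageStreams, and empty values correspond to samples that cannot be created from the form:

```bash
# Print each ImageStream with the sampleRepo values of its tags.
oc get imagestreams -n openshift \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.tags[*].annotations.sampleRepo}{"\n"}{end}'
```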
Description of problem:
Arm HCP's are currently broken. The following error message was observed in the ignition-server pod: {"level":"error","ts":"2023-06-29T13:38:19Z","msg":"Reconciler error","controller":"secret","controllerGroup":"","controllerKind":"Secret","secret":{"name":"token-brcox-hypershift-arm-us-east-1a-dbe0ce2a","namespace":"clusters-brcox-hypershift-arm"},"namespace":"clusters-brcox-hypershift-arm","name":"token-brcox-hypershift-arm-us-east-1a-dbe0ce2a","reconcileID":"ff813140-d10a-464e-a1b0-c05859b64ef9","error":"error getting ignition payload: failed to execute cluster-config-operator: cluster-config-operator process failed: /bin/bash: line 21: /payloads/get-payload1590526115/bin/cluster-config-operator: cannot execute binary file: Exec format error\n: exit status 126","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal...
Version-Release number of selected component (if applicable):
How reproducible:
Every time
Steps to Reproduce:
1. Create an Arm Mgmt Cluster 2. Create an Arm HCP
Actual results:
Error message in ignition-server pod and failure to generate appropriate payload.
Expected results:
ignition-server picks the appropriate arch based on the mgmt cluster.
Additional info:
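A sketch of how to confirm the architecture mismatch by inspecting the release image the ignition-server extracts binaries from; the pullspec is a placeholder:

```bash
# List per-architecture variants of the release image; on an Arm management
# cluster the ignition-server must pick the arm64 variant, not amd64.
oc image info <release-image-pullspec> --filter-by-os=linux/arm64
oc image info <release-image-pullspec> --filter-by-os=linux/amd64
```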
Testgrid for single-node-workers-upgrade-conformance shows that tests are failing due to the 'KubeMemoryOvercommit' alert.
We should avoid failing on this alert for single node environments assuming it's ok to overcommit memory on single node Openshift clusters.
Ref: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1687375398906129
Description of problem:
Fails to collect the VM serial log with 'openshift-install gather bootstrap'.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-14-053612
How reproducible:
Always
Steps to Reproduce:
1. IPI install a private cluster. Once the bootstrap node boots up, before it is terminated,
2. ssh to the bastion, then try to get the bootstrap log: $ openshift-install gather bootstrap --key openshift-qe.pem --bootstrap 10.0.0.5 --master 10.0.0.7 --log-level debug
3.
Actual results:
Fail to get the vm serial logs, in the output: … DEBUG Gather remote logs DEBUG Collecting info from 10.0.0.6 DEBUG scp: ./installer-masters-gather.sh: Permission denied EBUG Warning: Permanently added '10.0.0.6' (ECDSA) to the list of known hosts.…DEBUG Waiting for logs ... DEBUG Log bundle written to /var/home/core/log-bundle-20230317033401.tar.gz WARNING Unable to stat /var/home/core/serial-log-bundle-20230317033401.tar.gz, skipping INFO Bootstrap gather logs captured here "/var/home/core/log-bundle-20230317033401.tar.gz"
Expected results:
The VM serial log is collected, and the output does not contain the above "WARNING Unable to stat…" message.
Additional info:
An IPI install run locally has the same issue: INFO Pulling VM console logs DEBUG attemping to download … INFO Failed to gather VM console logs: unable to download file: /root/temp/4.13.0-0.nightly-2023-03-14-053612/ipi/serial-log-bundle-20230317042338
We've had several forum cases and bugs already where a restart of the CEO fixed issues that could have been resolved automatically by a liveness probe.
We previously traced it down to stuck/deadlocked controllers, missing timeouts in grpc calls and other issues we haven't been able to find yet. Since the list of failures that can happen is pretty large, we should add a liveness probe to the CEO that will periodically health check:
This check should not indicate whether the etcd cluster itself is healthy, it's purely for the CEO itself.
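A minimal sketch of what such a probe could look like on the CEO container, assuming the operator exposes (or gains) a health endpoint; the path, port, and thresholds are all assumptions:

```bash
# Hypothetical livenessProbe fragment for the cluster-etcd-operator container spec.
cat <<'EOF' > ceo-livenessprobe-fragment.yaml
livenessProbe:
  httpGet:
    path: /healthz          # assumed endpoint that reflects only the operator's own health
    port: 8443
    scheme: HTTPS
  initialDelaySeconds: 45
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6       # restart only after sustained unhealthiness
EOF
```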
Description of problem:
When creating a deployment with an image stream, the Save button on the Edit Deployment page is not enabled until the image stream tag is changed. On clicking the Reload button, the Save button is automatically enabled.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Search Deployment under resources 2. create deployment with Image stream 3. edit deployment
Actual results:
On the Edit Deployment page, the Save button stays disabled when values are changed.
Expected results:
On the Edit Deployment page, the Save button should be enabled when any value is changed.
Video Link - https://drive.google.com/file/d/1luqcjQS5Azc0XRjpMNfKKqbXYSc17Rxc/view?usp=share_link
Description of problem:
A cluster recently upgraded to OCP 4.12.19 is experiencing serious slowness on the Project > Project access page. The loading time of that page grows significantly faster than the number of entries and is very noticeable even at a relatively low number of entries.
Version-Release number of selected component (if applicable):
4.12.19
How reproducible:
Easily
Steps to Reproduce:
1. Create a namespace, and add RoleBindings for multiple users, for instance with: $ oc -n test-namespace create rolebinding test-load --clusterrole=view --user=user01 --user=user02 --user=... (a loop for creating many of them is sketched below)
2. In Developer view of that namespace, navigate to "Project" -> "Project access". The page will take a long time to load compared to the time an "oc get rolebinding" would take.
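The loop referenced in step 1, for generating enough RoleBindings to make the slowdown obvious (names and count are illustrative):

```bash
oc create namespace test-namespace
for i in $(seq -w 1 200); do
  oc -n test-namespace create rolebinding "test-load-$i" --clusterrole=view --user="user$i"
done
```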
Actual results:
0 RB => instantaneous loading
40 RB => about 10 seconds until the page loaded
100 RB => one try took 50 seconds, another 110 seconds
200 RB => nothing for 8 minutes, after which my web browser (Firefox) proposed to stop the page since it slowed the browser down, and after 10 minutes I stopped the attempt without ever seeing the page load.
Expected results:
Page should load almost instantly with only a few hundred role bindings
Run the isVSphereDiskUUIDEnabled validation also on baremetal platform installations.
From the description of https://issues.redhat.com/browse/OCPBUGS-16955:
Storage team has observed that if disk.EnableUUID flag is not enabled on vSphere VMs in any platform, including baremetal, then no symlinks are generated in /dev/disk/by-id for attached disks.
Installing ODF via LSO (or similar) on such a platform results in a somewhat fragile installation: disks could be renamed on reboot, and since no permanent IDs exist for the disks, the PVs could become invalid.
We should update baremetal installs - https://docs.openshift.com/container-platform/4.13/installing/installing_bare_metal/installing-bare-metal.html to always enable disk.EnableUUID in both IPI and UPI installs.
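For reference, the flag can be inspected and set per VM with govc, mirroring what the vSphere platform docs already require; the VM inventory path is a placeholder, and the change typically takes effect at the next power-on:

```bash
# Show ExtraConfig and look for disk.enableUUID (absent or FALSE means no stable /dev/disk/by-id symlinks).
govc vm.info -e <vm-inventory-path> | grep -i disk.enableUUID
# Enable it.
govc vm.change -vm <vm-inventory-path> -e disk.enableUUID=TRUE
```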
Description of problem:
After enabling realtime and high power consumption under workload hints in the performance profile, the test is failing since it cannot find the stalld pid: msg: "failed to run command [pidof stalld]: output \"\"; error \"\"; command terminated with exit code 1",
Version-Release number of selected component (if applicable):
Openshift 4.14, 4.13
How reproducible:
Often (Flaky test)
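For context, the hints mentioned above correspond to this PerformanceProfile fragment; field names follow the performance.openshift.io/v2 schema, and enabling realTime is what brings stalld into the picture:

```bash
cat <<'EOF' > performanceprofile-workloadhints-fragment.yaml
spec:
  workloadHints:
    realTime: true              # real-time tuning; stalld is expected to run with this enabled
    highPowerConsumption: true  # trade power savings for lowest latency
EOF
```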
Description of problem:
The environment variable OPENSHIFT_IMG_OVERRIDES is not retaining the order of mirrors listed under a source compared to the original mirror/source listing in the ICSP/IDMSs.
Version-Release number of selected component (if applicable):
How reproducible:
Every time
Steps to Reproduce:
1. Setup a mgmt cluster with either an ICSP like:
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: image-policy-39
spec:
  repositoryDigestMirrors:
  - mirrors:
    - quay.io/openshift-release-dev/ocp-release
    - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
    source: quay.io/openshift-release-dev/ocp-release
2. Create a Hosted Cluster
Actual results:
Nodes cannot join the cluster because ignition cannot be generated
Expected results:
Nodes can join the cluster
Additional info:
Issue is most likely coming from here - https://github.com/openshift/hypershift/blob/dce6f51355317173be6bc525edfe059572c07690/support/util/util.go#L224
Description of problem:
Tested on GCP: there are 4 failureDomains (a, b, c, f) in the CPMS. After removing a, a new master is created in f. If a is then re-added to the CPMS, the instance is moved back from f to a.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
Before update cpms. failureDomains: gcp: - zone: us-central1-a - zone: us-central1-b - zone: us-central1-c - zone: us-central1-f $ oc get machine NAME PHASE TYPE REGION ZONE AGE zhsungcp22-4glmq-master-2 Running n2-standard-4 us-central1 us-central1-c 3h4m zhsungcp22-4glmq-master-hzsf2-0 Running n2-standard-4 us-central1 us-central1-b 90m zhsungcp22-4glmq-master-plch8-1 Running n2-standard-4 us-central1 us-central1-a 11m zhsungcp22-4glmq-worker-a-cxf5w Running n2-standard-4 us-central1 us-central1-a 3h zhsungcp22-4glmq-worker-b-d5vzm Running n2-standard-4 us-central1 us-central1-b 3h zhsungcp22-4glmq-worker-c-4d897 Running n2-standard-4 us-central1 us-central1-c 3h 1. Delete failureDomain "zone: us-central1-a" in cpms, new machine Running in zone f. failureDomains: gcp: - zone: us-central1-b - zone: us-central1-c - zone: us-central1-f $ oc get machine NAME PHASE TYPE REGION ZONE AGE zhsungcp22-4glmq-master-2 Running n2-standard-4 us-central1 us-central1-c 3h19m zhsungcp22-4glmq-master-b7pdl-1 Running n2-standard-4 us-central1 us-central1-f 13m zhsungcp22-4glmq-master-hzsf2-0 Running n2-standard-4 us-central1 us-central1-b 106m zhsungcp22-4glmq-worker-a-cxf5w Running n2-standard-4 us-central1 us-central1-a 3h16m zhsungcp22-4glmq-worker-b-d5vzm Running n2-standard-4 us-central1 us-central1-b 3h16m zhsungcp22-4glmq-worker-c-4d897 Running n2-standard-4 us-central1 us-central1-c 3h16m 2. Add failureDomain "zone: us-central1-a" again, new machine running in zone a, the machine in zone f will be deleted. failureDomains: gcp: - zone: us-central1-a - zone: us-central1-f - zone: us-central1-c - zone: us-central1-b $ oc get machine NAME PHASE TYPE REGION ZONE AGE zhsungcp22-4glmq-master-2 Running n2-standard-4 us-central1 us-central1-c 3h35m zhsungcp22-4glmq-master-5kltp-1 Running n2-standard-4 us-central1 us-central1-a 12m zhsungcp22-4glmq-master-hzsf2-0 Running n2-standard-4 us-central1 us-central1-b 121m zhsungcp22-4glmq-worker-a-cxf5w Running n2-standard-4 us-central1 us-central1-a 3h32m zhsungcp22-4glmq-worker-b-d5vzm Running n2-standard-4 us-central1 us-central1-b 3h32m zhsungcp22-4glmq-worker-c-4d897 Running n2-standard-4 us-central1 us-central1-c 3h32m
Actual results:
Instance is moved back from f to a
Expected results:
Instance shouldn't be moved back from f to a
Additional info:
https://issues.redhat.com//browse/OCPBUGS-7366
Description of the problem:
In staging (UI 2.20.6, BE 2.20.1), ODF cannot be enabled; the operation fails with "Failed to update the cluster", although according to the support-level API it should be supported.
How reproducible:
100%
Steps to reproduce:
1. Create a new OCP 4.13 cluster with P/Z CPU architecture
2. Try to enable ODF
Actual results:
Expected results:
Description of problem:
API fields that are defaulted by a controller should document what their default is for each release version. Currently the field documents that it is "if empty, subject to platform chosen default", but it does not state what that default is. To fix this, please add, after the platform-chosen-default prose: // The current default is XYZ. This will allow users to track the platform defaults over time from the API documentation. I would like to see this fixed before 4.13 and 4.14 are released; it should be quick to fix once we understand what those defaults are.
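For illustration, a minimal sketch of the requested doc convention on a hypothetical API field (the field name and default value are placeholders, not a real OpenShift API):

package v1

// ExampleSpec is a hypothetical API type used only to illustrate the convention:
// the field keeps the "platform chosen default" prose and also states the concrete
// current default, so the default can be tracked per release from the API docs.
type ExampleSpec struct {
    // profile selects the TLS profile used by the component.
    // If empty, it is subject to the platform chosen default.
    // The current default is "Intermediate".
    // +optional
    Profile string `json:"profile,omitempty"`
}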
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
When the ODF StorageSystem CR is created through the wizard, the LocalVolumeDiscovery does not discover/show devices of mpath type.
Version-Release number of selected component (if applicable):
OCP 4.11.31
How reproducible:
All the time
Steps to Reproduce:
1. Get OCP 4.11 running with the LSO and ODF operators
2. Configure and present mpath devices to nodes used for ODF
3. Use the ODF wizard to create a StorageSystem object
4. Inspect the LocalVolumeDiscovery results
Actual results:
There are no devices of mpath type shown by the ODF wizard / LocalVolumeDiscovery CR
Expected results:
LocalVolumeDiscovery should discover mpath device type
Additional info:
LocalVolumeSet already works with mpath if you manually define them in .spec or LocalVolume pointing to mpath devicePaths
Description of problem:
MCO depends on the image registry; if the image registry is not installed, installation fails because MCO goes Degraded.
Version-Release number of selected component (if applicable):
payload image built from https://github.com/openshift/installer/pull/7421
How reproducible:
always
Steps to Reproduce:
1.Set "baselineCapabilitySet: None" when install a cluster, all the optional operators will not be installed. 2. 3.
Actual results:
09-01 15:50:34.770 level=error msg=Cluster operator machine-config Degraded is True with RenderConfigFailed: Failed to resync 4.14.0-0.ci.test-2023-08-31-033001-ci-ln-7xhl7yt-latest because: clusteroperators.config.openshift.io "image-registry" not found
09-01 15:50:34.770 level=error msg=Cluster operator machine-config Available is False with RenderConfigFailed: Cluster not available for [{operator 4.14.0-0.ci.test-2023-08-31-033001-ci-ln-7xhl7yt-latest}]: clusteroperators.config.openshift.io "image-registry" not found
09-01 15:50:34.770 level=info msg=Cluster operator network ManagementStateDegraded is False with :
09-01 15:50:34.770 level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
Expected results:
MCO should not be degraded if image registry is not installed
Additional info:
must-gather log https://drive.google.com/file/d/1E3FbPcVwZxBi33tHq7pyaHc8EM3eiTUa/view?usp=drive_link
Description of problem:
I am trying to build the operator image locally and it fails because the registry `registry.ci.openshift.org/ocp/` requires authorization.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. git clone git@github.com:openshift/cluster-ingress-operator.git
2. export REPO=<path to a repository to upload the image>
3. run `make release-local`
Actual results:
[skip several lines]
Step 1/10 : FROM registry.ci.openshift.org/ocp/builder:rhel-8-golang-1.19-openshift-4.12 AS builder
unauthorized: authentication required
Expected results:
The image is pulled and the build succeeds.
Additional info:
There are two images that are not available:
- registry.ci.openshift.org/ocp/builder:rhel-8-golang-1.19-openshift-4.12
- registry.ci.openshift.org/ocp/4.12:base
I was able to fix this by changing the images to:
- registry.ci.openshift.org/openshift/release:golang-1.19
- registry.ci.openshift.org/origin/4.12:base
see https://github.com/dudinea/cluster-ingress-operator/tree/fix-build-images-not-public
I am not sure whether what I did is OK, but I suppose that this project, being part of OKD, should be easily buildable by the public, or at least the issue should be documented somewhere. I wanted to post this to the OKD project, but I am unable to select it in Jira.
Description of problem:
Machine-config operator is not compliant with the CIS benchmark rule "Ensure Usage of Unique Service Accounts" [1], which is part of the "ocp4-cis" profile used by the Compliance Operator [2]. The machine-config operator uses the default service account; the default SA comes into play when no other service account is specified. OpenShift core operators should be compliant with the CIS benchmark, i.e. the operators should run with their own ServiceAccount rather than the "default" one.
[1] https://static.open-scap.org/ssg-guides/ssg-ocp4-guide-cis.html#xccdf_org.ssgproject.content_group_accounts
[2] https://docs.openshift.com/container-platform/4.11/security/compliance_operator/compliance-operator-supported-profiles.html
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Core operators are using default service account
Expected results:
Core operators should run with their own service account
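For illustration, a minimal sketch (assumed names, not the operator's actual manifest) of a pod spec that sets its own ServiceAccount explicitly rather than silently inheriting "default", using the k8s.io/api types:

package main

import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
)

func main() {
    // Hypothetical pod spec fragment: naming the ServiceAccount explicitly so the
    // pod never falls back to the namespace's "default" service account.
    spec := corev1.PodSpec{
        ServiceAccountName: "machine-config-operator", // assumption: a dedicated SA created for the operator
        Containers: []corev1.Container{{
            Name:  "machine-config-operator",
            Image: "example.invalid/machine-config-operator:latest", // placeholder image
        }},
    }
    fmt.Println(spec.ServiceAccountName)
}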
Additional info:
Kubernetes 1.27 removes the long-deprecated --container-runtime kubelet flag, see https://github.com/kubernetes/kubernetes/pull/114017
To ensure the upgrade path from 4.13 to 4.14 isn't affected, we need to backport the changes to both 4.14 and 4.13.
Description of problem:
The 'Create' button on the image pull secret creation form cannot be re-enabled once it has been disabled.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-02-17-090603
How reproducible:
Always
Steps to Reproduce:
1. User logs in to the console.
2. Go to Secrets -> Create Image pull secret. On the page, set:
   - Secret name: test-secret
   - Authentication type: Upload configuration file, and upload a file that is not valid JSON. The console shows the warning 'Configuration file should be in JSON format.' and the 'Create' button is disabled.
3. Change Authentication type to 'Image registry credentials' and fill in every required field: Registry server address, Username and Password. The 'Create' button is still disabled.
Actual results:
3. 'Create' button is still disabled, user has to cancel and fill the form again
Expected results:
3. The 'Create' button should be re-enabled, since the form is now being filled in a different way with all required fields correctly configured.
Additional info:
Description of problem:
Hide the Duplicate Pipelines Card in the DevConsole Add Page
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Visit +Add Page of Dev Perspective
Actual results:
Duplicate Entry
Expected results:
No duplicates
Additional info:
Description of problem:
The control-plane-operator pod gets stuck deleting an awsendpointservice if its hostedzone is already gone:
Logs:
{"level":"error","ts":"2023-07-13T03:06:58Z","msg":"Reconciler error","controller":"awsendpointservice","controllerGroup":"hypershift.openshift.io","controllerKind":"AWSEndpointService","aWSEndpointService":{"name":"private-router","namespace":"ocm-staging-24u87gg3qromrf8mg2r2531m41m0c1ji-diegohcp-west2"},"namespace":"ocm-staging-24u87gg3qromrf8mg2r2531m41m0c1ji-diegohcp-west2","name":"private-router","reconcileID":"59eea7b7-1649-4101-8686-78113f27567d","error":"failed to delete resource: NoSuchHostedZone: No hosted zone found with ID: Z05483711XJV23K8E97HK\n\tstatus code: 404, request id: f8686dd6-a906-4a5e-ba4a-3dd52ad50ec3","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}
Version-Release number of selected component (if applicable):
4.12.24
How reproducible:
Have not tried to reproduce yet, but should be fairly reproducible
Steps to Reproduce:
1. Install a PublicAndPrivate or Private HCP
2. Delete the Route53 Hosted Zone defined in its awsendpointservice's .status.dnsZoneID field
3. Start an uninstall
4. Observe the control-plane-operator looping on the above logs and the uninstall hanging
Actual results:
Uninstall hangs due to CPO being unable to delete the awsendpointservice
Expected results:
The awsendpointservice cleans up; if the hosted zone is already gone, the CPO shouldn't care that it can't list hosted zones.
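A minimal sketch of the tolerant deletion behavior described above, assuming the aws-sdk-go v1 error types; the helper is hypothetical, not the actual control-plane-operator code:

package main

import (
    "fmt"

    "github.com/aws/aws-sdk-go/aws/awserr"
    "github.com/aws/aws-sdk-go/service/route53"
)

// ignoreMissingHostedZone treats "hosted zone already gone" as success so that
// awsendpointservice finalization can proceed instead of looping forever.
func ignoreMissingHostedZone(err error) error {
    if err == nil {
        return nil
    }
    if aerr, ok := err.(awserr.Error); ok && aerr.Code() == route53.ErrCodeNoSuchHostedZone {
        // Nothing left to clean up in this zone; do not block deletion on it.
        return nil
    }
    return err
}

func main() {
    wrapped := awserr.New(route53.ErrCodeNoSuchHostedZone, "No hosted zone found with ID: Z05483711XJV23K8E97HK", nil)
    fmt.Println(ignoreMissingHostedZone(wrapped)) // prints <nil>
}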
Additional info:
Description of problem:
CredentialsRequest for Azure AD Workload Identity contains unnecessary network permissions:
- Microsoft.Network/applicationSecurityGroups/delete
- Microsoft.Network/applicationSecurityGroups/write
- Microsoft.Network/loadBalancers/delete
- Microsoft.Network/networkSecurityGroups/delete
- Microsoft.Network/routeTables/delete
- Microsoft.Network/routeTables/write
- Microsoft.Network/virtualNetworks/subnets/delete
- Microsoft.Network/virtualNetworks/subnets/write
- Microsoft.Network/virtualNetworks/write
- Microsoft.Resources/subscriptions/resourceGroups/delete
- Microsoft.Resources/subscriptions/resourceGroups/write
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
N/A
Steps to Reproduce:
1. Remove above permissions from the Azure Credentials request and validate that MAO continues to function in Azure AD Workload Identity cluster.
Actual results:
Unnecessary network write permissions enumerated in CredentialsRequest.
Expected results:
Only necessary permissions enumerated in CredentialsRequest.
Additional info:
Additional unnecessary permissions will be hard to pinpoint, but these specific permissions were questioned by MSFT and are likely only needed by the installer, as shown by the CORS-1870 investigation.
Description of problem:
The oc client has recently had functionality added to reference an icsp manifest with a variety of commands (using the --icsp flag).
The issue is that the registry/repo scope required in an ICSP to trigger a mapping is different between OCP and oc. An OCP ICSP will match an image at the registry level, whereas the oc client requires the exact registry + repo to match. This difference can cause major confusion (especially without adequate warning/error messages in the oc client).
Example image to mirror: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1631b0f0bf9c6dc4f9519ceb06b6ec9277f53f4599853fcfad3b3a47d2afd404
In OCP, registry.mirrorregistry.com:5000/openshift-release-dev/ will accurately mirror the image.
But using oc with --icsp, quay.io/openshift-release-dev/ocp-v4.0-art-dev is required or the mirroring will not match.
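To make the scope difference concrete, a small sketch with hypothetical helpers (not the actual oc implementation) contrasting registry-level prefix matching, as the cluster applies it, with the exact-repository match the oc client requires:

package main

import (
    "fmt"
    "strings"
)

// matchesRegistryScope reports whether an image falls under an ICSP source when the
// source is treated as a registry/repo prefix (the cluster's behavior).
func matchesRegistryScope(image, source string) bool {
    return image == source || strings.HasPrefix(image, source+"/") || strings.HasPrefix(image, source+"@")
}

// matchesExactRepo reports whether the image's repository equals the source exactly
// (the stricter behavior the oc client exhibits with --icsp-file).
func matchesExactRepo(image, source string) bool {
    repo := image
    if i := strings.IndexAny(image, "@"); i >= 0 {
        repo = image[:i] // strip the digest for comparison
    }
    return repo == source
}

func main() {
    image := "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1631b0f0bf9c6dc4f9519ceb06b6ec9277f53f4599853fcfad3b3a47d2afd404"
    fmt.Println(matchesRegistryScope(image, "quay.io/openshift-release-dev"))             // true
    fmt.Println(matchesExactRepo(image, "quay.io/openshift-release-dev"))                  // false
    fmt.Println(matchesExactRepo(image, "quay.io/openshift-release-dev/ocp-v4.0-art-dev")) // true
}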
Version-Release number of selected component (if applicable):
oc version
Client Version: 4.11.0-202212070335.p0.g1928ac4.assembly.stream-1928ac4
Kustomize Version: v4.5.4
Server Version: 4.12.0-rc.8
Kubernetes Version: v1.25.4+77bec7a
How reproducible:
100%
Steps to Reproduce:
1. Create an ICSP file with content similar to below (Replace with your mirror registry url)
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  creationTimestamp: null
  name: image-policy
spec:
  repositoryDigestMirrors:
  - mirrors:
    - registry.mirrorregistry.com:5005/openshift-release-dev
    source: quay.io/openshift-release-dev
2. Add the ICSP to a bare-metal OpenShift cluster and wait for the MCP to finish node restarts
3. SSH to a cluster node
4. Try to podman pull the following image with debug log level
podman pull --log-level=debug quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1631b0f0bf9c6dc4f9519ceb06b6ec9277f53f4599853fcfad3b3a47d2afd404
5. The log will show that the mirror registry is attempted (which is the same behavior as OCP)
6. Now try to extract the payload image from the release using the oc client and the --icsp flag (the ICSP file should be the same manifest used at step 1)
oc adm release extract --command=openshift-baremetal-install --to=/data/install-config-generate/installercache/registry.mirrorregistry.com:5005/openshift-release-dev/ocp-release:4.12.0-rc.8-x86_64 --insecure=false --icsp-file=/tmp/icsp-file1635083302 registry.mirrorregistry.com:5005/openshift-release-dev/ocp-release:4.12.0-rc.8-x86_64 --registry-config=/tmp/registry-config1265925963
Expected results:
openshift-baremetal-install is extracted to the proper directory using the mirrored payload image
Actual result:
oc client does not match the payload image because the icsp is not exact, so it immediately tries quay.io rather than the mirror registry
ited with non-zero exit code 1: \nwarning: --icsp-file only applies to images referenced by digest and will be ignored for tags\nerror: unable to read image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1631b0f0bf9c6dc4f9519ceb06b6ec9277f53f4599853fcfad3b3a47d2afd404: Get \"https://quay.io/v2/\": dial tcp 52.203.129.140:443: i/o timeout\n" func=github.com/openshift/assisted-service/internal/oc.execute file="/remote-source/assisted-service/app/internal/oc/release.go:404" go-id=26228 request_id=
Additional info:
I understand that oc-mirror or oc adm release mirror provides an ICSP manifest to use, but as OCP itself allows a wider scope for mapping, it can cause great confusion that the oc ICSP scope is not in parity. At the very least, a warning/error message in the oc client when the ICSP partially matches an image (but is not used) would be VERY useful.
For reasons I still struggle to understand, in trying to mitigate issues stemming from the PSA changes to Kubernetes, we decided on a convoluted architecture where one reconciler owned by one team (cluster-policy-controller) ignores openshift-* namespaces unless they have a specific label and are not part of the payload, while a reconciler on our team labels non-payload openshift-* namespaces appropriately so that the first one will do its security magic and keep workloads stable during this transition. This scheme led to a dependency between OLM and CPC so that we can share the list of payload openshift-* namespaces.
This also means that we need to update the dependency at each release to keep parity between the OCP version of the dependency and OLM.
We need to update the CPC dependency, as the pipeline is blocked until we do (to avoid letting an old version of the dependency, perhaps with a different list of payload openshift-* namespaces, break customer clusters or impact their experience).
Note: this is currently blocking ART compliance PRs. We need to get this in ASAP.
1. Proposed title of this feature request
Allow the Ingress log length to be modified when using a sidecar
2. What is the nature and description of the request?
In the past we had RFE-1794, where an option was created to specify the length of the HAProxy log; however, this option was only available when redirecting the log to an external syslog. We need this option to be available when using a sidecar to collect the logs.
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  replicas: 2
  logging:
    access:
      destination:
        type: Container
        container: {}
Unlike the Syslog type, the Container type does not have any sub-parameter that makes it possible to configure the log length.
As we can see in RFE-1794, the option to change the log length already exists in the HAProxy configuration, but when using the sidecar, only the default value (1024) is used.
3. Why does the customer need this? (List the business requirements here)
The default log length of HAProxy is 1024. When clients communicate with the application using long URI arguments, the full access log and the parameter info cannot be captured. An option to set 8192 or higher is required.
4. List any affected packages or components.
Description of problem:
The Multus mac-vlan/ipvlan/vlan CNI plugins panic when the master interface in the container is missing
Version-Release number of selected component (if applicable):
metallb-operator.v4.13.0-202304190216 MetalLB Operator 4.13.0-202304190216 Succeeded
How reproducible:
Create pod with multiple vlan interfaces connected to missing master interface.
Steps to Reproduce:
1. Create pod with multiple vlan interfaces connected to missing master interface in container 2. Make sure that pod stuck in ContainerCreating state 3. Run oc describe pod PODNAME and read crash message: Normal Scheduled 22s default-scheduler Successfully assigned cni-tests/pod-one to worker-0 Normal AddedInterface 21s multus Add eth0 [10.128.2.231/23] from ovn-kubernetes Normal AddedInterface 21s multus Add ext0 [] from cni-tests/tap-one Normal AddedInterface 21s multus Add ext0.1 [2001:100::1/64] from cni-tests/mac-vlan-one Warning FailedCreatePodSandBox 18s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_pod-one_cni-tests_2e831519-effc-4502-8ea7-749eda95bf1d_0(321d7181626b8bbfad062dd7c7cc2ef096f8547e93cb7481a18b7d3613eabffd): error adding pod cni-tests_pod-one to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [cni-tests/pod-one/2e831519-effc-4502-8ea7-749eda95bf1d:mac-vlan]: error adding container to network "mac-vlan": plugin type="macvlan" failed (add): netplugin failed: "panic: runtime error: invalid memory address or nil pointer dereference\n[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x54281a]\n\ngoroutine 1 [running, locked to thread]:\npanic({0x560b00, 0x6979d0})\n\t/usr/lib/golang/src/runtime/panic.go:987 +0x3ba fp=0xc0001ad8f0 sp=0xc0001ad830 pc=0x433d7a\nruntime.panicmem(...)\n\t/usr/lib/golang/src/runtime/panic.go:260\nruntime.sigpanic()\n\t/usr/lib/golang/src/runtime/signal_unix.go:835 +0x2f6 fp=0xc0001ad940 sp=0xc0001ad8f0 pc=0x449cd6\nmain.getMTUByName({0xc00001a978, 0x4}, {0xc00002004a, 0x33}, 0x1)\n\t/usr/src/plugins/plugins/main/macvlan/macvlan.go:167 +0x33a fp=0xc0001ada00 sp=0xc0001ad940 pc=0x54281a\nmain.loadConf(0xc000186770, {0xc00001e009, 0x19e})\n\t/usr/src/plugins/plugins/main/macvlan/macvlan.go:120 +0x155 fp=0xc0001ada80 sp=0xc0001ada00 pc=0x5422d5\nmain.cmdAdd(0xc000186770)\n\t/usr/src/plugins/plugins/main/macvlan/macvlan.go:287 +0x47 fp=0xc0001adcd0 sp=0xc0001ada80 pc=0x543b07\ngithub.com/containernetworking/cni/pkg/skel.(*dispatcher).checkVersionAndCall(0xc0000bdec8, 0xc000186770, {0x5c02b8, 0xc0000e4e40}, 0x592e80)\n\t/usr/src/plugins/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:166 +0x20a fp=0xc0001add60 sp=0xc0001adcd0 pc=0x5371ca\ngithub.com/containernetworking/cni/pkg/skel.(*dispatcher).pluginMain(0xc0000bdec8, 0x698320?, 0xc0000bdeb0?, 0x44ed89?, {0x5c02b8, 0xc0000e4e40}, {0xc0000000f0, 0x22})\n\t/usr/src/plugins/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:219 +0x2ca fp=0xc0001ade68 sp=0xc0001add60 pc=0x53772a\ngithub.com/containernetworking/cni/pkg/skel.PluginMainWithError(...)\n\t/usr/src/plugins/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:273\ngithub.com/containernetworking/cni/pkg/skel.PluginMain(0x588e01?, 0x10?, 0xc0000bdf50?, {0x5c02b8?, 0xc0000e4e40?}, {0xc0000000f0?, 0x0?})\n\t/usr/src/plugins/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:288 +0xd1 fp=0xc0001adf18 sp=0xc0001ade68 pc=0x537d51\nmain.main()\n\t/usr/src/plugins/plugins/main/macvlan/macvlan.go:432 +0xb6 fp=0xc0001adf80 sp=0xc0001adf18 pc=0x544b76\nruntime.main()\n\t/usr/lib/golang/src/runtime/proc.go:250 +0x212 fp=0xc0001adfe0 sp=0xc0001adf80 pc=0x436a12\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0001adfe8 sp=0xc0001adfe0 pc=0x462fc1\n\ngoroutine 2 [force gc (idle)]:\nruntime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)\n\t/usr/lib/golang/src/runtime/proc.go:363 +0xd6 
fp=0xc0000acfb0 sp=0xc0000acf90 pc=0x436dd6\nruntime.goparkunlock(...)\n\t/usr/lib/golang/src/runtime/proc.go:369\nruntime.forcegchelper()\n\t/usr/lib/golang/src/runtime/proc.go:302 +0xad fp=0xc0000acfe0 sp=0xc0000acfb0 pc=0x436c6d\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000acfe8 sp=0xc0000acfe0 pc=0x462fc1\ncreated by runtime.init.6\n\t/usr/lib/golang/src/runtime/proc.go:290 +0x25\n\ngoroutine 3 [GC sweep wait]:\nruntime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)\n\t/usr/lib/golang/src/runtime/proc.go:363 +0xd6 fp=0xc0000ad790 sp=0xc0000ad770 pc=0x436dd6\nruntime.goparkunlock(...)\n\t/usr/lib/golang/src/runtime/proc.go:369\nruntime.bgsweep(0x0?)\n\t/usr/lib/golang/src/runtime/mgcsweep.go:278 +0x8e fp=0xc0000ad7c8 sp=0xc0000ad790 pc=0x423e4e\nruntime.gcenable.func1()\n\t/usr/lib/golang/src/runtime/mgc.go:178 +0x26 fp=0xc0000ad7e0 sp=0xc0000ad7c8 pc=0x418d06\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000ad7e8 sp=0xc0000ad7e0 pc=0x462fc1\ncreated by runtime.gcenable\n\t/usr/lib/golang/src/runtime/mgc.go:178 +0x6b\n\ngoroutine 4 [GC scavenge wait]:\nruntime.gopark(0xc0000ca000?, 0x5bf2b8?, 0x1?, 0x0?, 0x0?)\n\t/usr/lib/golang/src/runtime/proc.go:363 +0xd6 fp=0xc0000adf70 sp=0xc0000adf50 pc=0x436dd6\nruntime.goparkunlock(...)\n\t/usr/lib/golang/src/runtime/proc.go:369\nruntime.(*scavengerState).park(0x6a0920)\n\t/usr/lib/golang/src/runtime/mgcscavenge.go:389 +0x53 fp=0xc0000adfa0 sp=0xc0000adf70 pc=0x421ef3\nruntime.bgscavenge(0x0?)\n\t/usr/lib/golang/src/runtime/mgcscavenge.go:617 +0x45 fp=0xc0000adfc8 sp=0xc0000adfa0 pc=0x4224c5\nruntime.gcenable.func2()\n\t/usr/lib/golang/src/runtime/mgc.go:179 +0x26 fp=0xc0000adfe0 sp=0xc0000adfc8 pc=0x418ca6\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000adfe8 sp=0xc0000adfe0 pc=0x462fc1\ncreated by runtime.gcenable\n\t/usr/lib/golang/src/runtime/mgc.go:179 +0xaa\n\ngoroutine 5 [finalizer wait]:\nruntime.gopark(0x0?, 0xc0000ac670?, 0xab?, 0x61?, 0xc0000ac770?)\n\t/usr/lib/golang/src/runtime/proc.go:363 +0xd6 fp=0xc0000ac628 sp=0xc0000ac608 pc=0x436dd6\nruntime.goparkunlock(...)\n\t/usr/lib/golang/src/runtime/proc.go:369\nruntime.runfinq()\n\t/usr/lib/golang/src/runtime/mfinal.go:180 +0x10f fp=0xc0000ac7e0 sp=0xc0000ac628 pc=0x417e0f\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000ac7e8 sp=0xc0000ac7e0 pc=0x462fc1\ncreated by runtime.createfing\n\t/usr/lib/golang/src/runtime/mfinal.go:157 +0x45\n"
Actual results:
The plugin crashes with a nil pointer dereference (see the panic above); a readable error message should be provided instead.
Expected results:
We should handle this scenario without crashing, and the following error should be returned instead: Error: Failed to create container due to the missing master interface XXX.
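A hedged sketch of the missing guard (the function name is taken from the stack trace; the body is illustrative, not the actual containernetworking/plugins source, and relies on the vishvananda/netlink library, so it is Linux-only):

package main

import (
    "fmt"

    "github.com/vishvananda/netlink"
)

// getMTUByName returns the MTU of the named master interface, or a readable error
// when the interface does not exist, instead of dereferencing a nil link.
func getMTUByName(name string) (int, error) {
    link, err := netlink.LinkByName(name)
    if err != nil {
        return 0, fmt.Errorf("failed to create container due to the missing master interface %q: %w", name, err)
    }
    return link.Attrs().MTU, nil
}

func main() {
    if _, err := getMTUByName("does-not-exist0"); err != nil {
        fmt.Println("Error:", err)
    }
}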
Additional info:
Description of problem:
Users are not able to upgrade a namespace-scoped operator in the OpenShift console. The Subscription tab is not visible in the web console to a user with admin rights on the namespace; only cluster-admin users are able to update the operator.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Configure IDP and add a user.
2. Install any operator in a specific namespace.
3. Assign project admin permission to the user for the same namespace.
4. Log in as the user and check whether the `Subscription` tab is visible to update the operator.
Actual results:
User is not able to update the operator. Subscription tab is not visible to the user in web console.
Expected results:
The user must get access to update the namespace-scoped operator if they have admin permission for that project.
Additional info:
Tried to reproduce the issue and observed the same behavior in OCP 4.10.20, OCP 4.10.25 and OCP 4.10.34.
Description of problem:
The installer, as used with AWS, performs a get-all-roles during cluster destroy and deletes roles based on a tag. If a customer is using AWS SEA, which denies get-all-roles in the AWS account, the installer fails.
Instead of erroring out, the installer should gracefully handle being denied get-all-roles and move on, so that a denying SCP does not get in the way of a successful cluster destroy on AWS.
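For illustration, a minimal sketch (hypothetical helper names, not the installer's actual destroy code) of skipping roles whose tags cannot be read because an SCP denies iam:GetRole, instead of failing the whole destroy:

package main

import (
    "fmt"

    "github.com/aws/aws-sdk-go/aws/awserr"
)

// isAccessDenied reports whether err is an AWS AccessDenied error, e.g. an SCP
// explicitly denying iam:GetRole on a role the cluster does not own.
func isAccessDenied(err error) bool {
    aerr, ok := err.(awserr.Error)
    return ok && aerr.Code() == "AccessDenied"
}

// tagsForRole is a stand-in for the iam:GetRole / tag lookup the installer performs.
func tagsForRole(roleName string) (map[string]string, error) {
    return nil, awserr.New("AccessDenied", "explicit deny in a service control policy", nil)
}

func main() {
    for _, role := range []string{"PBMMAccel-ConfigRecorderRole-B749E1E6", "rosa-mv9dx3-xls7g-worker-role"} {
        tags, err := tagsForRole(role)
        if isAccessDenied(err) {
            // Cannot inspect this role; skip it and move on instead of aborting the destroy.
            fmt.Printf("skipping role %s: access denied by SCP\n", role)
            continue
        } else if err != nil {
            fmt.Printf("error reading tags for %s: %v\n", role, err)
            continue
        }
        _ = tags // tagged roles would be considered for deletion here
    }
}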
Version-Release number of selected component (if applicable):
[ec2-user@ip-172-16-32-144 ~]$ rosa version
1.2.6
How reproducible:
1. Deploy ROSA STS, private with PrivateLink, with AWS SEA
2. rosa delete cluster --debug
3. Watch the debug logs of the installer to see it try to get-all-roles
4. The installer fails when the SCP from AWS SEA denies the get-all-roles task
Steps to Reproduce: Philip Thomson, would you please fill out the below?
Steps listed above.
Actual results:
time="2022-09-01T00:10:40Z" level=error msg="error after waiting for command completion" error="exit status 4" installID=zp56pxql time="2022-09-01T00:10:40Z" level=error msg="error provisioning cluster" error="exit status 4" installID=zp56pxql time="2022-09-01T00:10:40Z" level=error msg="error running openshift-install, running deprovision to clean up" error="exit status 4" installID=zp56pxql time="2022-09-01T00:12:47Z" level=info msg="copied /installconfig/install-config.yaml to /output/install-config.yaml" installID=55h2cvl5 time="2022-09-01T00:12:47Z" level=info msg="cleaning up resources from previous provision attempt" installID=55h2cvl5 time="2022-09-01T00:12:47Z" level=debug msg="search for matching resources by tag in ca-central-1 matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:12:48Z" level=debug msg="search for matching resources by tag in us-east-1 matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:12:48Z" level=debug msg="search for IAM roles" installID=55h2cvl5 time="2022-09-01T00:12:49Z" level=debug msg="iterating over a page of 64 IAM roles" installID=55h2cvl5 time="2022-09-01T00:12:52Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-ConfigRecorderRole-B749E1E6: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-ConfigRecorderRole-B749E1E6 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 6b4b5144-2f4e-4fde-ba1a-04ed239b84c2" installID=55h2cvl5 time="2022-09-01T00:12:52Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C with an explicit deny in a service control policy\n\tstatus code: 403, request id: 6152e9c2-9c1c-478b-a5e3-11ff2508684e" installID=55h2cvl5 time="2022-09-01T00:12:52Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 8636f0ff-e984-4f02-870e-52170ab4e7bb" installID=55h2cvl5 time="2022-09-01T00:12:52Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO with an explicit deny in a service control policy\n\tstatus code: 403, request id: 2385a980-dc9b-480f-955a-62ac1aaa6718" installID=55h2cvl5 time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role 
PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO with an explicit deny in a service control policy\n\tstatus code: 403, request id: 02ccef62-14e7-4310-b254-a0731995bd45" installID=55h2cvl5 time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH with an explicit deny in a service control policy\n\tstatus code: 403, request id: eca2081d-abd7-4c9b-b531-27ca8758f933" installID=55h2cvl5 time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ with an explicit deny in a service control policy\n\tstatus code: 403, request id: 6bda17e9-83e5-4688-86a0-2f84c77db759" installID=55h2cvl5 time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO with an explicit deny in a service control policy\n\tstatus code: 403, request id: 827afa4a-8bb9-4e1e-af69-d5e8d125003a" installID=55h2cvl5 time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 8dcd0480-6f9e-49cb-a0dd-0c5f76107696" installID=55h2cvl5 time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX with an explicit deny in a service control policy\n\tstatus code: 403, request id: 5095aed7-45de-4ca0-8c41-9db9e78ca5a6" installID=55h2cvl5 time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I with an explicit deny in a service control policy\n\tstatus code: 403, request id: 04f7d0e0-4139-4f74-8f67-8d8a8a41d6b9" installID=55h2cvl5 time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43: AccessDenied: User: 
arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 115f9514-b78b-42d1-b008-dc3181b61d33" installID=55h2cvl5 time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP with an explicit deny in a service control policy\n\tstatus code: 403, request id: 68da4d93-a93e-410a-b3af-961122fe8df0" installID=55h2cvl5 time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH with an explicit deny in a service control policy\n\tstatus code: 403, request id: 012221ea-2121-4b04-91f2-26c31c8458b1" installID=55h2cvl5 time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36 with an explicit deny in a service control policy\n\tstatus code: 403, request id: e6c9328d-a4b9-4e69-8194-a68ed7af6c73" installID=55h2cvl5 time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 214ca7fb-d153-4d0d-9f9c-21b073c5bd35" installID=55h2cvl5 time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU with an explicit deny in a service control policy\n\tstatus code: 403, request id: 63b54e82-e2f6-48d4-bd0f-d2663bbc58bf" installID=55h2cvl5 time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K with an explicit deny in a service control policy\n\tstatus code: 403, request id: d24982b6-df65-4ba2-a3c0-5ac8d23947e1" installID=55h2cvl5 time="2022-09-01T00:12:54Z" level=debug msg="get tags for 
arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX with an explicit deny in a service control policy\n\tstatus code: 403, request id: e2c5737a-5014-4eb5-9150-1dd1939137c0" installID=55h2cvl5 time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F with an explicit deny in a service control policy\n\tstatus code: 403, request id: 7793fa7c-4c8d-4f9f-8f23-d393b85be97c" installID=55h2cvl5 time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC with an explicit deny in a service control policy\n\tstatus code: 403, request id: bef2c5ab-ef59-4be6-bf1a-2d89fddb90f1" installID=55h2cvl5 time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD with an explicit deny in a service control policy\n\tstatus code: 403, request id: ff04eb1b-9cf6-4fff-a503-d9292ff17ccd" installID=55h2cvl5 time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-PipelineRole: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-PipelineRole with an explicit deny in a service control policy\n\tstatus code: 403, request id: 85e05de8-ba16-4366-bc86-721da651d770" installID=55h2cvl5 time="2022-09-01T00:12:56Z" level=info msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-VPC-FlowLog-519F0B57: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-VPC-FlowLog-519F0B57 with an explicit deny in a service control policy\n\tstatus code: 403, request id: a9d864e4-cfdf-483d-a0d2-9b48a117abc4" installID=55h2cvl5 time="2022-09-01T00:12:56Z" level=debug msg="search for IAM users" installID=55h2cvl5 time="2022-09-01T00:12:56Z" level=debug msg="iterating over a page of 0 IAM users" installID=55h2cvl5 time="2022-09-01T00:12:56Z" level=debug msg="search for IAM instance profiles" installID=55h2cvl5 time="2022-09-01T00:12:56Z" level=info msg="error while finding resources to delete" error="get tags for arn:aws:iam::646284873784:role/PBMMAccel-VPC-FlowLog-519F0B57: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-VPC-FlowLog-519F0B57 with an 
explicit deny in a service control policy\n\tstatus code: 403, request id: a9d864e4-cfdf-483d-a0d2-9b48a117abc4" installID=55h2cvl5 time="2022-09-01T00:12:56Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:12:57Z" level=info msg=Disassociated id=i-03d7570547d32071d installID=55h2cvl5 name=rosa-mv9dx3-xls7g-master-profile role=ROSA-ControlPlane-Role time="2022-09-01T00:12:57Z" level=info msg=Deleted InstanceProfileName=rosa-mv9dx3-xls7g-master-profile arn="arn:aws:iam::646284873784:instance-profile/rosa-mv9dx3-xls7g-master-profile" id=i-03d7570547d32071d installID=55h2cvl5 time="2022-09-01T00:12:57Z" level=debug msg=Terminating id=i-03d7570547d32071d installID=55h2cvl5 time="2022-09-01T00:12:58Z" level=debug msg=Terminating id=i-08bee3857e5265ba4 installID=55h2cvl5 time="2022-09-01T00:12:58Z" level=debug msg=Terminating id=i-00df6e7b34aa65c9b installID=55h2cvl5 time="2022-09-01T00:13:08Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:13:18Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:13:28Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:13:38Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:13:48Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:13:58Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:14:08Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:14:18Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:14:28Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:14:38Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:14:48Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:14:58Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:15:08Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:15:18Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:15:28Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:15:38Z" level=debug 
msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:15:48Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:15:58Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:16:08Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:16:18Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:16:28Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:16:38Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:16:48Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:16:58Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:17:08Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:17:18Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:17:28Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:17:38Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:17:48Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:17:49Z" level=info msg=Deleted id=rosa-mv9dx3-xls7g-sint/2e99b98b94304d80 installID=55h2cvl5 time="2022-09-01T00:17:49Z" level=info msg=Deleted id=eni-0e4ee5cf8f9a8fdd2 installID=55h2cvl5 time="2022-09-01T00:17:50Z" level=debug msg="Revoked ingress permissions" id=sg-03265ad2fae661b8c installID=55h2cvl5 time="2022-09-01T00:17:50Z" level=debug msg="Revoked egress permissions" id=sg-03265ad2fae661b8c installID=55h2cvl5 time="2022-09-01T00:17:50Z" level=debug msg="DependencyViolation: resource sg-03265ad2fae661b8c has a dependent object\n\tstatus code: 400, request id: f7c35709-a23d-49fd-ac6a-f092661f6966" arn="arn:aws:ec2:ca-central-1:646284873784:security-group/sg-03265ad2fae661b8c" installID=55h2cvl5 time="2022-09-01T00:17:51Z" level=info msg=Deleted id=eni-0e592a2768c157360 installID=55h2cvl5 time="2022-09-01T00:17:52Z" level=debug msg="listing AWS hosted zones \"rosa-mv9dx3.0ffs.p1.openshiftapps.com.\" (page 0)" id=Z072427539WBI718F6BCC installID=55h2cvl5 time="2022-09-01T00:17:52Z" level=debug msg="listing AWS hosted zones \"0ffs.p1.openshiftapps.com.\" (page 0)" id=Z072427539WBI718F6BCC installID=55h2cvl5 time="2022-09-01T00:17:53Z" level=info msg=Deleted id=Z072427539WBI718F6BCC installID=55h2cvl5 
time="2022-09-01T00:17:53Z" level=debug msg="Revoked ingress permissions" id=sg-08bfbb32ea92f583e installID=55h2cvl5 time="2022-09-01T00:17:53Z" level=debug msg="Revoked egress permissions" id=sg-08bfbb32ea92f583e installID=55h2cvl5 time="2022-09-01T00:17:54Z" level=info msg=Deleted id=sg-08bfbb32ea92f583e installID=55h2cvl5 time="2022-09-01T00:17:54Z" level=info msg=Deleted id=rosa-mv9dx3-xls7g-aint/635162452c08e059 installID=55h2cvl5 time="2022-09-01T00:17:54Z" level=info msg=Deleted id=eni-049f0174866d87270 installID=55h2cvl5 time="2022-09-01T00:17:54Z" level=debug msg="search for matching resources by tag in ca-central-1 matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:17:55Z" level=debug msg="search for matching resources by tag in us-east-1 matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:17:55Z" level=debug msg="no deletions from us-east-1, removing client" installID=55h2cvl5 time="2022-09-01T00:17:55Z" level=debug msg="search for IAM roles" installID=55h2cvl5 time="2022-09-01T00:17:56Z" level=debug msg="iterating over a page of 64 IAM roles" installID=55h2cvl5 time="2022-09-01T00:17:56Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-ConfigRecorderRole-B749E1E6: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-ConfigRecorderRole-B749E1E6 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 06b804ae-160c-4fa7-92de-fd69adc07db2" installID=55h2cvl5 time="2022-09-01T00:17:56Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C with an explicit deny in a service control policy\n\tstatus code: 403, request id: 2a5dd4ad-9c3e-40ee-b478-73c79671d744" installID=55h2cvl5 time="2022-09-01T00:17:56Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2 with an explicit deny in a service control policy\n\tstatus code: 403, request id: e61daee8-6d2c-4707-b4c9-c4fdd6b5091c" installID=55h2cvl5 time="2022-09-01T00:17:56Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO with an explicit deny in a service control policy\n\tstatus code: 403, request id: 1b743447-a778-4f9e-8b48-5923fd5c14ce" installID=55h2cvl5 time="2022-09-01T00:17:56Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role 
PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO with an explicit deny in a service control policy\n\tstatus code: 403, request id: da8c8a42-8e79-48e5-b548-c604cb10d6f4" installID=55h2cvl5 time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH with an explicit deny in a service control policy\n\tstatus code: 403, request id: 7d7840e4-a1b4-4ea2-bb83-9ee55882de54" installID=55h2cvl5 time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ with an explicit deny in a service control policy\n\tstatus code: 403, request id: 7f2e04ed-8c49-42e4-b35e-563093a57e5b" installID=55h2cvl5 time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO with an explicit deny in a service control policy\n\tstatus code: 403, request id: cd2b4962-e610-4cc4-92bc-827fe7a49b48" installID=55h2cvl5 time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7 with an explicit deny in a service control policy\n\tstatus code: 403, request id: be005a09-f62c-4894-8c82-70c375d379a9" installID=55h2cvl5 time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX with an explicit deny in a service control policy\n\tstatus code: 403, request id: 541d92f4-33ce-4a50-93d8-dcfd2306eeb0" installID=55h2cvl5 time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I with an explicit deny in a service control policy\n\tstatus code: 403, request id: 6dd81743-94c4-479a-b945-ffb1af763007" installID=55h2cvl5 time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43: AccessDenied: User: 
arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43 with an explicit deny in a service control policy\n\tstatus code: 403, request id: a269f47b-97bc-4609-b124-d1ef5d997a91" installID=55h2cvl5 time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP with an explicit deny in a service control policy\n\tstatus code: 403, request id: 33c3c0a5-e5c9-4125-9400-aafb363c683c" installID=55h2cvl5 time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH with an explicit deny in a service control policy\n\tstatus code: 403, request id: 32e87471-6d21-42a7-bfd8-d5323856f94d" installID=55h2cvl5 time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36 with an explicit deny in a service control policy\n\tstatus code: 403, request id: b2cc6745-0217-44fe-a48b-44e56e889c9e" installID=55h2cvl5 time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 09f81582-6685-4dc9-99f0-ed33565ab4f4" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU with an explicit deny in a service control policy\n\tstatus code: 403, request id: cea9116c-2b54-4caa-9776-83559d27b8f8" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K with an explicit deny in a service control policy\n\tstatus code: 403, request id: 430d7750-c538-42a5-84b5-52bc77ce2d56" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=debug msg="get tags for 
arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX with an explicit deny in a service control policy\n\tstatus code: 403, request id: 279038e4-f3c9-4700-b590-9a90f9b8d3a2" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F with an explicit deny in a service control policy\n\tstatus code: 403, request id: 5e2f40ae-3dc7-4773-a5cd-40bf9aa36c03" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC with an explicit deny in a service control policy\n\tstatus code: 403, request id: 92a27a7b-14f5-455b-aa39-3c995806b83e" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD with an explicit deny in a service control policy\n\tstatus code: 403, request id: 0da4f66c-c6b1-453c-a8c8-dc0399b24bb9" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-PipelineRole: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-PipelineRole with an explicit deny in a service control policy\n\tstatus code: 403, request id: f2c94beb-a222-4bad-abe1-8de5786f5e59" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=info msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-VPC-FlowLog-519F0B57: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-VPC-FlowLog-519F0B57 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 829c3569-b2f2-4b9d-94a0-69644b690066" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=debug msg="search for IAM users" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=debug msg="iterating over a page of 0 IAM users" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=debug msg="search for IAM instance profiles" installID=55h2cvl5 time="2022-09-01T00:17:58Z" level=info msg="error while finding resources to delete" error="get tags for arn:aws:iam::646284873784:role/PBMMAccel-VPC-FlowLog-519F0B57: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-VPC-FlowLog-519F0B57 with an 
explicit deny in a service control policy\n\tstatus code: 403, request id: 829c3569-b2f2-4b9d-94a0-69644b690066" installID=55h2cvl5 time="2022-09-01T00:18:09Z" level=info msg=Deleted id=sg-03265ad2fae661b8c installID=55h2cvl5 time="2022-09-01T00:18:09Z" level=debug msg="search for matching resources by tag in ca-central-1 matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5 time="2022-09-01T00:18:09Z" level=debug msg="no deletions from ca-central-1, removing client" installID=55h2cvl5 time="2022-09-01T00:18:09Z" level=debug msg="search for IAM roles" installID=55h2cvl5 time="2022-09-01T00:18:10Z" level=debug msg="iterating over a page of 64 IAM roles" installID=55h2cvl5 time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-ConfigRecorderRole-B749E1E6: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-ConfigRecorderRole-B749E1E6 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 0e8e0bea-b512-469b-a996-8722a0f7fa25" installID=55h2cvl5 time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C with an explicit deny in a service control policy\n\tstatus code: 403, request id: 288456a2-0cd5-46f1-a5d2-6b4006a5dc0e" installID=55h2cvl5 time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 321df940-70fc-45e7-8c56-59fe5b89e84f" installID=55h2cvl5 time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO with an explicit deny in a service control policy\n\tstatus code: 403, request id: 45bebf36-8bf9-4c78-a80f-c6a5e98b2187" installID=55h2cvl5 time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO with an explicit deny in a service control policy\n\tstatus code: 403, request id: eea00ae2-1a72-43f9-9459-a1c003194137" installID=55h2cvl5 time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role 
PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH with an explicit deny in a service control policy\n\tstatus code: 403, request id: 0ef5a102-b764-4e17-999f-d820ebc1ec12" installID=55h2cvl5 time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ with an explicit deny in a service control policy\n\tstatus code: 403, request id: 107d0ccf-94e7-41c4-96cd-450b66a84101" installID=55h2cvl5 time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO with an explicit deny in a service control policy\n\tstatus code: 403, request id: da9bd868-8384-4072-9fb4-e6a66e94d2a1" installID=55h2cvl5 time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 74fbf44c-d02d-4072-b038-fa456246b6a8" installID=55h2cvl5 time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX with an explicit deny in a service control policy\n\tstatus code: 403, request id: 365116d6-1467-49c3-8f58-1bc005aa251f" installID=55h2cvl5 time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I with an explicit deny in a service control policy\n\tstatus code: 403, request id: 20f91de5-cfeb-45e0-bb46-7b66d62cc749" installID=55h2cvl5 time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 924fa288-f1b9-49b8-b549-a930f6f771ce" installID=55h2cvl5 time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP: AccessDenied: User: 
arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP with an explicit deny in a service control policy\n\tstatus code: 403, request id: 4beb233d-40d6-4016-872a-8757af8f98ee" installID=55h2cvl5 time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH with an explicit deny in a service control policy\n\tstatus code: 403, request id: 77951f62-e0b4-4a9b-a20c-ea40d6432e84" installID=55h2cvl5 time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 13ad38c8-89dc-461d-9763-870eec3a6ba1" installID=55h2cvl5 time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8 with an explicit deny in a service control policy\n\tstatus code: 403, request id: a8fe199d-12fb-4141-a944-c7c5516daf25" installID=55h2cvl5 time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU with an explicit deny in a service control policy\n\tstatus code: 403, request id: b487c62f-5ac5-4fa0-b835-f70838b1d178" installID=55h2cvl5 time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K with an explicit deny in a service control policy\n\tstatus code: 403, request id: 97bfcb55-ae1f-4859-9c12-03de09607f79" installID=55h2cvl5 time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX with an explicit deny in a service control policy\n\tstatus code: 403, request id: ca1094f6-714e-4042-9134-75f4c6d9d0df" installID=55h2cvl5 time="2022-09-01T00:18:12Z" level=debug msg="get tags for 
arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F with an explicit deny in a service control policy\n\tstatus code: 403, request id: ca1db477-ee6a-4d03-8b57-52b335b2bbe6" installID=55h2cvl5 time="2022-09-01T00:18:12Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC with an explicit deny in a service control policy\n\tstatus code: 403, request id: 1fc32d09-588b-4d80-ad62-748f7fb55efd" installID=55h2cvl5 time="2022-09-01T00:18:12Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD with an explicit deny in a service control policy\n\tstatus code: 403, request id: 7d906cc2-eaaa-439b-97e0-503615ce5d43" installID=55h2cvl5 time="2022-09-01T00:18:12Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-PipelineRole: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-PipelineRole with an explicit deny in a service control policy\n\tstatus code: 403, request id: ee6a5647-20b1-4880-932b-bfd70b945077" installID=55h2cvl5 time="2022-09-01T00:18:12Z" level=info msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-VPC-FlowLog-519F0B57: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-VPC-FlowLog-519F0B57 with an explicit deny in a service control policy\n\tstatus code: 403, request id: a424891e-48ab-4ad4-9150-9ef1076dcb9c" installID=55h2cvl5 Repeats the not authroized errors probably 50+ times.
Expected results:
For these errors not to show up during install.
Additional info:
Again, this is only due to ROSA being installed in an AWS SEA environment - https://github.com/aws-samples/aws-secure-environment-accelerator.
"etcdserver: leader changed" causes clients to fail.
This error should never bubble up to clients because the kube-apiserver can always retry this failure mode since it knows the data was not modified. When etcd adjusts timeouts for leader election and heartbeating for slow hardware like Azure, the hardcoded timeouts in the kube-apiserver/etcd fail. See
Simply saying, "oh, it's hardcoded and kube" isn't good enough. We have previously had a storage shim to retry such problems. If all else fails, bringing back the small shim to retry Unavailable etcd errors longer is appropriate to fix all available clients.
Additionally, this etcd capability is being made more widely available and this bug prevents that from working.
This came up a while ago, see https://groups.google.com/u/1/a/redhat.com/g/aos-devel/c/HuOTwtI4a9I/m/nX9mKjeqAAAJ
Basically this MC:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-override
spec:
  kernelType: realtime
  osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b4cc3995d5fc11e3b22140d8f2f91f78834e86a210325cbf0525a62725f8e099
Will degrade the node with
E0301 21:25:09.234001 3306 writer.go:200] Marking Degraded due to: error running rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install kernel-rt-core --install kernel-rt-modules --install kernel-rt-modules-extra --install kernel-rt-kvm: error: Could not depsolve transaction; 1 problem detected: Problem: package kernel-modules-core-5.14.0-282.el9.x86_64 requires kernel-uname-r = 5.14.0-282.el9.x86_64, but none of the providers can be installed - conflicting requests : exit status 1
It's kind of annoying here because the packages to remove are now OS version dependent. A while ago I filed https://github.com/coreos/rpm-ostree/issues/2542 which would push the problem down into rpm-ostree, which is in a better situation to deal with it, and that may be the fix...but it's also pushing the problem down there in a way that's going to be maintenance pain (but, we can deal with that).
It's also possible that we may need to explicitly request installation of `kernel-rt-modules-core`...I'll look.
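For illustration, a sketch of the kind of override the MCO would need to run on a RHEL 9 based RHCOS, assuming kernel-modules-core must also join the remove list (and possibly kernel-rt-modules-core the install list, as noted above); the exact package set remains OS-version dependent:
rpm-ostree override remove \
    kernel kernel-core kernel-modules kernel-modules-core kernel-modules-extra \
    --install kernel-rt-core \
    --install kernel-rt-modules \
    --install kernel-rt-modules-extra \
    --install kernel-rt-kvm
# possibly also: --install kernel-rt-modules-core (assumption, see the note above)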
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Ingress-canary Daemon Set does not tolerate Infra taint "NoExecute"
Version-Release number of selected component (if applicable):
OCPv4.9
How reproducible:
Always
Steps to Reproduce:
1. Label and taint the node
$ oc describe node worker-0.cluster49.lab.pnq2.cee.redhat.com | grep infra
Roles: custom,infra,test
node-role.kubernetes.io/infra= <----
Taints: node-role.kubernetes.io/infra=reserved:NoExecute <----
node-role.kubernetes.io/infra=reserved:NoSchedule <----
2. Edit the ingress-canary DaemonSet and add a NoExecute toleration
$ oc get ds -o yaml | grep -i tole -A6
tolerations:
3. The Daemon Set configuration gets overwritten after some time, probably by the managing operator, and the pods are terminated on the infra nodes.
Actual results:
The NoExecute infra taint toleration gets overwritten:
$ oc get ds -o yaml | grep -i tole -A6
tolerations:
Expected results:
The ingress-canary DaemonSet should be able to tolerate the NoExecute infra taint.
Additional info: The same taints as in the product documentation are used (node-role.kubernetes.io/infra).
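For reference, a minimal sketch of the tolerations the ingress-canary DaemonSet would need for the taints shown above (key, value, and effects taken from the node description; how the Ingress Operator should persist them is out of scope here):
tolerations:
- key: node-role.kubernetes.io/infra
  value: reserved
  operator: Equal
  effect: NoExecute
- key: node-role.kubernetes.io/infra
  value: reserved
  operator: Equal
  effect: NoSchedule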
Description of problem:
Under heavy control plane load (bringing up ~200 pods), prometheus/promtail spikes to over 100% CPU, node_exporter goes to ~200% CPU and stays there for 5-10 minutes. Tested on a GCP cluster bot using 2 physical core (4 vCPU) workers. This starves out essential platform functions like OVS from getting any CPU and causes the data plane to go down. Running perf against node_exporter reveals the application is consuming the majority of its CPU trying to list new interfaces being added in sysfs. This looks like it is due to disabling netlink via https://issues.redhat.com/browse/OCPBUGS-8282. This operation grabs the rtnl lock, which can compete with other components on the host that are trying to configure networking.
Version-Release number of selected component (if applicable):
Tested on 4.13 and 4.14 with GCP.
How reproducible:
3/4 times
Steps to Reproduce:
1. Launch gcp with cluster bot
2. Create a deployment with pause containers which will max out pods on the nodes:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver-deployment
  namespace: openshift-ovn-kubernetes
  labels:
    pod-name: server
    app: nginx
    role: webserver
spec:
  replicas: 700
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        role: webserver
    spec:
      containers:
      - name: webserver1
        image: k8s.gcr.io/pause:3.1
        ports:
        - containerPort: 80
          name: serve-80
          protocol: TCP
3. Watch top cpu output. Wait for node_exporter and prometheus to show very high CPU. If this does not happen, proceed to step 4.
4. Delete the deployment and then recreate it.
5. High and persistent CPU usage should now be observed.
Actual results:
CPU is pegged on the host for several minutes. Terminal is almost unresponsive. Only way to fix it was to delete node_exporter and prometheus DS.
Expected results:
Prometheus and other metrics related applications should: 1. use netlink to avoid grabbing rtnl lock 2. should be cpu limited. Certain required applications in OCP are resource unbounded (like networking data plane) to ensure the node's core functions continue to work. Metrics however should be CPU limited to avoid tooling from locking up a node.
Additional info:
Perf summary (will attach full perf output) 99.94% 0.00% node_exporter node_exporter [.] runtime.goexit.abi0 | ---runtime.goexit.abi0 | --99.33%--github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func2 | --99.33%--github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func1 | --99.33%--github.com/prometheus/node_exporter/collector.execute | |--97.67%--github.com/prometheus/node_exporter/collector.(*netClassCollector).Update | | | --97.67%--github.com/prometheus/node_exporter/collector.(*netClassCollector).netClassSysfsUpdate | | | --97.67%--github.com/prometheus/node_exporter/collector.(*netClassCollector).getNetClassInfo | | | --97.64%--github.com/prometheus/procfs/sysfs.FS.NetClassByIface | | | --97.64%--github.com/prometheus/procfs/sysfs.parseNetClassIface | | | --97.61%--github.com/prometheus/procfs/internal/util.SysReadFile | | | --97.45%--syscall.read | | | --97.45%--syscall.Syscall | | | --97.45%--runtime/internal/syscall.Syscall6 | | | --70.34%--entry_SYSCALL_64_after_hwframe | do_syscall_64 | | | |--39.13%--ksys_read | | | | | |--31.97%--vfs_read
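To illustrate the second point in the expected results above, this is a minimal sketch of the kind of CPU bound being asked for; the container name and the values are assumptions for illustration, not the shipped monitoring configuration:
containers:
- name: node-exporter        # assumed container name
  resources:
    requests:
      cpu: 8m
      memory: 32Mi
    limits:
      cpu: 100m              # illustrative cap so a runaway collector cannot starve the node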
Description of problem:
Since we migrated some of our jobs to OCP 4.14, we are experiencing a lot of flakiness with the "openshift-tests" binary, which panics when trying to retrieve the logs of etcd: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_assisted-test-infra/2212/pull-ci-openshift-assisted-test-infra-master-e2e-metal-assisted/1673615526967906304#1:build-log.txt%3A161-191 Here's the impact on our jobs: https://search.ci.openshift.org/?search=error+reading+pod+logs&maxAge=48h&context=1&type=build-log&name=.*assisted.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Version-Release number of selected component (if applicable):
N/A
How reproducible:
Happens from time to time against OCP 4.14
Steps to Reproduce:
1. Provision an OCP cluster 4.14 2. Run the conformance tests on it with "openshift-tests"
Actual results:
The binary "openshift-tests" panics from time to time: [2023-06-27 10:12:07] time="2023-06-27T10:12:07Z" level=error msg="error reading pod logs" error="container \"etcd\" in pod \"etcd-test-infra-cluster-a1729bd4-master-2\" is not available" pod=etcd-test-infra-cluster-a1729bd4-master-2 [2023-06-27 10:12:07] panic: runtime error: invalid memory address or nil pointer dereference [2023-06-27 10:12:07] [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x26eb9b5] [2023-06-27 10:12:07] [2023-06-27 10:12:07] goroutine 1 [running]: [2023-06-27 10:12:07] bufio.(*Scanner).Scan(0xc005954250) [2023-06-27 10:12:07] bufio/scan.go:214 +0x855 [2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation.IntervalsFromPodLogs({0x8d91460, 0xc004a43d40}, {0xc8b83c0?, 0xc006138000?, 0xc8b83c0?}, {0x8d91460?, 0xc004a43d40?, 0xc8b83c0?}) [2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation/podlogs.go:130 +0x8cd [2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation.InsertIntervalsFromCluster({0x8d441e0, 0xc000ffd900}, 0xc0008b4000?, {0xc005f88000?, 0x539, 0x0?}, 0x25e1e39?, {0xc11ecb5d446c4f2c, 0x4fb99e6af, 0xc8b83c0}, ...) [2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation/types.go:65 +0x274 [2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo.(*MonitorEventsOptions).End(0xc001083050, {0x8d441e0, 0xc000ffd900}, 0x1?, {0x7fff15b2ccde, 0x16}) [2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo/options_monitor_events.go:170 +0x225 [2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo.(*Options).Run(0xc0013e2000, 0xc00012e380, {0x8126d1e, 0xf}) [2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo/cmd_runsuite.go:506 +0x2d9a [2023-06-27 10:12:07] main.newRunCommand.func1.1() [2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:330 +0x2d4 [2023-06-27 10:12:07] main.mirrorToFile(0xc0013e2000, 0xc0014cdb30) [2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:476 +0x5f2 [2023-06-27 10:12:07] main.newRunCommand.func1(0xc0013e0300?, {0xc000862ea0?, 0x6?, 0x6?}) [2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:311 +0x5c [2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).execute(0xc0013e0300, {0xc000862e40, 0x6, 0x6}) [2023-06-27 10:12:07] github.com/spf13/cobra@v1.6.0/command.go:916 +0x862 [2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).ExecuteC(0xc0013e0000) [2023-06-27 10:12:07] github.com/spf13/cobra@v1.6.0/command.go:1040 +0x3bd [2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).Execute(...) [2023-06-27 10:12:07] github.com/spf13/cobra@v1.6.0/command.go:968 [2023-06-27 10:12:07] main.main.func1(0xc00011b300?) [2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:96 +0x8a [2023-06-27 10:12:07] main.main() [2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:97 +0x516
Expected results:
No panics
Additional info:
The source of the panic has been pin-pointed here: https://github.com/openshift/origin/pull/27772#discussion_r1243600596
Description of problem:
Per the oc set route-backends -h output: Routes may have one or more optional backend services with weights controlling how much traffic flows to each service. [...] **If all weights are zero the route will not send traffic to any backends.** This is no longer the case for a route with a single backend.
Version-Release number of selected component (if applicable):
at least from OCP 4.12 onward
How reproducible:
all the time
Steps to Reproduce:
1. kubectl create -f example/
2. kubectl patch route example -p '{"spec":{"to": {"weight": 0}}}' --type merge
3. curl http://localhost -H "Host: example.local"
Actual results:
curl succeeds
Expected results:
curl fails
Additional info:
https://access.redhat.com/support/cases/#/case/03567697
This is a regression following NE-822. Reverting https://github.com/openshift/router/commit/9656da7d5e2ac0962f3eaf718ad7a8c8b2172cfa makes it work again.
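For reference, a minimal sketch of the single-backend route in question after step 2 of the reproduction (the service name is an assumption taken from the example; other fields omitted):
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: example
spec:
  host: example.local
  to:
    kind: Service
    name: example      # assumed service name from the example/ manifests
    weight: 0          # per the documented behaviour, no traffic should be sent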
Sanitize OWNERS/OWNER_ALIASES in all CSI driver and operator repos.
For driver repos:
1) OWNERS must have `component`:
component: "Storage / Kubernetes External Components"
2) OWNER_ALIASES must include all team members of the Storage team.
For operator repos:
1) OWNERS must have:
component: "Storage / Operators"
If the kubeadmin secret was deleted successfully from the guest cluster but the `SecretHashAnnotation` annotation deletion on the oauthDeployment failed, the annotation will not be reconciled again and will never be removed.
context: https://redhat-internal.slack.com/archives/C01C8502FMM/p1684765042825929
See https://issues.redhat.com//browse/MON-3173 for details.
Having the test failing may be confusing.
We should also make the test clearer.
Description of problem:
GCP XPN installs require the permission `projects/<host-project>/roles/dns.networks.bindPrivateDNSZone` in the host project. This permission is not always provided in organizations. The installer requires this permission in order to create a private DNS zone and bind it to the shared networks. Instead, the installer should be able to create records in a provided private zone that matches the base domain.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
While deploying 3671 SNOs via ACM and ZTP, 19 SNO clusters failed to install because the clusterversion object complained that the cluster operator operator-lifecycle-manager is not available.
Version-Release number of selected component (if applicable):
Hub OCP 4.12.14 SNO Deployed OCP 4.13.0-rc.6 ACM - 2.8.0-DOWNSTREAM-2023-04-30-18-44-29
How reproducible:
19 out of 51 failed clusters, out of 3671 total installs. Roughly 0.5% of installs might experience this; however, it represents ~37% of all install failures.
Steps to Reproduce:
1. 2. 3.
Actual results:
# cat cluster-install-failures | grep OLM | awk '{print $1}' | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion --no-headers" vm00096 version False True 15h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm00334 version False True 19h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm00593 version False True 19h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm01095 version False True 19h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm01192 version False True 19h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm01447 version False True 18h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm01566 version False True 19h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm01707 version False True 17h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm01742 version False True 15h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm01798 version False True 13h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm01810 version False True 19h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm02020 version False True 19h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm02091 version False True 20h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm02363 version False True 13h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm02590 version False True 20h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm02908 version False True 18h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm03253 version False True 14h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm03500 version False True 17h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available vm03654 version False True 17h Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
Expected results:
Additional info:
There appear to be two distinguishing failure signatures in the list of cluster operators: every cluster shows that OLM isn't available and is degraded, and more than half of the clusters show no information regarding operator-lifecycle-manager-packageserver.
# cat cluster-install-failures | grep OLM | awk '{print $1}' | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get co operator-lifecycle-manager --no-headers" vm00096 operator-lifecycle-manager False True True 15h vm00334 operator-lifecycle-manager False True True 19h vm00593 operator-lifecycle-manager False True True 19h vm01095 operator-lifecycle-manager False True True 19h vm01192 operator-lifecycle-manager False True True 19h vm01447 operator-lifecycle-manager False True True 18h vm01566 operator-lifecycle-manager False True True 19h vm01707 operator-lifecycle-manager False True True 17h vm01742 operator-lifecycle-manager False True True 15h vm01798 operator-lifecycle-manager False True True 13h vm01810 operator-lifecycle-manager False True True 19h vm02020 operator-lifecycle-manager False True True 19h vm02091 operator-lifecycle-manager False True True 20h vm02363 operator-lifecycle-manager False True True 13h vm02590 operator-lifecycle-manager False True True 20h vm02908 operator-lifecycle-manager False True True 18h vm03253 operator-lifecycle-manager False True True 14h vm03500 operator-lifecycle-manager False True True 17h vm03654 operator-lifecycle-manager False True True 17h # cat cluster-install-failures | grep OLM | awk '{print $1}' | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get co operator-lifecycle-manager-packageserver --no-headers" vm00096 operator-lifecycle-manager-packageserver vm00334 operator-lifecycle-manager-packageserver False True False 19h vm00593 operator-lifecycle-manager-packageserver False True False 19h vm01095 operator-lifecycle-manager-packageserver vm01192 operator-lifecycle-manager-packageserver vm01447 operator-lifecycle-manager-packageserver vm01566 operator-lifecycle-manager-packageserver False True False 19h vm01707 operator-lifecycle-manager-packageserver vm01742 operator-lifecycle-manager-packageserver False True False 15h vm01798 operator-lifecycle-manager-packageserver vm01810 operator-lifecycle-manager-packageserver vm02020 operator-lifecycle-manager-packageserver vm02091 operator-lifecycle-manager-packageserver False True False 20h vm02363 operator-lifecycle-manager-packageserver False True False 13h vm02590 operator-lifecycle-manager-packageserver False True False 20h vm02908 operator-lifecycle-manager-packageserver False True False 18h vm03253 operator-lifecycle-manager-packageserver vm03500 operator-lifecycle-manager-packageserver vm03654 operator-lifecycle-manager-packageserver
Viewing the pods in the openshift-operator-lifecycle-manager for these clusters shows no packageserver pod:
# cat cluster-install-failures | grep OLM | awk '{print $1}' | xargs -I % sh -c "echo '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get po -n openshift-operator-lifecycle-manager" vm00096 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-9rm9j 1/1 Running 1 (15h ago) 15h collect-profiles-28053720-kbsdn 0/1 Completed 0 33m collect-profiles-28053735-dzkf8 0/1 Completed 0 18m collect-profiles-28053750-skvcn 0/1 Completed 0 3m1s olm-operator-66658fffbb-gj294 1/1 Running 0 15h package-server-manager-654759688-bxnwj 1/1 Running 0 15h vm00334 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-xcw9r 1/1 Running 1 (19h ago) 19h collect-profiles-28053720-ppq6x 0/1 Completed 0 32m collect-profiles-28053735-r2rvw 0/1 Completed 0 18m collect-profiles-28053750-lgb4r 0/1 Completed 0 3m2s olm-operator-66658fffbb-t4nxg 1/1 Running 0 19h package-server-manager-654759688-6n7gp 1/1 Running 0 19h vm00593 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-rwfwp 1/1 Running 1 (19h ago) 19h collect-profiles-28053720-7p6tq 0/1 Completed 0 33m collect-profiles-28053735-nqzn9 0/1 Completed 0 18m collect-profiles-28053750-zppm6 0/1 Completed 0 3m2s olm-operator-66658fffbb-4gcpv 1/1 Running 0 19h package-server-manager-654759688-rbjdw 1/1 Running 0 19h vm01095 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-2tp6j 1/1 Running 0 19h collect-profiles-28053720-bnrfz 0/1 Completed 0 33m collect-profiles-28053735-p8bl5 0/1 Completed 0 18m collect-profiles-28053750-mg9nv 0/1 Completed 0 3m2s olm-operator-66658fffbb-cb95l 1/1 Running 0 19h package-server-manager-654759688-2mqdm 1/1 Running 0 19h vm01192 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-2crgg 1/1 Running 0 19h collect-profiles-28053720-2rknm 0/1 Completed 0 33m collect-profiles-28053735-wc5dn 0/1 Completed 0 18m collect-profiles-28053750-g5bhj 0/1 Completed 0 3m2s olm-operator-66658fffbb-5hlh4 1/1 Running 0 19h package-server-manager-654759688-xfp24 1/1 Running 0 19h vm01447 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-p8gd4 1/1 Running 0 18h collect-profiles-28053720-kjw4w 0/1 Completed 0 33m collect-profiles-28053735-k7xxp 0/1 Completed 0 17m collect-profiles-28053750-fn5gq 0/1 Completed 0 3m3s olm-operator-66658fffbb-rshjq 1/1 Running 1 (18h ago) 18h package-server-manager-654759688-hrmfd 1/1 Running 0 18h vm01566 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-gbrnj 1/1 Running 0 19h collect-profiles-28053720-2wdcp 0/1 Completed 0 33m collect-profiles-28053735-t7x5b 0/1 Completed 0 18m collect-profiles-28053750-wdmtt 0/1 Completed 0 3m3s olm-operator-66658fffbb-fsxrx 1/1 Running 0 19h package-server-manager-654759688-4mdz8 1/1 Running 1 (19h ago) 19h vm01707 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-f2ns6 1/1 Running 0 17h collect-profiles-28053720-72sjt 0/1 Completed 0 33m collect-profiles-28053735-qzgx4 0/1 Completed 0 18m collect-profiles-28053750-mrpbl 0/1 Completed 0 3m3s olm-operator-66658fffbb-jwp2l 1/1 Running 0 17h package-server-manager-654759688-f7bm4 1/1 Running 0 17h vm01742 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-lhv6f 1/1 Running 1 (15h ago) 15h collect-profiles-28053720-4kqtf 0/1 Completed 0 33m collect-profiles-28053735-hw7kp 0/1 Completed 0 18m collect-profiles-28053750-6ztq2 0/1 Completed 0 3m4s olm-operator-66658fffbb-5sqlc 1/1 Running 0 15h package-server-manager-654759688-n6sms 1/1 Running 0 15h vm01798 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-kx7nx 1/1 Running 2 (13h ago) 13h collect-profiles-28053720-7vlqq 0/1 
Completed 0 33m collect-profiles-28053735-m8ltn 0/1 Completed 0 18m collect-profiles-28053750-hrfnk 0/1 Completed 0 3m4s olm-operator-66658fffbb-5z74m 1/1 Running 1 (13h ago) 13h package-server-manager-654759688-6jbnz 1/1 Running 0 13h vm01810 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-v5vr6 1/1 Running 2 (19h ago) 19h collect-profiles-28053720-m26dn 0/1 Completed 0 33m collect-profiles-28053735-64j7f 0/1 Completed 0 18m collect-profiles-28053750-qf69b 0/1 Completed 0 3m4s olm-operator-66658fffbb-gxt2b 1/1 Running 0 19h package-server-manager-654759688-dz6p6 1/1 Running 0 19h vm02020 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-2qqk6 1/1 Running 0 19h collect-profiles-28053720-5cktx 0/1 Completed 0 33m collect-profiles-28053735-ls6n9 0/1 Completed 0 18m collect-profiles-28053750-bj6gl 0/1 Completed 0 3m4s olm-operator-66658fffbb-zsr4g 1/1 Running 0 19h package-server-manager-654759688-2dnfd 1/1 Running 0 19h vm02091 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-whftg 1/1 Running 1 (20h ago) 20h collect-profiles-28053720-zqcbs 0/1 Completed 0 33m collect-profiles-28053735-v8lf5 0/1 Completed 0 18m collect-profiles-28053750-rshdd 0/1 Completed 0 3m5s olm-operator-66658fffbb-876ps 1/1 Running 0 20h package-server-manager-654759688-smc8q 1/1 Running 0 20h vm02363 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-zgn5m 1/1 Running 1 (13h ago) 13h collect-profiles-28053720-dpkqq 0/1 Completed 0 33m collect-profiles-28053735-nfqmf 0/1 Completed 0 18m collect-profiles-28053750-jfhdz 0/1 Completed 0 3m5s olm-operator-66658fffbb-bbrgb 1/1 Running 1 (13h ago) 13h package-server-manager-654759688-7pv96 1/1 Running 0 13h vm02590 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-v9mvc 1/1 Running 2 (20h ago) 20h collect-profiles-28053720-pfcbd 0/1 Completed 0 33m collect-profiles-28053735-5dxbl 0/1 Completed 0 18m collect-profiles-28053750-95f6g 0/1 Completed 0 3m5s olm-operator-66658fffbb-5knlj 1/1 Running 0 20h package-server-manager-654759688-7qkgb 1/1 Running 0 20h vm02908 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-cnmjf 1/1 Running 0 18h collect-profiles-28053720-ks6h7 0/1 Completed 0 33m collect-profiles-28053735-r682b 0/1 Completed 0 18m collect-profiles-28053750-9jrx4 0/1 Completed 0 3m5s olm-operator-66658fffbb-7bd2v 1/1 Running 1 (18h ago) 18h package-server-manager-654759688-5r6gq 1/1 Running 0 18h vm03253 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-8wtgg 1/1 Running 2 (14h ago) 14h collect-profiles-28053720-kwcgk 0/1 Completed 0 33m collect-profiles-28053735-dv5hx 0/1 Completed 0 18m collect-profiles-28053750-8xbmw 0/1 Completed 0 3m6s olm-operator-66658fffbb-f2n9f 1/1 Running 0 14h package-server-manager-654759688-tjlc9 1/1 Running 0 14h vm03500 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-wdq9b 1/1 Running 0 17h collect-profiles-28053720-jcmwf 0/1 Completed 0 33m collect-profiles-28053735-tjw5j 0/1 Completed 0 18m collect-profiles-28053750-5mjq9 0/1 Completed 0 3m6s olm-operator-66658fffbb-q92bg 1/1 Running 0 17h package-server-manager-654759688-2z656 1/1 Running 0 17h vm03654 NAME READY STATUS RESTARTS AGE catalog-operator-94b8bfddc-vq9wt 1/1 Running 0 17h collect-profiles-28053720-dlknz 0/1 Completed 0 33m collect-profiles-28053735-mshs7 0/1 Completed 0 18m collect-profiles-28053750-86xrc 0/1 Completed 0 3m6s olm-operator-66658fffbb-5qd99 1/1 Running 0 17h
Description of problem:
Kubernetes and other associated dependencies need to be updated to protect against potential vulnerabilities.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-16796. The following is the description of the original issue:
—
Description of problem:
Observation from the CIS v1.4 PDF, 1.1.1 "Ensure that the API server pod specification file permissions are set to 600 or more restrictive":
"Ensure that the API server pod specification file has permissions of 600 or more restrictive. OpenShift 4 deploys two API servers: the OpenShift API server and the Kube API server. The OpenShift API server delegates requests for Kubernetes objects to the Kube API server. The OpenShift API server is managed as a deployment. The pod specification yaml for openshift-apiserver is stored in etcd. The Kube API Server is managed as a static pod. The pod specification file for the kube-apiserver is created on the control plane nodes at /etc/kubernetes/manifests/kube-apiserver-pod.yaml. The kube-apiserver is mounted via hostpath to the kube-apiserver pods via /etc/kubernetes/static-pod-resources/kube-apiserver-pod.yaml with permissions 600."
To conform with the CIS benchmark, the pod specification file for the kube-apiserver, /etc/kubernetes/static-pod-resources/kube-apiserver-pod.yaml, should be updated to 600.
$ for i in $( oc get pods -n openshift-kube-apiserver -l app=openshift-kube-apiserver -o name ); do
    oc exec -n openshift-kube-apiserver $i -- \
      stat -c %a /etc/kubernetes/static-pod-resources/kube-apiserver-pod.yaml
  done
644
644
644
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-20-215234
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
The permission of the pod specification file for the kube-apiserver is 644.
Expected results:
The permission of the pod specification file for the kube-apiserver should be updated to 600.
Additional info:
PR: https://github.com/openshift/library-go/commit/19a42d2bae8ba68761cfad72bf764e10d275ad6e
Description of problem:
There is a forcedns dispatcher script, added by the Assisted Installer installation process, that creates /etc/resolv.conf.
This script has no shebang, which caused installation to fail because no resolv.conf was generated.
In order to fix upgrades in already-installed clusters, we need to work around this issue.
Version-Release number of selected component (if applicable):
4.13.0
How reproducible:
Happens every time
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Dockerfile.upi.ci.rhel8 does not work with the following error:
[3/3] STEP 26/32: RUN chown 1000:1000 /output && chmod -R g=u "$HOME/.bluemix/"
chmod: cannot access '/root/.bluemix/': No such file or directory
error: build error: building at STEP "RUN chown 1000:1000 /output && chmod -R g=u "$HOME/.bluemix/"": while running runtime: exit status 1
Version-Release number of selected component (if applicable):
master (and possibly all other branches where the ibmcli tool was introduced)
How reproducible:
always
Steps to Reproduce:
1. Try to use Dockerfile.ci.upi.rhel8 2. 3.
Actual results:
[3/3] STEP 26/32: RUN chown 1000:1000 /output && chmod -R g=u "$HOME/.bluemix/"
chmod: cannot access '/root/.bluemix/': No such file or directory
error: build error: building at STEP "RUN chown 1000:1000 /output && chmod -R g=u "$HOME/.bluemix/"": while running runtime: exit status 1
Expected results:
No failures
Additional info:
We should also change the downloading of the govc image with curl to importing it from the cached container in quay.io, as it is done in Dockerfile.ci.upi
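One possible fix, sketched under the assumption that the directory simply does not exist yet at that build step (creating it before adjusting permissions):
RUN mkdir -p "$HOME/.bluemix/" \
 && chown 1000:1000 /output \
 && chmod -R g=u "$HOME/.bluemix/"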
AWS Local Zone Support for OCP UPI/IPI
Current AWS Based OCP deployment models do not address Local Zones which offer lower latency and geo-proximity to OCP Cluster Consumers.
OCP Install Support for AWS Local Zones will address Customer Segments where low latency and data locality requirements enforce as deal breaker/show-stopper for our sales teams engagements.
Description of problem:
When users try to Duplicate ClusterRoleBinding or Edit ClusterRoleBinding subject in the RHOCP web console, they get the following error: "Error Loading : Name parameter invalid: "system%3Acontroller%3A<name-of-role-ref>": may not contain '%'"
Version-Release number of selected component (if applicable):
Tested in OCP 4.12.18
How reproducible:
Always
Steps to Reproduce:
1. Open the OpenShift web console
2. Select the project: openshift
3. Under User management, click RoleBindings
4. Look for any RoleBinding having a Role ref with the format `system:<name>`
5. At the end of that line, click on the 3 dots, where the options below will be available:
- Duplicate ClusterRoleBinding
- Edit ClusterRoleBinding subject
6. Select/click on either option
Actual results:
After selecting Duplicate ClusterRoleBinding or Edit ClusterRoleBinding subject, the following error appears: Error Loading : Name parameter invalid: "system%3AXXX": may not contain '%'
Expected results:
After selecting Duplicate ClusterRoleBinding or Edit ClusterRoleBinding subject, the correct/expected web page should open.
Additional info:
When duplicating or editing the RoleBinding `registry-registry-role` with Role ref `system:registry`, it works as expected. When duplicating or editing the RoleBinding `system:sdn-readers` with Role ref `system:sdn-reader`, the following error appears: Error Loading : Name parameter invalid: "system%3Asdn-readers": may not contain '%'. Duplicate ClusterRoleBinding or Edit ClusterRoleBinding subject works for only a few of the RoleBindings that have a Role ref of the form system:<name>. Screenshots are attached here: https://drive.google.com/drive/folders/1QHpdensG2gKx0tSv1zkF7Qiyert6eaSg?usp=sharing
Description of problem:
The topology page is crashed
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Visit developer console 2. Topology view 3.
Actual results:
Error message: TypeError Description: e is null Component trace: f@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~app/code-refs/actions~delete-revision~dev-console-add~dev-console-deployImage~dev-console-ed~cf101ec3-chunk-5018ae746e2320e4e737.min.js:26:14244 5363/t.a@https://console-openshift-console.apps.cl2.cloud.local/static/dev-console-topology-chunk-492be609fb2f16849dfa.min.js:1:177913 u@https://console-openshift-console.apps.cl2.cloud.local/static/dev-console-topology-chunk-492be609fb2f16849dfa.min.js:1:275718 8248/t.a<@https://console-openshift-console.apps.cl2.cloud.local/static/dev-console-topology-chunk-492be609fb2f16849dfa.min.js:1:475504 i@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:470135 withFallback() 5174/t.default@https://console-openshift-console.apps.cl2.cloud.local/static/dev-console-topology-chunk-492be609fb2f16849dfa.min.js:1:78258 s@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:237096 [...] ne<@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:1592411 r@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~main-chunk-12b31b866c0a4fea4c58.min.js:36:125397 t@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~main-chunk-12b31b866c0a4fea4c58.min.js:21:58042 t@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~main-chunk-12b31b866c0a4fea4c58.min.js:21:60087 t@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~main-chunk-12b31b866c0a4fea4c58.min.js:21:54647 re@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:1592722 t.a@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:791129 t.a@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:1062384 s@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:613567 t.a@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~main-chunk-12b31b866c0a4fea4c58.min.js:141:244663
Expected results:
No error should be there
Additional info:
Cloud Pak Operator is installed
Description of problem:
In ROSA, the user can specify a hostPrefix, but we are currently not passing it to the HostedCluster CR. While trying to fix it, it seems that we are not setting it up correctly on the nodes.
Version-Release number of selected component (if applicable):
4.12.16
How reproducible:
Always
Steps to Reproduce:
1. Create an HC. Inside the spec add:
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 25
2. Deploy the HC. Check its configuration.
Actual results:
oc get network cluster shows the right config (see attachment). However, oc describe node always shows a /24 host prefix. Note that this also happens with the default value of /23. On the node, under podCIDR, I always see something like: PodCIDR: 10.128.1.0/24 PodCIDRs: 10.128.1.0/24
Expected results:
I would expect the pod cidr mask to be reflected in the pod configuration
Additional info:
pod cidr is correctly set
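One way to check what a node actually received (with hostPrefix: 25 the expectation would be a /25 slice rather than the /24 shown above):
oc get node <node-name> -o jsonpath='{.spec.podCIDR}{"\n"}'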
Description of problem:
Running through instructions for a smoke test on 4.14, the DNS record is incorrectly created for the Gateway. It is missing a trailing dot in the dnsName.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Run through the steps in https://github.com/openshift/network-edge-tools/blob/2fd044d110eb737c94c8b86ea878a130cae0d03e/docs/blogs/EnhancedDevPreviewGatewayAPI/GettingStarted.md until the step "oc get dnsrecord -n openshift-ingress"
2. Check the status of the DNS record: "oc get dnsrecord xxx -n openshift-ingress -ojson | jq .status.zones[].conditions"
Actual results:
The status shows error conditions with a message like 'The DNS provider failed to ensure the record: googleapi: Error 400: Invalid value for ''entity.change.additions[*.gwapi.apps.ci-ln-3vxsgxb-72292.origin-ci-int-gce.dev.rhcloud.com][A].name'': ''*.gwapi.apps.ci-ln-3vxsgxb-72292.origin-ci-int-gce.dev.rhcloud.com'', invalid'
Expected results:
The status of the DNS record should show a successful publishing of the record.
Additional info:
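For reference, a sketch of what the generated DNSRecord is expected to carry: an absolute, dot-terminated dnsName, e.g. for the wildcard record from the error message above (other spec fields omitted):
spec:
  dnsName: "*.gwapi.apps.ci-ln-3vxsgxb-72292.origin-ci-int-gce.dev.rhcloud.com."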
Backport to 4.13.z
When the user specifies the 'vendor' hint, it actually checks for the value of the 'model' hint in the vendor field.
Description of problem:
The title on Overview page has changed to "Cluster · Red Hat OpenShift" instead of "Overview · Red Hat OpenShift" that we had starting from 4.11.
Version-Release number of selected component (if applicable):
OCP 4.14
How reproducible:
Install OpenShift 4.14, login to management console and navigate to Home / Overview
Steps to Reproduce:
1. Install OpenShift 4.14
2. Log in to the management console
3. Navigate to Home / Overview
4. Load the HTML DOM and verify the HTML node <title>; the title is also visible when hovering over the opened tab in Chrome or Firefox
Actual results:
Cluster · Red Hat OpenShift HTML node: <title data-telemetry="Cluster" data-react-helmet="data-telemetry" xpath="1">Cluster · Red Hat OpenShift</title>
Expected results:
Overview · Red Hat OpenShift
Additional info:
Starting from 4.11, the title on that page was always "Overview · Red Hat OpenShift". UI tests rely on consistent titles to detect the currently opened web page. It is important to note that the change also affects accessibility, since navigating with text-to-speech is a common accessibility feature.
We'll do another pass of updates in the ironic containers
Description of problem:
Azure managed identity role assignments created using 'ccoctl azure' sub-commands are not cleaned up when running 'ccoctl azure delete'
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
100%
Steps to Reproduce:
1. Create Azure workload identity infrastructure using 'ccoctl azure create-all'
2. Delete Azure workload identity infrastructure using 'ccoctl azure delete'
3. Observe lingering role assignments in either the OIDC resource group if not deleted OR in the DNS Zone resource group if the OIDC resource group is deleted by providing '--delete-oidc-resource-group'.
Actual results:
Role assignments for managed identities are not deleted following 'ccoctl azure delete'
Expected results:
Role assignments for managed identities are deleted following 'ccoctl azure delete'
Additional info:
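One way to confirm the leftovers described in step 3, assuming the Azure CLI is available and the resource group name is known (placeholder below):
az role assignment list --resource-group <oidc-or-dns-zone-resource-group> -o table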
Description of problem:
Cluster provisioning fails with the message: "Internal error: failed to fetch instance type, this error usually occurs if the region or the instance type is not found". This is likely because OCM uses GCP custom machine types, for example custom-4-16384, and the installer now validates machine types per zone (see the GetMachineTypeWithZones function); the per-zone listings do not include custom machine types. See https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type#gcloud for more details.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
ocm create cluster cluster001 --provider=gcp --ccs=true --region=us-east1 --service-account-file=token.json --version="4.14.0-0.nightly-2023-08-02-102121-nightly" 2.
Actual results:
Cluster installation fails
Expected results:
Cluster installation succeeds
Additional info:
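For illustration, the kind of machine pool OCM requests; the custom type format custom-<vCPUs>-<memory-MiB> comes from the GCP docs linked above, and the surrounding install-config fields are only a sketch:
compute:
- name: worker
  platform:
    gcp:
      type: custom-4-16384   # custom-<vCPUs>-<memory-in-MiB>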
As a developer, I would like the Getting Started page to use numbered list so that it is easier to point people to specific sections of the document.
As a developer, I would like the Contribute page to be a numbered list so that it is easier to point people to specific line items of the document.
Description of problem:
Library-go contains code for creating token requests that should be reused by all OpenShift components. Because of time-constraints, this code did not make it to `oc` in the past. Fix that to prevent code out-of-sync issues.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. see if https://github.com/openshift/oc/pull/991 merged
Actual results:
it hasn't merged at the time of writing this bug
Expected results:
it's merged
Additional info:
Description of problem:
When adding a "Git Repository" (a tekton or pipelines Repository) and enter a GitLab or Bitbucket PAC repository the created Repository resource is invalid.
Version-Release number of selected component (if applicable):
411-4.13
How reproducible:
Always
Steps to Reproduce:
Setup a PAC git repo, you can mirror these projects if you want: https://github.com/jerolimov/nodeinfo-pac
For GitHub you need setup
For GitLab:
For Bitbucket:
On a cluster bot instance:
Actual results:
The GitLab created resource looks like this:
apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: gitlab-nodeinfo-pac
spec:
  git_provider:
    secret:
      key: provider.token
      name: gitlab-nodeinfo-pac-token-gfr66
    url: gitlab.com # missing scheme
    webhook_secret:
      key: webhook.secret
      name: gitlab-nodeinfo-pac-token-gfr66
  url: 'https://gitlab.com/jerolimov/nodeinfo-pac'
The Bitbucket resource looks like this:
apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: bitbucket-nodeinfo-pac
spec:
  git_provider:
    secret:
      key: provider.token
      name: bitbucket-nodeinfo-pac-token-9pf75
    url: bitbucket.org # missing scheme and invalid API URL!
    webhook_secret: # no webhook URL was entered, see OCPBUGS-7035
      key: webhook.secret
      name: bitbucket-nodeinfo-pac-token-9pf75
  url: 'https://bitbucket.org/jerolimov/nodeinfo-pac'
The pipeline-as-code controller Pod log contains some error messages and no PipelineRun is created.
Expected results:
For GitLab:
apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: gitlab-nodeinfo-pac
spec:
  git_provider:
    secret:
      key: provider.token
      name: gitlab-nodeinfo-pac-token-gfr66
    url: https://gitlab.com
    webhook_secret:
      key: webhook.secret
      name: gitlab-nodeinfo-pac-token-gfr66
  url: 'https://gitlab.com/jerolimov/nodeinfo-pac'
Bitbucket:
A working example:
apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: bitbucket-nodeinfo-pac
spec:
  git_provider:
    user: jerolimov
    secret:
      key: provider.token
      name: bitbucket-nodeinfo-pac-token-9pf75
    webhook_secret:
      key: webhook.secret
      name: bitbucket-nodeinfo-pac-token-9pf75
  url: 'https://bitbucket.org/jerolimov/nodeinfo-pac'
A PipelineRun should be created for each push to the git repo.
Additional info:
Description of problem:
The PipelineRun default template name has been updated in the backend in Pipelines Operator 1.10, so we need to update the name in the UI code as well.
Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver/pull/33
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Seeing the literal `Secret {{newImageSecret}} was created.` string in the alert for the created image pull secret in the Container Image flow.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Navigate to the +Add page
2. Open the Container Image form
3. Click on the "Create an Image pull secret" link and create a secret
Actual results:
`Secret {{newImageSecret}} was created.` gets rendered in the alert
Expected results:
`Secret <secret name> was created.` should be rendered in the alert
Additional info:
Description of problem:
https://issues.redhat.com//browse/OCPBUGS-10342 tracked the issue when the number of replicas exceeded the number of hosts. However, it does not detect the case when the number of hosts exceeds the number of replicas as it was not counting the hosts correctly. Fix to detect this case correctly.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Set compute replicas in install-config.yaml
2. Add hosts in agent-config.yaml - 3 with role of master and more than 2 with role of worker.
3. The installation will fail and the following error can be seen in the journal: Jun 12 01:10:57 master-0 start-cluster-installation.sh[3879]: Hosts known and ready for cluster installation (5/3)
Actual results:
No warning regarding the number of configured hosts
Expected results:
A warning about the number of configured hosts not matching the replicas.
Additional info:
Description of problem:
On a hypershift cluster that has public certs for OAuth configured, the console reports a x509 certificate error when attempting to display a token
Version-Release number of selected component (if applicable):
4.12.z
How reproducible:
always
Steps to Reproduce:
1. Create a hosted cluster configured with a letsencrypt certificate for the oauth endpoint. 2. Go to the console of the hosted cluster. Click on the user icon and get token.
Actual results:
The console displays an oauth cert error
Expected results:
The token displays
Additional info:
The hcco reconciles the oauth cert into the console namespace. However, it is only reconciling the self-signed one and not the one that was configured through .spec.configuration.apiserver of the hostedcluster. It needs to detect the actual cert used for oauth and send that one.
Description of the problem:
BE 2.15.x: the API and Ingress VIP values have no validation for network/broadcast IPs (i.e. if the network is 192.168.123.0/24 --> 192.168.123.0 and 192.168.123.255).
How reproducible:
100%
Steps to reproduce:
1. Create cluster with Ingress or API vip with broadcast IP
2.
3.
Actual results:
Expected results:
BE should block those IPs
Description of problem:
Missing workload annotations from deployments. This is in relation to the openshift/platform-operator repo. Missing annotations: on the namespace, `workload.openshift.io/allowed: management`; on the workload, `target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'`. These annotations are required for the admission webhook to modify the resource for workload pinning (see the sketch below). Related enhancements: https://github.com/openshift/enhancements/pull/703 https://github.com/openshift/enhancements/pull/1213
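A minimal sketch of where the two annotations described above are expected, based only on this description; the namespace and deployment names are illustrative:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-platform-operators          # illustrative name
  annotations:
    workload.openshift.io/allowed: management # allows workload pinning in this namespace
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: platform-operators-controller         # illustrative name
spec:
  template:
    metadata:
      annotations:
        # picked up by the admission webhook to modify the pod for workload pinning
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'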
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
KCM crashes when the Topology cache's HasPopulatedHints method attempts concurrent map access. Miciah has started working on the upstream fix and we need to bring the changes into openshift/kubernetes as soon as we can. See https://redhat-internal.slack.com/archives/C01CQA76KMX/p1684876782205129 for more context.
Version-Release number of selected component (if applicable):
How reproducible:
CI 4.14 upgrade jobs run into this problem quite often: https://search.ci.openshift.org/?search=pkg%2Fcontroller%2Fendpointslice%2Ftopologycache%2Ftopologycache.go&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Steps to Reproduce:
Actual results:
KCM crashing
Expected results:
KCM not crashing
Additional info:
We are pushing to find a resolution for OCPBUGS-11591 and the SDN team has identified a key message that appears related in the system journald logs:
Apr 12 11:53:51.395838 ci-op-xs3rnrtc-2d4c7-4mhm7-worker-b-dwc7w ovs-vswitchd[1124]: ovs|00002|timeval(urcu4)|WARN|Unreasonably long 109127ms poll interval (0ms user, 0ms system)
We should detect this in origin and create an interval so it can be charted in the timelines, as well as a unit test that fails if detected so we can see where it's happening.
Ovnkube-node container max memory usage was 110 MiB with 4.14.0-0.nightly-2023-05-18-231932 image and now it is 530 MiB with 4.14.0-0.nightly-2023-07-31-181848 image, for the same test (cluster-density-v2 with 800 iterations, churn=false) on 120 node environment. We observed the same pattern in the OVN-IC environment as well.
Note: As churn is false, we are calculating memory usage for only resource creation.
Grafana panel for OVN with 4.14.0-0.nightly-2023-05-18-231932 image -
https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/H9pAb07fsPEOFyd5dhKLFP602A7S18uC
Grafana panel for OVN with 4.14.0-0.nightly-2023-07-31-181848 image -
https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/8158bJgv3e4P2uiVernbc2E5ypBWFYHt
As the test was successfully run in the CI, we couldn't collect a must-gather. I can provide must-gather and pprof data if needed.
We observed 100 MiB to 550 MiB increase in OVN-IC between 4.14.0-0.nightly-2023-06-12-141936 and 4.14.0-0.nightly-2023-07-30-191504 versions.
OVN-IC 4.14.0-0.nightly-2023-06-12-141936
https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/o5SXLdHIL8whsdgaMyXwWamipBP8J2fF
OVN-IC 4.14.0-0.nightly-2023-07-30-191504
https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/NMuSQx7YAJ9jokoKMl6Me9StHp33tjwD
So that they can review and approve most observability UI changes that require console code changes.
Description of the problem:
When invoking installation with the assisted-service scripts (make deploy-all), as is done for installation in the PSI env, the pods for assisted-service and assisted-image-service produce warnings about a failing readiness probe:
Readiness probe failed: Get "http://172.28.8.39:8090/ready": dial tcp 172.28.8.39:8090: connect: connection refused
Those warnings are harmless, but they make people think that there is a problem with the running pods (or that they are not ready yet, even though the pods are marked as ready).
How reproducible:
100%
Steps to reproduce:
1. invoke make deploy-all on PSI or other places (for some reason it doesn't reproduce on minikube)
2. inspect the pod's conditions part with oc describe, and look for warnings
Actual results:
Warnings emitted
Expected results:
No warnings should be emitted for the initial setup time of each pod. The fix just requires setting initialDelaySeconds in the readinessProbe configuration, just like we did in the template: https://github.com/openshift/assisted-service/pull/4557
see also: https://github.com/openshift/assisted-service/pull/380#pullrequestreview-490308765
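A minimal sketch of the kind of change described above, assuming the assisted-service Deployment template; the /ready path and port 8090 come from the warning quoted earlier, and the delay value is illustrative:

readinessProbe:
  httpGet:
    path: /ready
    port: 8090
  initialDelaySeconds: 30   # give the service time to start before the first probe
  periodSeconds: 10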
Please review the following PR: https://github.com/openshift/machine-api-provider-gcp/pull/44
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
ODC automatically loads all Camel K Kamelets from openshift-operators namespace in order to display those resources in the event sources/sinks catalog. This is not working when the Camel K operator is installed in another namespace (e.g. in Developer Sandbox the Camel K operator had to be installed in camel-k-operator namespace)
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Display event sources/sinks catalog in ODC on a cluster where Camel K is installed in a namespace other than openshift-operators (e.g. Developer Sandbox)
Steps to Reproduce:
1. Make sure to have a cluster where Knative eventing is available 2. Install Camel K operator in camel-k-operator namespace (e.g. via OLM) 3. Display the event source/sink catalog in ODC
Actual results:
No Kamelets are visible in the catalog
Expected results:
All Kamelets (automatically installed with the operator) should be visible as potential event sources/sinks in the catalog
Additional info:
The Kamelet resources are being watched in two namespaces (the current user namespace and the global operator namespace): https://github.com/openshift/console/blob/master/frontend/packages/knative-plugin/src/hooks/useKameletsData.ts#L12-L28 We should allow configuration of the global namespace or also add the camel-k-operator namespace as a 3rd place to look for installed Kamelets (see the sketch below).
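A rough TypeScript sketch of the namespace handling suggested above; this is not the actual useKameletsData code, and the constant names and Kamelet shape are assumptions:

// Namespaces in which Kamelets could be looked up, per the suggestion above.
const GLOBAL_OPERATOR_NS = 'openshift-operators';
const CAMEL_K_OPERATOR_NS = 'camel-k-operator';

export const kameletNamespaces = (activeNamespace: string): string[] =>
  Array.from(new Set([activeNamespace, GLOBAL_OPERATOR_NS, CAMEL_K_OPERATOR_NS]));

// Merge the per-namespace watch results into a single catalog list.
type Kamelet = { metadata: { name: string; namespace: string } };
export const mergeKamelets = (byNamespace: Kamelet[][]): Kamelet[] => byNamespace.flat();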
This is a clone of issue OCPBUGS-19017. The following is the description of the original issue:
—
dnsmasq isn't starting on okd-scos in the bootstrap VM
logs show it failing with "Operation not permitted"
`useExtensions` is not available in the dynamic plugin SDK, which prevents this functionality being copied to `monitoring-plugin`. `useResolvedExtensions` is available and provides the same functionality so we should use that instead.
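A hedged TypeScript sketch of the switch suggested above; the extension type guard and the component are placeholders, not the real monitoring-plugin code:

import * as React from 'react';
import { useResolvedExtensions } from '@openshift-console/dynamic-plugin-sdk';
import { isDashboardCard } from './extensions'; // hypothetical type guard for the extension type in use

const DashboardCards: React.FC = () => {
  // extensions: the resolved extension objects; resolved: true once all CodeRefs have loaded.
  const [extensions, resolved] = useResolvedExtensions(isDashboardCard);
  if (!resolved) {
    return null;
  }
  return (
    <ul>
      {extensions.map((e) => (
        <li key={e.uid}>{e.type}</li>
      ))}
    </ul>
  );
};

export default DashboardCards;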
For static pod readiness we check /readyz and /healthz endpoints for kube-apiserver. For SNO exclude openshift-apiserver from the health checks using the 'exclude' query parameter
Example:
> oc get --raw '/readyz?verbose&exclude=api-openshift-apiserver-available'
Should we also remove 'oauth-apiserver'?
Description of problem:
No MachineSet is created for workers if replicas == 0
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
replicas: 0 in install-config for workers
Steps to Reproduce:
1. Deploy a cluster with 0 workers
2. After deployment, list MachineSets
3. Zero can be found
Actual results:
No MachineSet found: No resources found in openshift-machine-api namespace.
Expected results:
A worker MachineSet should have been created like before.
Additional info:
We broke it during CPMS integration.
Please review the following PR: https://github.com/openshift/cloud-provider-nutanix/pull/18
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
When installing a cluster on IBM Cloud, the image registry defaults to Removed with no storage configured, after 4.13.0-ec.3. The image registry should use ibmcos object storage on an IPI IBM cluster: https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/storage/storage.go#L182
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-02-27-101545
How reproducible:
always
Steps to Reproduce:
1. Install an IPI cluster on IBM Cloud
2. Check the image registry after a successful install
3.
Actual results:
oc get config.image/cluster -o yaml
spec:
  logLevel: Normal
  managementState: Removed
  observedConfig: null
  operatorLogLevel: Normal
  proxy: {}
  replicas: 1
  requests:
    read:
      maxWaitInQueue: 0s
    write:
      maxWaitInQueue: 0s
  rolloutStrategy: RollingUpdate
  storage: {}
  unsupportedConfigOverrides: null

oc get infrastructure cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-03-02T02:21:06Z"
  generation: 1
  name: cluster
  resourceVersion: "531"
  uid: 8d61a1e2-3852-40a2-bf5d-b7f9c92cda7b
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: IBMCloud
status:
  apiServerInternalURI: https://api-int.wxjibm32.ibmcloud.qe.devcluster.openshift.com:6443
  apiServerURL: https://api.wxjibm32.ibmcloud.qe.devcluster.openshift.com:6443
  controlPlaneTopology: HighlyAvailable
  etcdDiscoveryDomain: ""
  infrastructureName: wxjibm32-lmqh7
  infrastructureTopology: HighlyAvailable
  platform: IBMCloud
  platformStatus:
    ibmcloud:
      cisInstanceCRN: 'crn:v1:bluemix:public:internet-svcs:global:a/fdc2e14cf8bc4d53a67f972dc2e2c861:e8ee6ca1-4b31-4307-8190-e67f6925f83b::'
      location: eu-gb
      providerType: VPC
      resourceGroupName: wxjibm32-lmqh7
    type: IBMCloud
Expected results:
Image registry should use ibmcos object storage on IPI-IBM cluster
Additional info:
Must-gather log https://drive.google.com/file/d/1N-WUOZLRjlXcZI0t2O6MXsxwnsVPDCGQ/view?usp=share_link
Description of the problem:
When patching the platform and leaving UMN unchanged, the logs show "false" instead of nil, making it look as if the cluster is being put into an invalid state (e.g. none + UMN disabled).
time="2023-06-15T09:59:54Z" level=info msg="Platform verification completed, setting platform type to none and user-managed-networking to false" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).validateUpdateCluster" file="/assisted-service/internal/bminventory/inventory.go:1928" cluster_id=468bffe8-ce24-400e-a104-b0aab378eb75 go-id=94310 pkg=Inventory request_id=2fbb74ba-4390-4f27-b6fd-ee11ac1a7895
Steps to reproduce:
1. Create cluster with platform == OCI or vSphere with UMN enabled
2. Patch the cluster with "{"platform": {"type": "none"}}"
Actual results:
Log shows
setting platform type to none and user-managed-networking to false
Expected results:
setting platform type to none and user-managed-networking to nil
aws-ebs-csi-driver-controller-ca ServiceAccount does not include the HCP pull-secret in its imagePullSecrets. Thus, if a HostedCluster is created with a `pullSecret` that contains creds that the management cluster pull secret does not have, the image pull fails.
CI is flaky because tests pull the "openshift/origin-node" image from Docker Hub and get rate-limited:
E0803 20:44:32.429877 2066 kuberuntime_image.go:53] "Failed to pull image" err="rpc error: code = Unknown desc = reading manifest latest in docker.io/openshift/origin-node: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit" image="openshift/origin-node:latest"
This particular failure comes from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/929/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/16871891662673059841687189166267305984. I don't know how to search for this failure using search.ci. I discovered the rate-limiting through Loki: https://grafana-loki.ci.openshift.org/explore?orgId=1&left=%7B%22datasource%22:%22PCEB727DF2F34084E%22,%22queries%22:%5B%7B%22expr%22:%22%7Binvoker%3D%5C%22openshift-internal-ci%2Fpull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator%2F1687189166267305984%5C%22%7D%20%7C%20unpack%20%7C~%20%5C%22pull%20rate%20limit%5C%22%22,%22refId%22:%22A%22,%22editorMode%22:%22code%22,%22queryType%22:%22range%22%7D%5D,%22range%22:%7B%22from%22:%221691086303449%22,%22to%22:%221691122303451%22%7D%7D.
This happened on 4.14 CI job.
I have observed this once so far, but it is quite obscure.
1. Post a PR and have bad luck.
2. Check Loki using the following query:
{...} {invoker="openshift-internal-ci/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/*"} | unpack | systemd_unit="kubelet.service" |~ "pull rate limit"
CI pulls from Docker Hub and fails.
CI passes, or fails on some other test failure. CI should never pull from Docker Hub.
We have been using the "openshift/origin-node" image in multiple tests for years. I have no idea why it is suddenly pulling from Docker Hub, or how we failed to notice that it was pulling from Docker Hub if that's what it was doing all along.
Description of problem:
[CSI Inline Volume admission plugin] When using deployment/statefulset/daemonset workloads with an inline volume, audit logs/warnings are not recorded correctly
Version-Release number of selected component (if applicable):
4.13.0-0.ci.test-2023-03-02-013814-ci-ln-yd4m4st-latest (nightly build also could be reproduced)
How reproducible:
Always
Steps to Reproduce:
1. Enable feature gate to auto install the csi.sharedresource csi driver 2. Add security.openshift.io/csi-ephemeral-volume-profile: privileged to CSIDriver 'csi.sharedresource.openshift.io' # scale down the cvo,cso and shared-resource-csi-driver-operator $ oc scale --replicas=0 deploy/cluster-version-operator -n openshift-cluster-version deployment.apps/cluster-version-operator scaled $oc scale --replicas=0 deploy/cluster-storage-operator -n openshift-cluster-storage-operator deployment.apps/cluster-storage-operator scaled $ oc scale --replicas=0 deploy/shared-resource-csi-driver-operator -n openshift-cluster-csi-drivers deployment.apps/shared-resource-csi-driver-operator scaled # Add security.openshift.io/csi-ephemeral-volume-profile: privileged to CSIDriver $ oc get csidriver/csi.sharedresource.openshift.io -o yaml apiVersion: storage.k8s.io/v1 kind: CSIDriver metadata: annotations: csi.openshift.io/managed: "true" operator.openshift.io/spec-hash: 4fc61ff54015a7e91e07b93ac8e64f46983a59b4b296344948f72187e3318b33 creationTimestamp: "2022-10-26T08:10:23Z" labels: security.openshift.io/csi-ephemeral-volume-profile: privileged 3. Create different workloads with inline volume in a restricted namespace $ oc apply -f examples/simple role.rbac.authorization.k8s.io/shared-resource-my-share-pod created rolebinding.rbac.authorization.k8s.io/shared-resource-my-share-pod created configmap/my-config created sharedconfigmap.sharedresource.openshift.io/my-share-pod created Error from server (Forbidden): error when creating "examples/simple/03-pod.yaml": pods "my-csi-app-pod" is forbidden: admission denied: pod my-csi-app-pod uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security enforce level that is lower than privileged Error from server (Forbidden): error when creating "examples/simple/04-deployment.yaml": deployments.apps "mydeployment" is forbidden: admission denied: pod uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security enforce level that is lower than privileged Error from server (Forbidden): error when creating "examples/simple/05-statefulset.yaml": statefulsets.apps "my-sts" is forbidden: admission denied: pod uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security enforce level that is lower than privileged 4. 
Add enforce: privileged label to the test ns and create different workloads with inline volume again $ oc label ns/my-csi-app-namespace security.openshift.io/scc.podSecurityLabelSync=false pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/audit=restricted pod-security.kubernetes.io/warn=restricted --overwrite namespace/my-csi-app-namespace labeled $ oc apply -f examples/simple role.rbac.authorization.k8s.io/shared-resource-my-share-pod created rolebinding.rbac.authorization.k8s.io/shared-resource-my-share-pod created configmap/my-config created sharedconfigmap.sharedresource.openshift.io/my-share-pod created Warning: pod my-csi-app-pod uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security warn level that is lower than privileged pod/my-csi-app-pod created Warning: pod uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security warn level that is lower than privileged deployment.apps/mydeployment created daemonset.apps/my-ds created statefulset.apps/my-sts created $ oc get po NAME READY STATUS RESTARTS AGE my-csi-app-pod 1/1 Running 0 34s my-ds-cw4k7 1/1 Running 0 32s my-ds-sv9vp 1/1 Running 0 32s my-ds-v7f9m 1/1 Running 0 32s my-sts-0 1/1 Running 0 31s mydeployment-664cd95cb4-4s2cd 1/1 Running 0 33s 5. Check the api-server audit logs $ oc adm node-logs ip-10-0-211-240.us-east-2.compute.internal --path=kube-apiserver/audit.log | grep 'uses an inline volume provided by'| tail -1 | jq . | grep 'CSIInlineVolumeSecurity' "storage.openshift.io/CSIInlineVolumeSecurity": "pod uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security audit level that is lower than privileged"
Actual results:
In steps 3 and 4: for deployment workloads, the pod name in the warning is empty; for statefulset/daemonset workloads, the warning is not displayed at all. In step 5: the pod name in the audit logs is empty.
Expected results:
In steps 3 and 4: for deployment workloads, the pod name in the warning should be present; for statefulset/daemonset workloads, the warning should be displayed. In step 5: the pod name in the audit logs shouldn't be empty; it should record the workload type and the specific pod names.
Additional info:
Testdata: https://github.com/Phaow/csi-driver-shared-resource/tree/test-inlinevolume/examples/simple
Description of problem:
When running a cluster on application credentials, this event appears repeatedly: ns/openshift-machine-api machineset/nhydri0d-f8dcc-kzcwf-worker-0 hmsg/173228e527 - pathological/true reason/ReconcileError could not find information for "ci.m1.xlarge"
Version-Release number of selected component (if applicable):
How reproducible:
Happens in the CI (https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/33330/rehearse-33330-periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.13-e2e-openstack-ovn-serial/1633149670878351360).
Steps to Reproduce:
1. On a living cluster, rotate the OpenStack cloud credentials
2. Invalidate the previous credentials
3. Watch the machine-api events (`oc -n openshift-machine-api get event`). A `Warning` type of issue could not find information for "name-of-the-flavour" will appear.
If the cluster was installed using a password that you can't invalidate:
1. Rotate the cloud credentials to application credentials
2. Restart MAPO (`oc -n openshift-machine-api get pods -o NAME | xargs -r oc -n openshift-machine-api delete`)
3. Rotate cloud credentials again
4. Revoke the first application credentials you set
5. Finally watch the events (`oc -n openshift-machine-api get event`)
The event signals that MAPO wasn't able to update flavour information on the MachineSet status.
Actual results:
Expected results:
No issue detecting the flavour details
Additional info:
Offending code likely around this line: https://github.com/openshift/machine-api-provider-openstack/blob/bcb08a7835c08d20606d75757228fd03fbb20dab/pkg/machineset/controller.go#L116
Currently the assisted installer adds to the ISO a dracut hook that is executed early during the boot process. That hook generates the NetworkManager configuration files that will be used during the boot and also once the machine is installed. But that hook is not guaranteed to run before NetworkManager, and the files it generates may not be loaded by NetworkManager at the right time. We have seen such issues in the recent upgrade from RHEL 8 to RHEL 9 that is part of OpenShift 4.13. The RCHOS team recommends replacing it with a systemd unit that runs before NetworkManager.
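A minimal sketch of what such a systemd unit could look like, assuming an illustrative unit description and script path (this is not the actual assisted-installer change):

[Unit]
Description=Generate NetworkManager configuration from assisted-installer data
Before=NetworkManager.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/pre-network-manager-config.sh

[Install]
WantedBy=multi-user.target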
Please review the following PR: https://github.com/openshift/cloud-provider-gcp/pull/29
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/machine-api-provider-azure/pull/53
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
When creating a machine with Azure Ultra Disks attached as data disks in an Arm cluster, the machine reports Provisioned, but in the Azure web console the instance has failed with error ZonalAllocationFailed.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-arm64-2023-03-22-204044
How reproducible:
Always
Steps to Reproduce:
/// Not Needed up to point 6 //// 1. Make sure storagecluster is already present kind: StorageClass apiVersion: storage.k8s.io/v1 metadata: name: ultra-disk-sc provisioner: disk.csi.azure.com # replace with "kubernetes.io/azure-disk" if aks version is less than 1.21 volumeBindingMode: WaitForFirstConsumer # optional, but recommended if you want to wait until the pod that will use this disk is created parameters: skuname: UltraSSD_LRS kind: managed cachingMode: None diskIopsReadWrite: "2000" # minimum value: 2 IOPS/GiB diskMbpsReadWrite: "320" # minimum value: 0.032/GiB 2. Create a new custom secret using the worker-data-secret $ oc -n openshift-machine-api get secret worker-user-data --template='{{index .data.userData | base64decode}}' | jq > userData.txt 3. Edit the userData.txt by adding below part just before the ending '}' and add a comma "storage": { "disks": [ { "device": "/dev/disk/azure/scsi1/lun0", "partitions": [ { "label": "lun0p1", "sizeMiB": 1024, "startMiB": 0 } ] } ], "filesystems": [ { "device": "/dev/disk/by-partlabel/lun0p1", "format": "xfs", "path": "/var/lib/lun0p1" } ] }, "systemd": { "units": [ { "contents": "[Unit]\nBefore=local-fs.target\n[Mount]\nWhere=/var/lib/lun0p1\nWhat=/dev/disk/by-partlabel/lun0p1\nOptions=defaults,pquota\n[Install]\nWantedBy=local-fs.target\n", "enabled": true, "name": "var-lib-lun0p1.mount" } ] } 4. Extract the disabling template value using below $ oc -n openshift-machine-api get secret worker-user-data --template='{{index .data.disableTemplating | base64decode}}' | jq > disableTemplating.txt 5. Merge the two files to create a datasecret file to be used $ oc -n openshift-machine-api create secret generic worker-user-data-x5 --from-file=userData=userData.txt --from-file=disableTemplating=disableTemplating.txt /// Not needed up to here /// 6.modify the new machineset yaml with below datadisk being seperate field as the osDisks dataDisks: - nameSuffix: ultrassd lun: 0 diskSizeGB: 4 # The same issue on the machine status fields is reproducible on x86_64 by setting 65535 to overcome the maximum limits of the Azure accounts we use. cachingType: None deletionPolicy: Delete managedDisk: storageAccountType: UltraSSD_LRS 7. scale up machineset or delete an existing machine to force the reprovisioning.
Actual results:
Machine stuck in Provisoned phase, but check from azure, it failed $ oc get machine -o wide NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE zhsunaz3231-lds8h-master-0 Running Standard_D8ps_v5 centralus 1 4h15m zhsunaz3231-lds8h-master-0 azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-master-0 Running zhsunaz3231-lds8h-master-1 Running Standard_D8ps_v5 centralus 2 4h15m zhsunaz3231-lds8h-master-1 azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-master-1 Running zhsunaz3231-lds8h-master-2 Running Standard_D8ps_v5 centralus 3 4h15m zhsunaz3231-lds8h-master-2 azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-master-2 Running zhsunaz3231-lds8h-worker-centralus1-sfhs7 Provisioned Standard_D4ps_v5 centralus 1 3m23s azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-worker-centralus1-sfhs7 Creating $ oc get machine zhsunaz3231-lds8h-worker-centralus1-sfhs7 -o yaml - lastTransitionTime: "2023-03-23T06:07:32Z" message: 'Failed to check if machine exists: vm for machine zhsunaz3231-lds8h-worker-centralus1-sfhs7 exists, but has unexpected ''Failed'' provisioning state' reason: ErrorCheckingProvider status: Unknown type: InstanceExists - lastTransitionTime: "2023-03-23T06:07:05Z" status: "True" type: Terminable lastUpdated: "2023-03-23T06:07:32Z" phase: Provisioned
Expected results:
Machine should be failed if failed in azure
Additional info:
must-gather: https://drive.google.com/file/d/1z1gyJg4NBT8JK2-aGvQCruJidDHs0DV6/view?usp=sharing
Tests were temporarily disabled by https://issues.redhat.com//browse/OCPBUGS-14964
All Alertmanager config page UI tests should be running again in CI.
Description of the problem:
Staging: the ignition override test was passing successfully before; it looks like in the latest code the returned API exception code changed to 500 (internal server error).
Before that, we got a 400 API error code.
(Pdb++) cluster.patch_discovery_ignition(ignition=ignition_override) 'image_type': None, 'kernel_arguments': None, 'proxy': None, 'pull_secret': None, 'ssh_authorized_key': None, 'static_network_config': None} (/home/benny/assisted-test-infra/src/service_client/assisted_service_api.py:169) *** assisted_service_client.rest.ApiException: (500) Reason: Internal Server Error HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'vary': 'Accept-Encoding,Origin', 'date': 'Sun, 11 Jun 2023 04:26:53 GMT', 'content-length': '141', 'x-envoy-upstream-service-time': '1538', 'server': 'envoy', 'set-cookie': 'bd0de3dae0f495ebdb32e3693e2b9100=de3a34d29f1e78d0c404b6c5e84b502b; path=/; HttpOnly; Secure; SameSite=None'}) HTTP response body: {"code":"500","href":"","id":500,"kind":"Error","reason":"The ignition archive size (365 KiB) is over the maximum allowable size (256 KiB)"} Traceback (most recent call last): File "/home/benny/assisted-test-infra/src/assisted_test_infra/test_infra/helper_classes/cluster.py", line 501, in patch_discovery_ignition self._infra_env.patch_discovery_ignition(ignition_info=ignition) File "/home/benny/assisted-test-infra/src/assisted_test_infra/test_infra/helper_classes/infra_env.py", line 116, in patch_discovery_ignition self.api_client.patch_discovery_ignition(infra_env_id=self.id, ignition_info=ignition_info) File "/home/benny/assisted-test-infra/src/service_client/assisted_service_api.py", line 407, in patch_discovery_ignition self.update_infra_env(infra_env_id=infra_env_id, infra_env_update_params=infra_env_update_params) File "/home/benny/assisted-test-infra/src/service_client/assisted_service_api.py", line 170, in update_infra_env self.client.update_infra_env(infra_env_id=infra_env_id, infra_env_update_params=infra_env_update_params) File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/api/installer_api.py", line 1696, in update_infra_env (data) = self.update_infra_env_with_http_info(infra_env_id, infra_env_update_params, **kwargs) # noqa: E501 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/api/installer_api.py", line 1767, in update_infra_env_with_http_info return self.api_client.call_api( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/api_client.py", line 325, in call_api return self.__call_api(resource_path, method, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/api_client.py", line 157, in __call_api response_data = self.request( ^^^^^^^^^^^^^ File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/api_client.py", line 383, in request return self.rest_client.PATCH(url, ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/rest.py", line 289, in PATCH return self.request("PATCH", url, ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/rest.py", line 228, in request raise ApiException(http_resp=r) (Pdb++)
How reproducible:
Always
Steps to reproduce:
Run test:
test_discovery_ignition_exceed_size_limit
Actual results:
Returns error 500
Expected results:
error 400
Please review the following PR: https://github.com/openshift/telemeter/pull/452
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The upgrade to 4.14.0-ec.2 from 4.14.0-ec.1 was blocked by the error message on the UI: Could not update rolebinding "openshift-monitoring/cluster-monitoring-operator-techpreview-only" (531 of 993): the object is invalid, possibly due to local cluster configuration
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Unblocked by oc --context build02 delete rolebinding cluster-monitoring-operator-techpreview-only -n openshift-monitoring --as system:admin rolebinding.rbac.authorization.k8s.io "cluster-monitoring-operator-techpreview-only" deleted
Description of problem:
Some of the components in the Console Dynamic Plugin SDK take the `GroupVersionKind` type, which is a string, for the `groupVersionKind` prop; instead they should use the new `K8sGroupVersionKind` object.
Version-Release number of selected component (if applicable):
How reproducible:
always
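For illustration only, a sketch of passing the object form of the prop to one SDK component; this assumes the `ResourceLink` component and the `K8sGroupVersionKind` type exported by the dynamic plugin SDK, with illustrative resource names:

import * as React from 'react';
import { ResourceLink, K8sGroupVersionKind } from '@openshift-console/dynamic-plugin-sdk';

// Object form instead of a kind / "group~version~kind" string.
const deploymentGVK: K8sGroupVersionKind = { group: 'apps', version: 'v1', kind: 'Deployment' };

const Example: React.FC = () => (
  <ResourceLink groupVersionKind={deploymentGVK} name="my-deployment" namespace="default" />
);

export default Example;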
Please review the following PR: https://github.com/openshift/kubernetes-kube-storage-version-migrator/pull/192
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The agent-config-template creation command gives no INFO log in the output; however, it generates the file.
Version-Release number of selected component (if applicable):
v4.13
How reproducible:
$ openshift-install agent create agent-config-template --dir=./foo
Steps to Reproduce:
1. 2. 3.
Actual results:
$ openshift-install agent create agent-config-template --dir=./foo INFO
Expected results:
Additional info:
$ openshift-install agent create agent-config-template --dir=./foo INFO Created Agent Config Template in . directory
Description of problem:
On the openshift/console master branch, a devfile import fails by default. I have noticed that when a repository URL has a .git extension, the pod fails due to a bug where the container image is trying to pull from Docker Hub rather than the OpenShift image registry. For example, the container image is Image: devfile-sample-code-with-quarkus.git:latest but the image from the imagestreamtag is image-registry.openshift-image-registry.svc:5000/maysun/devfile-sample-code-with-quarkus.git@sha256:e6aa9d29be48b33024eb271665d11a7557c9f140c9bd58aeb19fe4570fffb421. A pod describe shows the expected error "Failed to pull image "devfile-sample-code-with-quarkus.git:latest": rpc error: code = Unknown desc = reading manifest latest in docker.io/library/devfile-sample-code-with-quarkus.git: requested access to the resource is denied". However, during import, if you were to remove the .git extension from the repository link, the import is successful. I only see this on the master branch and it seems to be fine on my local crc, which is on OpenShift version 4.13.0.
Version-Release number of selected component (if applicable):
4.13.z
How reproducible:
Always
Steps to Reproduce:
1. Build from openshift/console master
2. Import Devfile sample
3. If repo has a .git extension, pod fails with the wrong image
Actual results:
POD describe: Failed to pull image "devfile-sample-code-with-quarkus.git:latest": rpc error: code = Unknown desc = reading manifest latest in docker.io/library/devfile-sample-code-with-quarkus.git: requested access to the resource is denied
Expected results:
Successful running pod
Additional info:
Fine on OpenShift 4.13.0, tested on local crc:
$ crc version
WARN A new version (2.23.0) has been published on https://developers.redhat.com/content-gateway/file/pub/openshift-v4/clients/crc/2.23.0/crc-macos-installer.pkg
CRC version: 2.20.0+f3a947
OpenShift version: 4.13.0
Podman version: 4.4.4
This is a clone of issue OCPBUGS-5969. The following is the description of the original issue:
—
Description of problem:
Nutanix machine without enough memory stuck in Provisioning and machineset scale/delete cannot work
Version-Release number of selected component (if applicable):
Server Version: 4.12.0 4.13.0-0.nightly-2023-01-17-152326
How reproducible:
Always
Steps to Reproduce:
1. Install Nutanix Cluster Template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/tree/master/functionality-testing/aos-4_12/ipi-on-nutanix//versioned-installer with
   master_num_memory: 32768
   worker_num_memory: 16384
   networkType: "OVNKubernetes"
   installer_payload_image: quay.io/openshift-release-dev/ocp-release:4.12.0-x86_64
2.
3. Scale up the cluster worker machineset from 2 replicas to 40 replicas
4. Install an Infra machineset with 3 replicas, and a Workload machineset with 1 replica. Refer to this doc https://docs.openshift.com/container-platform/4.11/machine_management/creating-infrastructure-machinesets.html#machineset-yaml-nutanix_creating-infrastructure-machinesets and configure the following resources:
   VCPU=16
   MEMORYMB=65536
   MEMORYSIZE=64Gi
Actual results:
1. The new infra machines stuck in 'Provisioning' status for about 3 hours. % oc get machines -A | grep Prov openshift-machine-api qili-nut-big-jh468-infra-48mdt Provisioning 175m openshift-machine-api qili-nut-big-jh468-infra-jnznv Provisioning 175m openshift-machine-api qili-nut-big-jh468-infra-xp7xb Provisioning 175m 2. Checking the Nutanix web console, I found infra machine 'qili-nut-big-jh468-infra-jnznv' had the following msg " No host has enough available memory for VM qili-nut-big-jh468-infra-48mdt (8d7eb6d6-a71e-4943-943a-397596f30db2) that uses 4 vCPUs and 65536MB of memory. You could try downsizing the VM, increasing host memory, power off some VMs, or moving the VM to a different host. Maximum allowable VM size is approximately 17921 MB " infra machine 'qili-nut-big-jh468-infra-jnznv' is not round infra machine 'qili-nut-big-jh468-infra-xp7xb' is in green without warning. But In must gather I found some error: 03:23:49openshift-machine-apinutanixcontrollerqili-nut-big-jh468-infra-xp7xbFailedCreateqili-nut-big-jh468-infra-xp7xb: reconciler failed to Create machine: failed to update machine with vm state: qili-nut-big-jh468-infra-xp7xb: failed to get node qili-nut-big-jh468-infra-xp7xb: Node "qili-nut-big-jh468-infra-xp7xb" not found 3. Scale down the worker machineset from 40 replicas to 30 replicas can not work. Still have 40 Running worker machines and 40 Ready nodes after about 3 hours. % oc get machinesets -A NAMESPACE NAME DESIRED CURRENT READY AVAILABLE AGE openshift-machine-api qili-nut-big-jh468-infra 3 3 176m openshift-machine-api qili-nut-big-jh468-worker 30 30 30 30 5h1m openshift-machine-api qili-nut-big-jh468-workload 1 1 176m % oc get machines -A | grep worker| grep Running -c 40 % oc get nodes | grep worker | grep Ready -c 40 4. I delete the infra machineset, but the machines still in Provisioning status and won't get deleted % oc delete machineset -n openshift-machine-api qili-nut-big-jh468-infra machineset.machine.openshift.io "qili-nut-big-jh468-infra" deleted % oc get machinesets -A NAMESPACE NAME DESIRED CURRENT READY AVAILABLE AGE openshift-machine-api qili-nut-big-jh468-worker 30 30 30 30 5h26m openshift-machine-api qili-nut-big-jh468-workload 1 1 3h21m % oc get machines -A | grep -v Running NAMESPACE NAME PHASE TYPE REGION ZONE AGE openshift-machine-api qili-nut-big-jh468-infra-48mdt Provisioning 3h22m openshift-machine-api qili-nut-big-jh468-infra-jnznv Provisioning 3h22m openshift-machine-api qili-nut-big-jh468-infra-xp7xb Provisioning 3h22m openshift-machine-api qili-nut-big-jh468-workload-qdkvd 3h22m
Expected results:
The new infra machines should be either Running or Failed. Cluster worker machineset scale up and scale down should not be impacted.
Additional info:
must-gather download url will be added to the comment.
Description of problem:
On an SNO node one of the CatalogSources gets deleted after multiple reboots.
In the initial stage we have 2 catalogsources:
$ oc get catsrc -A
NAMESPACE NAME DISPLAY TYPE PUBLISHER AGE
openshift-marketplace certified-operators Intel SRIOV-FEC Operator grpc Red Hat 20h
openshift-marketplace redhat-operators Red Hat Operators Catalog grpc Red Hat 18h
After running several node reboots, one of the catalogsources no longer shows up:
$ oc get catsrc -A
NAMESPACE NAME DISPLAY TYPE PUBLISHER AGE
openshift-marketplace certified-operators Intel SRIOV-FEC Operator grpc Red Hat 21h
Version-Release number of selected component (if applicable):
4.11.0-fc.3
How reproducible:
Inconsistent but reproducible
Steps to Reproduce:
1. Deploy and configure SNO node via ZTP process. Configuration sets up 2 CatalogSources in a restricted environment for redhat-operators and certified-operators
2. Reboot the node via `sudo reboot` several times
3. Check catalogsources
Actual results:
$ oc get catsrc -A
NAMESPACE NAME DISPLAY TYPE PUBLISHER AGE
openshift-marketplace certified-operators Intel SRIOV-FEC Operator grpc Red Hat 22h
Expected results:
All catalogsources created initially are still present.
Additional info:
Attaching must-gather.
Description of problem:
Users cannot install single-node-openshift if the hostname contains the word etcd
Version-Release number of selected component (if applicable):
Probably since 4.8
How reproducible:
100%
Steps to Reproduce:
1. Install SNO with either Assisted or BIP 2. Make sure node hostname is etcd-1 (e.g. via DHCP hostname)
Actual results:
Bootstrap phase never ends
Expected results:
Bootstrap phase should complete successfully
Additional info:
This code is the likely culprit - it uses a naive way to check if etcd is running, accidentally capturing the node name (which contains etcd) in the crictl output as "evidence" that etcd is still running, so it never completes.
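The described failure mode, sketched in shell for illustration only; this is not the actual bootstrap code, and the --name filter is just one possible way to tighten the check:

# Naive check: any occurrence of "etcd" in the crictl output counts as "etcd still running",
# so the node name "etcd-1" appearing in pod names keeps this loop spinning forever.
while sudo crictl ps -a | grep -q etcd; do
  sleep 5
done

# A stricter check would match only the container name column, e.g.:
sudo crictl ps -a --name '^etcd$' -q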
See OCPBUGS-15826 (aka AITRIAGE-7677)
Description of problem:
CheckNodePerf is running on non-master nodes when the worker role label is not present.
Version-Release number of selected component (if applicable):
How reproducible:
In a VMware cluster, create an infra MCP and label a node as role:infra. vsphere-problem-detector-operator will produce CheckNodePerf alerts and logs like "CheckNodePerf: xxxxxx failed: master node has disk latency of greater than 100ms". https://docs.openshift.com/container-platform/4.10/machine_management/creating-infrastructure-machinesets.html#creating-infra-machines_creating-infrastructure-machinesets
Steps to Reproduce:
1. 2. 3.
Actual results:
CheckNodePerf: xxxxx failed: master node has disk latency of greater than 100ms
Expected results:
no log entry, and no alert
Additional info:
The code only considers the worker and master labels, and has very complex nesting of conditions: https://github.com/openshift/vsphere-problem-detector/blob/ca408db88a70cfa5aefa3128dff971a555994c29/pkg/check/node_perf.go#L133-L143
This will allow the installer to depend on just the client/api/models modules, and not pull in all of the dependencies of the service (such as libnmstate).
Regular sync with upstream source on metal3
Description of problem:
When deploying a disconnected cluster with the installer, the image-registry operator will fail to deploy because it cannot reach the COS endpoint.
Version-Release number of selected component (if applicable):
How reproducible:
Easily
Steps to Reproduce:
1. Deploy a disconnected cluster with the installer 2. Watch the image-registry operator, it will fail to deploy
Actual results:
image-registry operator doesn't deploy because the COS endpoint is unreachable.
Expected results:
image-registry operator should deploy
Additional info:
Fix identified.
Sanitize OWNERS/OWNER_ALIASES:
1) OWNERS must have:
component: "Storage / Kubernetes External Components"
2) OWNER_ALIASES must have all team members of Storage team.
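For illustration, the files described above could take roughly this shape; the alias name and members are placeholders, not the real team list:

# OWNERS
component: "Storage / Kubernetes External Components"
approvers:
  - storage-approvers   # alias defined in the aliases file
reviewers:
  - storage-approvers

# OWNERS_ALIASES
aliases:
  storage-approvers:
    - member-one
    - member-two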
This is a clone of issue OCPBUGS-18386. The following is the description of the original issue:
—
How reproducible:
Always
Steps to Reproduce:
1. The Kubernetes API introduces a new Pod template parameter (`ephemeral`)
2. This parameter is not in the allowed list of the default SCCs
3. Customers are not allowed to edit the default SCCs, nor do we have a mechanism in place to update the built-in SCCs AFAIK
4. Users of existing clusters cannot use the new parameter without creating manual SCCs and assigning those SCCs to service accounts themselves, which looks clunky. This is documented in https://access.redhat.com/articles/6967808
Actual results:
Users of existing clusters cannot use ephemeral volumes after an upgrade
Expected results:
Users of existing clusters *can* use ephemeral volumes after an upgrade
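For illustration, the manual workaround referenced above amounts to a custom SCC that allows the new volume type; the name is illustrative and the remaining fields would be copied from the built-in restricted SCC:

apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: restricted-with-ephemeral   # illustrative name
# ...other fields copied from the built-in restricted SCC...
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - persistentVolumeClaim
  - projected
  - secret
  - ephemeral   # the new generic ephemeral volume type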
Current status
Description of problem:
Deployment of a standard masters+workers cluster using 4.13.0-rc.6 does not configure the cgroup structure according to OCPNODE-1539
Version-Release number of selected component (if applicable):
OCP 4.13.0-rc.6
How reproducible:
Always
Steps to Reproduce:
1. Deploy the cluster
2. Check for presence of /sys/fs/cgroup/cpuset/system*
3. Check the status of cpu balancing of the root cpuset cgroup (should be disabled); see the sketch after these steps
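A sketch of how these checks could be run from a node debug shell, assuming the cgroup v1 paths mentioned above:

# Expected to exist per OCPNODE-1539 (system cpuset split out of the root cgroup)
ls -d /sys/fs/cgroup/cpuset/system*

# Expected to be 0 (CPU load balancing disabled) on the root cpuset cgroup
cat /sys/fs/cgroup/cpuset/cpuset.sched_load_balance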
Actual results:
No system cpuset exists and all services are still present in the root cgroup with cpu balancing enabled.
Expected results:
Additional info:
The code has a bug we missed. It is nested under the Workload partitioning check on line https://github.com/haircommander/cluster-node-tuning-operator/blob/123e26df30c66fd5c9836726bd3e4791dfd82309/pkg/performanceprofile/controller/performanceprofile/components/machineconfig/machineconfig.go#L251
This is a clone of issue OCPBUGS-18999. The following is the description of the original issue:
—
Description of problem:
Image pulls fail with http status 504, gateway timeout until image registry pods are restarted.
Version-Release number of selected component (if applicable):
4.13.12
How reproducible:
Intermittent
Steps to Reproduce:
1. 2. 3.
Actual results:
Images can't be pulled:
podman pull registry.ci.openshift.org/ci/applyconfig:latest
Trying to pull registry.ci.openshift.org/ci/applyconfig:latest...
Getting image source signatures
Error: reading signatures: downloading signatures for sha256:83c1b636069c3302f5ba5075ceeca5c4a271767900fee06b919efc3c8fa14984 in registry.ci.openshift.org/ci/applyconfig: received unexpected HTTP status: 504 Gateway Time-out
Image registry pods contain errors:
time="2023-09-01T02:25:39.596485238Z" level=warning msg="error authorizing context: access denied" go.version="go1.19.10 X:strictfipsruntime" http.request.host=registry.ci.openshift.org http.request.id=3e805818-515d-443f-8d9b-04667986611d http.request.method=GET http.request.remoteaddr=18.218.67.82 http.request.uri="/v2/ocp/4-dev-preview/manifests/sha256:caf073ce29232978c331d421c06ca5c2736ce5461962775fdd760b05fb2496a0" http.request.useragent="containers/5.24.1 (github.com/containers/image)" vars.name=ocp/4-dev-preview vars.reference="sha256:caf073ce29232978c331d421c06ca5c2736ce5461962775fdd760b05fb2496a0"
Expected results:
Image registry does not return gateway timeouts
Additional info:
Must gather(s) attached, additional information in linked OHSS ticket.
Please review the following PR: https://github.com/openshift/router/pull/455
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Unit test failing:
=== RUN   TestNewAppRunAll/app_generation_using_context_dir
    newapp_test.go:907: app generation using context dir: Error mismatch! Expected <nil>, got supplied context directory '2.0/test/rack-test-app' does not exist in 'https://github.com/openshift/sti-ruby'
--- FAIL: TestNewAppRunAll/app_generation_using_context_dir (0.61s)
Version-Release number of selected component (if applicable):
How reproducible:
100
Steps to Reproduce:
see for example https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/1376/pull-ci-openshift-oc-master-images/1638172620648091648
Actual results:
unit tests fail
Expected results:
TestNewAppRunAll unit test should pass
Additional info:
Please review the following PR: https://github.com/openshift/prometheus-alertmanager/pull/70
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
This Jira is filed to track upstream issue (fix and backport) https://github.com/kubernetes-sigs/azuredisk-csi-driver/issues/1893
Version-Release number of selected component (if applicable):
4.14
Description of problem:
An un-privileged user with cluster-readers role cannot view NetworkAttachmentDefinition resource.
Version-Release number of selected component (if applicable):
oc Version: 4.10.0-202203141248.p0.g6db43e2.assembly.stream-6db43e2
OCP Version: 4.10.4
Kubernetes Version: v1.23.3+e419edf
ose-multus-cni:v4.1.0-7.155662231
How reproducible:
100%
Steps to Reproduce:
1. In an OCP cluster with multus installed - search which roles can view ("get") NetworkAttachmentDefinition resource, and see if "cluster-readers" role is part of this list, by running:
$ oc adm policy who-can get network-attachment-definitions | grep "cluster-reader"
Actual results:
Empty output
Expected results:
Non-empty output with "cluster-readers" in it, e.g. when running the same command for the Namespace resource:
$ oc adm policy who-can get namespace | grep "cluster-reader"
system:cluster-readers
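A sketch of the kind of fix commonly used for this, assuming the cluster-reader aggregation label; the ClusterRole name is illustrative:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: net-attach-def-cluster-reader   # illustrative name
  labels:
    # aggregates these rules into the cluster-reader role
    rbac.authorization.k8s.io/aggregate-to-cluster-reader: "true"
rules:
  - apiGroups: ["k8s.cni.cncf.io"]
    resources: ["network-attachment-definitions"]
    verbs: ["get", "list", "watch"]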
Description of problem:
After upgrading from OpenShift 4.13 to 4.14 with Kuryr network type, the network operator shows as Degraded and the cluster version reports that it's unable to apply the 4.14 update. The issue seems to be related to mtu settings, as indicated by the message: "Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]."
Version-Release number of selected component (if applicable):
Upgrading from 4.13 to 4.14 4.14.0-0.nightly-2023-09-15-233408 Kuryr network type RHOS-17.1-RHEL-9-20230907.n.1
How reproducible:
Consistently reproducible on attempting to upgrade from 4.13 to 4.14.
Steps to Reproduce:
1. Install OpenShift version 4.13 on OpenStack.
2. Initiate an upgrade to OpenShift version 4.14.
Actual results:
The network operator shows as Degraded with the message:
network 4.13.13 True False True 13h Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
Additionally, "oc get clusterversions" shows:
Unable to apply 4.14.0-0.nightly-2023-09-15-233408: wait has exceeded 40 minutes for these operators: network
Expected results:
The upgrade should complete successfully without any operator being degraded.
Additional info:
Some components remain at version 4.13.13 despite the upgrade attempt. Specifically, the dns, machine-config, and network operators are still at version 4.13.13. : $ oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.14.0-0.nightly-2023-09-15-233408 True False False 13h baremetal 4.14.0-0.nightly-2023-09-15-233408 True False False 13h cloud-controller-manager 4.14.0-0.nightly-2023-09-15-233408 True False False 13h cloud-credential 4.14.0-0.nightly-2023-09-15-233408 True False False 13h cluster-autoscaler 4.14.0-0.nightly-2023-09-15-233408 True False False 13h config-operator 4.14.0-0.nightly-2023-09-15-233408 True False False 13h console 4.14.0-0.nightly-2023-09-15-233408 True False False 13h control-plane-machine-set 4.14.0-0.nightly-2023-09-15-233408 True False False 13h csi-snapshot-controller 4.14.0-0.nightly-2023-09-15-233408 True False False 13h dns 4.13.13 True False False 13h etcd 4.14.0-0.nightly-2023-09-15-233408 True False False 13h image-registry 4.14.0-0.nightly-2023-09-15-233408 True False False 13h ingress 4.14.0-0.nightly-2023-09-15-233408 True False False 13h insights 4.14.0-0.nightly-2023-09-15-233408 True False False 13h kube-apiserver 4.14.0-0.nightly-2023-09-15-233408 True False False 13h kube-controller-manager 4.14.0-0.nightly-2023-09-15-233408 True False False 13h kube-scheduler 4.14.0-0.nightly-2023-09-15-233408 True False False 13h kube-storage-version-migrator 4.14.0-0.nightly-2023-09-15-233408 True False False 13h machine-api 4.14.0-0.nightly-2023-09-15-233408 True False False 13h machine-approver 4.14.0-0.nightly-2023-09-15-233408 True False False 13h machine-config 4.13.13 True False False 13h marketplace 4.14.0-0.nightly-2023-09-15-233408 True False False 13h monitoring 4.14.0-0.nightly-2023-09-15-233408 True False False 13h network 4.13.13 True False True 13h Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]. Use 'oc edit network.operator.openshift.io cluster' to undo the change. node-tuning 4.14.0-0.nightly-2023-09-15-233408 True False False 12h openshift-apiserver 4.14.0-0.nightly-2023-09-15-233408 True False False 13h openshift-controller-manager 4.14.0-0.nightly-2023-09-15-233408 True False False 13h openshift-samples 4.14.0-0.nightly-2023-09-15-233408 True False False 12h operator-lifecycle-manager 4.14.0-0.nightly-2023-09-15-233408 True False False 13h operator-lifecycle-manager-catalog 4.14.0-0.nightly-2023-09-15-233408 True False False 13h operator-lifecycle-manager-packageserver 4.14.0-0.nightly-2023-09-15-233408 True False False 12h service-ca 4.14.0-0.nightly-2023-09-15-233408 True False False 13h storage 4.14.0-0.nightly-2023-09-15-233408 True False False 13h
Description of problem:
Updating the k* version to v0.27.2 in cluster samples operator for OCP 4.14 release
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
I get a synchronization error in a fully disconnected environment when I synchronize twice with the target mirror and there is no change/diff between the first and the second synchronization. The first synchronization works; on the second synchronization there is an error and exit code -1.
This case occurs when you want to synchronize your disconnected registry regularly and there is no change between two synchronizations.
This case is presented hereafter:
https://docs.openshift.com/container-platform/4.11/installing/disconnected_install/installing-mirroring-disconnected.html#oc-mirror-differential-updates_installing-mirroring-disconnected
In documentation we have:
"Like this, the desired mirror content can be declared in the imageset configuration file statically while the mirror jobs are executed regularly, for example as part of a cron job. This way, the mirror can be kept up to date in an automated fashion."
The main question is how to synchronize a fully disconnected registry regularly (with no change between synchronizations) without returning an error.
Version-Release number of selected component (if applicable):
oc-mirror 4.11
How reproducible:
Follow https://docs.openshift.com/container-platform/4.11/installing/disconnected_install/installing-mirroring-disconnected.html#mirroring-image-set-full and synchronize twice with the target mirror.
Steps to Reproduce:
1. oc-mirror --from=output-dir/mirror_seq1_000000.tar docker://quay-server.example.com/foo --dest-skip-tls
2. oc-mirror --from=output-dir/mirror_seq1_000000.tar docker://quay-server.example.com/foo --dest-skip-tls
Actual results:
oc-mirror --from=output-dir/mirror_seq1_000000.tar docker://quay-server.example.com/foo --dest-skip-tls
Checking push permissions for quay-server.example.com
Publishing image set from archive "output-dir/mirror_seq1_000000.tar" to registry "quay-server.example.com"
error: error during publishing, expecting imageset with prefix mirror_seq2: invalid mirror sequence order, want 2, got 1
=> return -1
Expected results:
oc-mirror --from=output-dir/mirror_seq1_000000.tar docker://quay-server.example.com/foo --dest-skip-tls
...
No diff from last synchronization, nothing to do
=> return 0
Additional info:
The error is triggered in pkg/cli/mirror/sequence.go:
+ default:
+   // Complete metadata checks
+   // UUID mismatch will now be seen as a new workspace.
+   klog.V(3).Info("Checking metadata sequence number")
+   currRun := current.PastMirror
+   incomingRun := incoming.PastMirror
+   if incomingRun.Sequence != (currRun.Sequence + 1) {
+     return &ErrInvalidSequence{currRun.Sequence + 1, incomingRun.Sequence}
+   }
The error handling in ./pkg/cli/mirror/mirror.go could treat this case as a warning ("no difference since last synchronization") and return 0 instead of -1.
}
case diskToMirror:
    dir, err := o.createResultsDir()
    if err != nil {
        return err
    }
    o.OutputDir = dir
    // Publish from disk to registry
    // this takes care of syncing the metadata to the
    // registry backends.
    mapping, err = o.Publish(ctx)
    if err != nil {
        serr := &ErrInvalidSequence{}
        if errors.As(err, &serr) {
            return fmt.Errorf("error during publishing, expecting imageset with prefix mirror_seq%d: %v", serr.wantSeq, err)
        }
        return err
    }
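A minimal, self-contained sketch of the suggested behavior, assuming the sequence numbers are available as integers; the ErrInvalidSequence type here mirrors the snippet above, but the no-op handling is illustrative, not oc-mirror's actual code:
```
package main

import "fmt"

// ErrInvalidSequence mirrors the error type used by oc-mirror's sequence
// check; the field names follow the snippet above but are illustrative here.
type ErrInvalidSequence struct {
	wantSeq, gotSeq int
}

func (e *ErrInvalidSequence) Error() string {
	return fmt.Sprintf("invalid mirror sequence order, want %d, got %d", e.wantSeq, e.gotSeq)
}

// checkSequence sketches the proposed behavior: republishing the imageset that
// matches the already-published sequence is a no-op, while a real gap in the
// sequence is still an error.
func checkSequence(current, incoming int) error {
	switch {
	case incoming == current:
		fmt.Println("no diff from last synchronization, nothing to do")
		return nil
	case incoming != current+1:
		return &ErrInvalidSequence{current + 1, incoming}
	}
	return nil
}

func main() {
	fmt.Println(checkSequence(1, 1)) // <nil>: re-publishing mirror_seq1 exits 0
	fmt.Println(checkSequence(1, 3)) // invalid mirror sequence order, want 2, got 3
}
```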
Description of problem:
OSSM Daily builds were updated to no longer support the spec.techPreview.controlPlaneMode field and OSSM will not create a SMCP as a result. The field needs to be updated to spec.mode. Gateway API enhanced dev preview is currently broken (currently using latest 2.4 daily build because 2.4 is unreleased). This should be resolved before OSSM 2.4 is GA.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
100%
Steps to Reproduce:
1. Follow instructions in http://pastebin.test.redhat.com/1092754
Actual results:
CIO fails to create a SMCP "error": "failed to create ServiceMeshControlPlane openshift-ingress/openshift-gateway: admission webhook \"smcp.validation.maistra.io\" denied the request: the spec.techPreview.controlPlaneMode field is not supported in version 2.4+; use spec.mode"
Expected results:
CIO is able to create a SMCP
Additional info:
Description of the problem:
e2e-metal-assisted-day2-arm-workers-periodic job fails to install the day2 ARM worker because the service marks the setup as incompatible:
time="2023-04-04T12:03:37Z" level=error msg="cannot use arm64 architecture because it's not compatible on version of OpenShift" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).handlerClusterInfoOnRegisterInfraEnv" file="/assisted-service/internal/bminventory/inventory.go:4466" pkg=Inventory time="2023-04-04T12:03:37Z" level=error msg="Failed to register InfraEnv test-infra-infra-env-fd527e12 with id 3e21770d-d607-431c-967c-5f632bec0cfb. Error: cannot use arm64 architecture because it's not compatible on version of OpenShift" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).RegisterInfraEnvInternal.func1" file="/assisted-service/internal/bminventory/inventory.go:4528" cluster_id=3e21770d-d607-431c-967c-5f632bec0cfb go-id=235 pkg=Inventory request_id=f8dd7eeb-efa7-4828-a8c5-e1486a8bc1d2
How reproducible:
Run the job e2e-metal-assisted-day2-arm-workers which:
Steps to reproduce:
1.
2.
3.
Actual results:
The job fails to add the day2 worker and the assisted service log shows:
"Error: cannot use arm64 architecture because it's not compatible on version of OpenShift"
Expected results:
The installation of the day2 ARM worker succeeds without errors.
Elior Erez, I am assigning this ticket to you as it looks like it is linked to the feature support code; can you have a look?
Description of problem:
PRs were previously merged to add SC2S support via AWS SDK here: https://github.com/openshift/installer/pull/5710 https://github.com/openshift/installer/pull/5597 https://github.com/openshift/cluster-ingress-operator/pull/703 However, further updates to add support for SC2S region (us-isob-east-1) and new TC2S region (us-iso-west-1) are still required.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
1. Try to deploy a cluster on us-isob-east-1 or us-iso-west-1 2. 3.
Actual results:
Regions are not supported
Expected results:
Additional info:
Both TC2S and SC2S support ALIAS records now.
Description of problem:
For unknown reasons, the management cluster AWS endpoint service sometimes has an active connection leftover. This blocks the uninstallation, as the AWS endpoint service cannot be deleted before this connection is rejected.
Version-Release number of selected component (if applicable):
4.12.z,4.13.z,4.14.z
How reproducible:
Irregular
Steps to Reproduce:
1. 2. 3.
Actual results:
AWSEndpointService cannot be deleted by the hypershift operator, the uninstallation is stuck
Expected results:
There are no leftover active AWSEndpoint connections when deleting the AWSEndpointService and it can be deleted properly. OR Hypershift operator rejects active endpoint connections when trying to delete AWSEndpointServices from the management cluster aws account
Additional info:
Added mustgathers in comment.
Description of problem:
In the Konnectivity SOCKS proxy, the current default is to proxy cloud endpoint traffic: https://github.com/openshift/hypershift/blob/main/konnectivity-socks5-proxy/main.go#L61 Because of this, after this change: https://github.com/openshift/hypershift/commit/0c52476957f5658cfd156656938ae1d08784b202 the OAuth server began proxying IAM traffic instead of sending it directly. This causes a regression in Satellite environments running with an HTTP_PROXY server. The original network traffic path needs to be restored.
Version-Release number of selected component (if applicable):
4.13 4.12
How reproducible:
100%
Steps to Reproduce:
1. Setup HTTP_PROXY IBM Cloud Satellite environment 2. In the oauth-server pod run a curl against iam (curl -v https://iam.cloud.ibm.com) 3. It will log it is using proxy
Actual results:
It is using proxy
Expected results:
It should send traffic directly (as it does in 4.11 and 4.10)
Additional info:
This is a clone of issue OCPBUGS-18830. The following is the description of the original issue:
—
Description of problem:
Failed to install cluster on SC2S region as: level=error msg=Error: reading Security Group (sg-0b0cd054dd599602f) Rules: UnsupportedOperation: The functionality you requested is not available in this region.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-11-201102
How reproducible:
Always
Steps to Reproduce:
1. Create an OCP cluster on SC2S
Actual results:
Install fail: level=error msg=Error: reading Security Group (sg-0b0cd054dd599602f) Rules: UnsupportedOperation: The functionality you requested is not available in this region.
Expected results:
Install succeed.
Additional info:
* C2S region is not affected
Description of problem:
When you migrate a HostedCluster, the AWSEndpointService from the old management cluster conflicts with the one on the new management cluster. The AWSPrivateLink controller does not perform any validation when this happens; this validation is needed to make Disaster Recovery HostedCluster migration work. The issue shows up when the nodes of the HostedCluster cannot join the new management cluster because the AWSEndpointServiceName still points to the old one.
Version-Release number of selected component (if applicable):
4.12 4.13 4.14
How reproducible:
Follow the migration procedure from the upstream documentation and the nodes in the destination HostedCluster will remain in NotReady state.
Steps to Reproduce:
1. Setup a management cluster with the 4.12-13-14/main version of the HyperShift operator.
2. Run the in-place node DR Migrate E2E test from this PR https://github.com/openshift/hypershift/pull/2138:
bin/test-e2e \
  -test.v \
  -test.timeout=2h10m \
  -test.run=TestInPlaceUpgradeNodePool \
  --e2e.aws-credentials-file=$HOME/.aws/credentials \
  --e2e.aws-region=us-west-1 \
  --e2e.aws-zones=us-west-1a \
  --e2e.pull-secret-file=$HOME/.pull-secret \
  --e2e.base-domain=www.mydomain.com \
  --e2e.latest-release-image="registry.ci.openshift.org/ocp/release:4.13.0-0.nightly-2023-03-17-063546" \
  --e2e.previous-release-image="registry.ci.openshift.org/ocp/release:4.13.0-0.nightly-2023-03-17-063546" \
  --e2e.skip-api-budget \
  --e2e.aws-endpoint-access=PublicAndPrivate
Actual results:
The nodes stay in NotReady state
Expected results:
The nodes should join the migrated HostedCluster
Additional info:
Description of problem:
When forcing a reboot of a BMH with the annotation reboot.metal3.io: '{"force": true}' with a new preprovisioningimage URL the host never reboots.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-05-03-150228
How reproducible:
100%
Steps to Reproduce:
1. Create a BMH and stall the provisioning process at "provisioning"
2. Set a new URL in the preprovisioningimage
3. Set the force reboot annotation on the BMH (reboot.metal3.io: '{"force": true}')
Actual results:
Host does not reboot and the annotation remains on the BMH
Expected results:
Host reboots into the new image
Additional info:
This was reproduced using assisted installer (MCE central infrastructure management)
This is a ticket created based on a GitHub comment from a random user.
Description of the problem:
See GitHub comment
How reproducible:
Unknown
Steps to reproduce:
1. See GitHub comment
Actual results:
DNS wildcard validation failure is a false positive
Expected results:
DNS wildcard validation should probably avoid domain-search
Description of problem:
During cluster installation if the host systems had multiple dual-stack interfaces configured via install-config.yaml, the installation will fail. Notably, when a single-stack ipv4 installation is attempted with multiple interfaces it is successful. Additionally, when a dual-stack installation is attempted with only a single interface it is successful.
Version-Release number of selected component (if applicable):
Reproduced on 4.12.1 and 4.12.7
How reproducible:
100%
Steps to Reproduce:
1. Assign an IPv4 and an IPv6 address to both the apiVIPs and ingressVIPs parameters in the install-config.yaml
2. Configure all hosts with at least two interfaces in the install-config.yaml
3. Assign an IPv4 and an IPv6 address to each interface in the install-config.yaml
4. Begin cluster installation and wait for failure
Actual results:
Failed cluster installation
Expected results:
Successful cluster installation
Additional info:
The cli option --logtostderr was removed in prometheus-adapter v0.11. CMO uses this argument and this currently blocks the update to v0.11: https://github.com/openshift/k8s-prometheus-adapter/pull/72
IIUC we can simply drop this argument.
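A minimal sketch of dropping the flag when building the container args; the helper function is illustrative and not CMO's actual code:
```
package main

import (
	"fmt"
	"strings"
)

// dropLogToStderr removes the flag that prometheus-adapter v0.11 no longer
// understands, leaving every other argument untouched.
func dropLogToStderr(args []string) []string {
	out := make([]string, 0, len(args))
	for _, a := range args {
		if a == "--logtostderr" || strings.HasPrefix(a, "--logtostderr=") {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	fmt.Println(dropLogToStderr([]string{"--secure-port=6443", "--logtostderr=true"}))
	// Output: [--secure-port=6443]
}
```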
Description of problem:
SNO installation does not finish because machine-config is waiting for a non-existing machine config.

$ oc get co machine-config
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config             True        True          True       14h     Unable to apply 4.14.0-0.nightly-2023-08-23-075058: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 1, ready 0, updated: 0, unavailable: 1)]]

$ oc -n openshift-machine-config-operator logs machine-config-daemon-2stpc --tail 5
Defaulted container "machine-config-daemon" out of: machine-config-daemon, kube-rbac-proxy
I0824 07:39:12.117508 22874 daemon.go:1370] In bootstrap mode
E0824 07:39:12.117525 22874 writer.go:226] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-231b9341930d0616544ad05989a5c1b8" not found
W0824 07:40:12.131400 22874 daemon.go:1630] Failed to persist NIC names: open /etc/systemd/network: no such file or directory
I0824 07:40:12.131417 22874 daemon.go:1370] In bootstrap mode
E0824 07:40:12.131429 22874 writer.go:226] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-231b9341930d0616544ad05989a5c1b8" not found
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-23-075058
How reproducible:
100%
Steps to Reproduce:
1. Deploy SNO with Telco DU profile 2. Wait for installation to finish
Actual results:
Installation doesn't complete due to master MCP being degraded waiting for a non-existing machineconfig.
Expected results:
Installation succeeds.
Additional info:
Attaching sosreport and must-gather
This is a clone of issue OCPBUGS-18113. The following is the description of the original issue:
—
Description of problem:
When the installer generates a CPMS, it should only add the `failureDomains` field when there is more than one failure domain. When there is only one failure domain, the fields from the failure domain, e.g. the zone, should be injected directly into the provider spec and the failure domain should be omitted. By doing this, we avoid having to care about failure-domain injection logic for single-zone clusters, potentially avoiding bugs (such as some we have seen recently). IIRC we already did this for OpenStack, but AWS, Azure and GCP may not be affected.
Version-Release number of selected component (if applicable):
How reproducible:
Can be demonstrated on Azure on the westus region which has no AZs available. Currently the installer creates the following, which we can omit entirely:
```
failureDomains:
  platform: Azure
  azure:
  - zone: ""
```
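A minimal sketch of the behavior described above, with illustrative stand-in types rather than the installer's real CPMS and provider-spec structures:
```
package main

import "fmt"

// failureDomain and providerSpec are stand-ins for the installer's real types.
type failureDomain struct{ Zone string }

type providerSpec struct{ Zone string }

// applyFailureDomains returns the failureDomains stanza to publish in the
// CPMS; with a single domain it injects the zone into the provider spec and
// returns nil so the stanza is omitted entirely.
func applyFailureDomains(domains []failureDomain, spec *providerSpec) []failureDomain {
	if len(domains) == 1 {
		spec.Zone = domains[0].Zone
		return nil
	}
	return domains
}

func main() {
	spec := &providerSpec{}
	out := applyFailureDomains([]failureDomain{{Zone: ""}}, spec)
	fmt.Printf("failureDomains: %v, zone in provider spec: %q\n", out, spec.Zone)
}
```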
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Apart from the default SC, we should check whether non-default SCs created on the vSphere platform use a datastore to which OCP has access and the necessary permissions.
This will avoid hard-to-debug errors in cases where a customer creates an additional SC but forgets to grant the necessary permissions on the newer datastore.
Description of problem:
When creating a Sample Devfile from the Samples Page, the corresponding Topology icon for the app is not set. This issue is not observed when we create a BuilderImage from the Samples page.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Create a Sample Devfile App from the Samples Page 2. Go to the Topology Page and check the icon of the app created.
Actual results:
The generic OpenShift logo is displayed
Expected results:
Need to show the corresponding app icon (Golang, Quarkus, etc.)
Additional info:
In the case of creating a sample from a BuilderImage, the icon gets set properly according to the BuilderImage used. Current label: app.openshift.io/runtime=dotnet-basic. Change to: app.openshift.io/runtime=dotnet
Please review the following PR: https://github.com/openshift/configmap-reload/pull/51
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
2023-08-29 15:43:27.066 1 ERROR ironic.api.method [None req-00977b71-1b61-4452-8f6c-a43a47b1e92e - - - - - -] Server-side error: "<Future at 0x7fe7b2b86250 state=finished raised OperationalError>". Detail: Traceback (most recent call last): File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1089, in _commit_impl self.engine.dialect.do_commit(self.connection) File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/default.py", line 686, in do_commit dbapi_connection.commit() sqlite3.OperationalError: database is locked
Description of problem:
Install issues for 4.14 and 4.15 where we lose contact with the kubelet on master nodes.
This search shows it's happening on about 35% of Azure SDN 4.14 jobs over the past week at least. There are no OVN hits.
1703590387039342592/artifacts/e2e-azure-sdn-upgrade/gather-extra/artifacts/nodes.json
{ "lastHeartbeatTime": "2023-09-18T02:33:11Z", "lastTransitionTime": "2023-09-18T02:35:39Z", "message": "Kubelet stopped posting node status.", "reason": "NodeStatusUnknown", "status": "Unknown", "type": "Ready" }
4.14 is interesting as it is a minor upgrade from 4.13 and we see the install failures with a master node dropping out.
Build log shows
[36mINFO[0m[2023-09-18T02:03:03Z] Using explicitly provided pull-spec for release initial (registry.ci.openshift.org/ocp/release:4.13.0-0.ci-2023-09-17-050449)
ipi-azure-conf shows region centralus (not the single zone westus)
get ocp version: 4.13 /output Azure region: centralus
oc_cmds/nodes shows master-1 not ready
ci-op-82xkimh8-0dd98-9g9wh-master-1 NotReady control-plane,master 82m v1.26.7+c7ee51f 10.0.0.6 <none> Red Hat Enterprise Linux CoreOS 413.92.202309141211-0 (Plow)
ci-op-82xkimh8-0dd98-9g9wh-master-1-boot.log shows ignition
install log shows we have lost contact
time="2023-09-18T03:15:33Z" level=error msg="Cluster operator kube-apiserver Degraded is True with GuardController_SyncError::NodeController_MasterNodesReady: GuardControllerDegraded: [Missing operand on node ci-op-82xkimh8-0dd98-9g9wh-master-0, Missing operand on node ci-op-82xkimh8-0dd98-9g9wh-master-2]\nNodeControllerDegraded: The master nodes not ready: node \"ci-op-82xkimh8-0dd98-9g9wh-master-1\" not ready since 2023-09-18 02:35:39 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)"
4.15 4.15.0-0.ci-2023-09-17-172341 and 4.14 4.14.0-0.ci-2023-09-18-020137
Version-Release number of selected component (if applicable):
How reproducible:
We are seeing this on a high number of failed payloads for 4.14 && 4.15. Additional recent failures
4.14.0-0.ci-2023-09-17-012321
aggregated-azure-sdn-upgrade-4.14-minor shows failures like "Passed 5 times, failed 0 times, skipped 0 times: we require at least 6 attempts to have a chance at success", indicating that only 5 of the 10 runs were valid.
Checking install logs shows we have lost master-2
time="2023-09-17T02:44:22Z" level=error msg="Cluster operator kube-apiserver Degraded is True with GuardController_SyncError::NodeController_MasterNodesReady: GuardControllerDegraded: [Missing operand on node ci-op-crj5cf00-0dd98-p5snd-master-1, Missing operand on node ci-op-crj5cf00-0dd98-p5snd-master-0]\nNodeControllerDegraded: The master nodes not ready: node \"ci-op-crj5cf00-0dd98-p5snd-master-2\" not ready since 2023-09-17 02:01:49 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)"
oc_cmds/nodes also shows master-2 not ready
4.15.0-0.nightly-2023-09-17-113421 install analysis failed due to azure tech preview oc_cmds/nodes shows master-1 not ready
4.15.0-0.ci-2023-09-17-112341 aggregated-azure-sdn-upgrade-4.15-minor only 5 of 10 runs are valid sample oc_cmds/nodes shows master-0 not ready
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
When using the k8sResourcePrefix x-descriptor with custom resource kinds, the form-view dropdown selection currently doesn't accept the initial user selection, requiring the user to make their selection twice. Also, if the configuration panel contains multiple custom resource dropdowns, then each previous dropdown selection on the panel is also cleared each time the user configures another custom resource dropdown, requiring the user to reconfigure each previous selection. Here's an example of my configuration:
specDescriptors:
  - displayName: Collection
    path: collection
    x-descriptors:
      - >-
        urn:alm:descriptor:io.kubernetes:abc.zzz.com:v1beta1:Collection
  - displayName: Endpoints
    path: 'mapping[0].endpoints[0].name'
    x-descriptors:
      - >-
        urn:alm:descriptor:io.kubernetes:abc.zzz.com:v1beta1:Endpoint
  - displayName: Requested Credential Secret
    path: 'mapping[0].endpoints[0].credentialName'
    x-descriptors:
      - 'urn:alm:descriptor:io.kubernetes:Secret'
  - displayName: Namespaces
    path: 'mapping[0].namespace'
    x-descriptors:
      - 'urn:alm:descriptor:io.kubernetes:Namespace'
With this configuration, when a user wants to select a Collection or Endpoint from the form-view dropdown, the user is forced to make their selection twice before the selection is accepted in the dropdown. Also, if the user does configure the Collection dropdown and then decides to configure the Endpoint dropdown, once the Endpoint selection is made, the Collection dropdown is cleared.
Version-Release number of selected component (if applicable):
4.8
How reproducible:
Always
Steps to Reproduce:
1. Create a new project: oc new-project descriptor-test
2. Create the resources in this gist: oc create -f https://gist.github.com/TheRealJon/99aa89c4af87c4b68cd92a544cd7c08e/raw/a633ad172ff071232620913d16ebe929430fd77a/reproducer.yaml
3. In the admin console, go to the installed operators page in project 'descriptor-test'
4. Select Mock Operator from the list
5. Select "Create instance" in the Mock Resource provided API card
6. Scroll to field-1
7. Select 'example-1' from the dropdown
Actual results:
Selection is not retained on the first click.
Expected results:
The selection should be retained on the first click.
Additional info:
In addition to this behavior, if a form has multiple k8sResourcePrefix dropdown fields, they all get cleared when attempting to select an item from one of them.
Description of problem:
The kube apiserver manages the endpoints resource of the default/kubernetes service so that pods can access the kube apiserver. It does this via the --advertise-address flag and the container port for the kube apiserver pod. Currently the HCCO overwrites the endpoints resource with another port. This conflicts with what the KAS manages; it should not do that.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create an AWS publicAndPrivate cluster with DNS hostnames and a Route publishing strategy for the apiserver.
Actual results:
The HCCO overwrites the default/kubernetes endpoints resource in the guest cluster.
Expected results:
The HCCO does not overwrite the default/kubernetes endpoints resource
Additional info:
Description of problem:
When a cluster has an abnormal operator status, running `oc adm must-gather` exits with code 1.
Version-Release number of selected component (if applicable):
4.12/4.13
Actual results:
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-gfcpc deleted

Reprinting Cluster State:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: 0ba6ca81-e6d8-4d15-b345-70f81bd5a005
ClusterVersion: Stable at "4.13.0-0.nightly-2023-04-01-062001"
ClusterOperators:
  clusteroperator/cloud-credential is not upgradeable because Upgradeable annotation cloudcredential.openshift.io/upgradeable-to on cloudcredential.operator.openshift.io/cluster object needs updating before upgrade. See Manually Creating IAM documentation for instructions on preparing a cluster for upgrade.
  clusteroperator/ingress is progressing: ingresscontroller "test-34166" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...).
Not all ingress controllers are available.

STDERR:
error: yaml: line 7: did not find expected key
[08:06:46] INFO> Exit Status: 1
Expected results:
Abnormal status of any of the operators should not affect must-gather's exit code.
Additional info:
Description of problem:
Alibaba clusters were never declared GA; they are still in TechPreview. We do not allow upgrades of TechPreview clusters between minor streams (e.g. 4.12 to 4.13). To allow a future deprecation and removal of the platform, we will prevent upgrades past 4.13.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a manual clone of https://issues.redhat.com/browse/OCPBUGS-18902 for backporting purposes.
In this recently merged PR, a number of API calls do not use caches, causing excessive API calls.
Done when:
- Change all Get() calls to use listers (see the sketch below)
- API call metric should decrease
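A minimal sketch of the lister-based pattern using client-go shared informers; the namespace and ConfigMap name are illustrative, not the operator's real resources:
```
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A shared informer factory caches objects; reads hit the cache, not the API server.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	cmLister := factory.Core().V1().ConfigMaps().Lister()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Instead of client.CoreV1().ConfigMaps("openshift-config").Get(ctx, "foo", ...),
	// read from the lister-backed cache ("openshift-config"/"foo" are illustrative).
	cm, err := cmLister.ConfigMaps("openshift-config").Get("foo")
	if err != nil {
		fmt.Println("not in cache:", err)
		return
	}
	fmt.Println("resourceVersion from cache:", cm.ResourceVersion)
}
```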
When a HostedCluster is configured as `Private`, annotate the necessary hosted CP components (API and OAuth) so that External DNS can still create public DNS records (pointing to private IP resources).
The External DNS record should be pointing to the resource for the PrivateLink VPC Endpoint. "We need to specify the IP of the A record. We can do that with a cluster IP service."
Context: https://redhat-internal.slack.com/archives/C01C8502FMM/p1675432805760719
aws-ebs-csi-driver-operator ServiceAccount does not include the HCP pull-secret in its imagePullSecrets. Thus, if a HostedCluster is created with a `pullSecret` that contains creds that the management cluster pull secret does not have, the image pull fails.
Description of problem:
Quoting Joel: In 4.14 there's been an effort to make Machine API optional; anything that relies on the CRD needs to be able to detect that the CRD is not installed and then not error should that be the case. You should be able to use a discovery client to determine if the API group is installed or not. We have several controllers and informers that depend on the machine API being at least available to list and sync caches with. When the API is not installed at all, the depending controllers are blocked forever and eventually get killed by the liveness probe. That causes hot restart loops that cause installations to fail.
https://redhat-internal.slack.com/archives/C027U68LP/p1690436286860899
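A minimal sketch of the discovery-based check described above; the function name and wiring are illustrative, not the actual operator code:
```
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// machineAPIAvailable reports whether the machine.openshift.io API group is
// served, so machine-dependent informers can be skipped when it is not.
func machineAPIAvailable(cfg *rest.Config) (bool, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	groups, err := dc.ServerGroups()
	if err != nil {
		return false, err
	}
	for _, g := range groups.Groups {
		if g.Name == "machine.openshift.io" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	ok, err := machineAPIAvailable(cfg)
	if err != nil {
		panic(err)
	}
	if !ok {
		fmt.Println("Machine API not installed; skipping machine informers")
		return
	}
	// ... start the machine-dependent controllers and informers here ...
}
```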
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
1. install a machineAPI=false cluster 2. ??? 3. watch it fail
Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-16776.
Description of problem:
CPMS creates two replacement machines when deleting a master machine on vSphere. Sorry, I have to revisit https://issues.redhat.com/browse/OCPBUGS-4297 as I see all the related PRs are merged, but I hit this twice on the ipi-on-vsphere/versioned-installer-vmc7-ovn-winc-thin_pvc-ci template cluster and once on an ipi-on-vsphere/versioned-installer-vmc7-ovn template cluster today.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-02-13-235211
How reproducible:
Three times
Steps to Reproduce:
1. On this template cluster ipi-on-vsphere/versioned-installer-vmc7-ovn-winc-thin_pvc-ci, the first time I met this is after update all the 3 master machines using RollingUpdate strategy, then I delete a master machine. But seems the redundant machine was automatically deleted, because there was only one replacement machine when I revisit it. liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs15b-75tr7-master-djlxv-2 Running 47m huliu-vs15b-75tr7-master-h76sp-1 Running 58m huliu-vs15b-75tr7-master-wtzb7-0 Running 70m huliu-vs15b-75tr7-worker-gzsp9 Running 4h43m huliu-vs15b-75tr7-worker-vcqqh Running 4h43m winworker-4cltm Running 4h19m winworker-qd4c4 Running 4h19m liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-vs15b-75tr7-master-djlxv-2 machine.machine.openshift.io "huliu-vs15b-75tr7-master-djlxv-2" deleted ^C liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs15b-75tr7-master-bzd4h-2 Provisioning 34s huliu-vs15b-75tr7-master-djlxv-2 Deleting 48m huliu-vs15b-75tr7-master-gzhlk-2 Provisioning 35s huliu-vs15b-75tr7-master-h76sp-1 Running 59m huliu-vs15b-75tr7-master-wtzb7-0 Running 70m huliu-vs15b-75tr7-worker-gzsp9 Running 4h44m huliu-vs15b-75tr7-worker-vcqqh Running 4h44m winworker-4cltm Running 4h20m winworker-qd4c4 Running 4h20m liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs15b-75tr7-master-bzd4h-2 Running 38m huliu-vs15b-75tr7-master-h76sp-1 Running 97m huliu-vs15b-75tr7-master-wtzb7-0 Running 108m huliu-vs15b-75tr7-worker-gzsp9 Running 5h22m huliu-vs15b-75tr7-worker-vcqqh Running 5h22m winworker-4cltm Running 4h57m winworker-qd4c4 Running 4h57m 2.Then I change the strategy to OnDelete, and after update all the 3 master machines using OnDelete strategy, then I delete a master machine. 
liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs15b-75tr7-master-hzhgq-0 Running 137m huliu-vs15b-75tr7-master-kj9zf-2 Running 89m huliu-vs15b-75tr7-master-kz6cx-1 Running 59m huliu-vs15b-75tr7-worker-gzsp9 Running 7h46m huliu-vs15b-75tr7-worker-vcqqh Running 7h46m winworker-4cltm Running 7h21m winworker-qd4c4 Running 7h21m liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-vs15b-75tr7-master-hzhgq-0 machine.machine.openshift.io "huliu-vs15b-75tr7-master-hzhgq-0" deleted ^C liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs15b-75tr7-master-hzhgq-0 Deleting 138m huliu-vs15b-75tr7-master-kb687-0 Provisioning 26s huliu-vs15b-75tr7-master-kj9zf-2 Running 90m huliu-vs15b-75tr7-master-kz6cx-1 Running 60m huliu-vs15b-75tr7-master-qn6kq-0 Provisioning 26s huliu-vs15b-75tr7-worker-gzsp9 Running 7h47m huliu-vs15b-75tr7-worker-vcqqh Running 7h47m winworker-4cltm Running 7h22m winworker-qd4c4 Running 7h22m liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs15b-75tr7-master-kb687-0 Running 154m huliu-vs15b-75tr7-master-kj9zf-2 Running 4h5m huliu-vs15b-75tr7-master-kz6cx-1 Running 3h34m huliu-vs15b-75tr7-master-qn6kq-0 Running 154m huliu-vs15b-75tr7-worker-gzsp9 Running 10h huliu-vs15b-75tr7-worker-vcqqh Running 10h winworker-4cltm Running 9h winworker-qd4c4 Running 9h liuhuali@Lius-MacBook-Pro huali-test % oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.13.0-0.nightly-2023-02-13-235211 True False False 5h13m baremetal 4.13.0-0.nightly-2023-02-13-235211 True False False 10h cloud-controller-manager 4.13.0-0.nightly-2023-02-13-235211 True False False 10h cloud-credential 4.13.0-0.nightly-2023-02-13-235211 True False False 10h cluster-autoscaler 4.13.0-0.nightly-2023-02-13-235211 True False False 10h config-operator 4.13.0-0.nightly-2023-02-13-235211 True False False 10h console 4.13.0-0.nightly-2023-02-13-235211 True False False 145m control-plane-machine-set 4.13.0-0.nightly-2023-02-13-235211 True False True 10h Observed 1 updated machine(s) in excess for index 0 csi-snapshot-controller 4.13.0-0.nightly-2023-02-13-235211 True False False 10h dns 4.13.0-0.nightly-2023-02-13-235211 True False False 10h etcd 4.13.0-0.nightly-2023-02-13-235211 True False False 10h image-registry 4.13.0-0.nightly-2023-02-13-235211 True False False 9h ingress 4.13.0-0.nightly-2023-02-13-235211 True False False 10h insights 4.13.0-0.nightly-2023-02-13-235211 True False False 10h kube-apiserver 4.13.0-0.nightly-2023-02-13-235211 True False False 10h kube-controller-manager 4.13.0-0.nightly-2023-02-13-235211 True False False 10h kube-scheduler 4.13.0-0.nightly-2023-02-13-235211 True False False 10h kube-storage-version-migrator 4.13.0-0.nightly-2023-02-13-235211 True False False 6h18m machine-api 4.13.0-0.nightly-2023-02-13-235211 True False False 10h machine-approver 4.13.0-0.nightly-2023-02-13-235211 True False False 10h machine-config 4.13.0-0.nightly-2023-02-13-235211 True False False 3h59m marketplace 4.13.0-0.nightly-2023-02-13-235211 True False False 10h monitoring 4.13.0-0.nightly-2023-02-13-235211 True False False 10h network 4.13.0-0.nightly-2023-02-13-235211 True False False 10h node-tuning 4.13.0-0.nightly-2023-02-13-235211 True False False 10h openshift-apiserver 4.13.0-0.nightly-2023-02-13-235211 True False False 145m openshift-controller-manager 4.13.0-0.nightly-2023-02-13-235211 True False False 10h openshift-samples 
4.13.0-0.nightly-2023-02-13-235211 True False False 10h operator-lifecycle-manager 4.13.0-0.nightly-2023-02-13-235211 True False False 10h operator-lifecycle-manager-catalog 4.13.0-0.nightly-2023-02-13-235211 True False False 10h operator-lifecycle-manager-packageserver 4.13.0-0.nightly-2023-02-13-235211 True False False 6h7m service-ca 4.13.0-0.nightly-2023-02-13-235211 True False False 10h storage 4.13.0-0.nightly-2023-02-13-235211 True False False 3h57m liuhuali@Lius-MacBook-Pro huali-test % 3.On ipi-on-vsphere/versioned-installer-vmc7-ovn template cluster, after update all the 3 master machines using RollingUpdate strategy, no issue, then delete a master machine, no issue, then change the strategy to OnDelete, and replace the master machines one by one, when I delete the last one, two replace machines created. liuhuali@Lius-MacBook-Pro huali-test % oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.13.0-0.nightly-2023-02-13-235211 True False False 73m baremetal 4.13.0-0.nightly-2023-02-13-235211 True False False 9h cloud-controller-manager 4.13.0-0.nightly-2023-02-13-235211 True False False 9h cloud-credential 4.13.0-0.nightly-2023-02-13-235211 True False False 9h cluster-autoscaler 4.13.0-0.nightly-2023-02-13-235211 True False False 9h config-operator 4.13.0-0.nightly-2023-02-13-235211 True False False 9h console 4.13.0-0.nightly-2023-02-13-235211 True False False 129m control-plane-machine-set 4.13.0-0.nightly-2023-02-13-235211 True True False 9h Observed 1 replica(s) in need of update csi-snapshot-controller 4.13.0-0.nightly-2023-02-13-235211 True False False 9h dns 4.13.0-0.nightly-2023-02-13-235211 True False False 9h etcd 4.13.0-0.nightly-2023-02-13-235211 True False False 9h image-registry 4.13.0-0.nightly-2023-02-13-235211 True False False 8h ingress 4.13.0-0.nightly-2023-02-13-235211 True False False 8h insights 4.13.0-0.nightly-2023-02-13-235211 True False False 8h kube-apiserver 4.13.0-0.nightly-2023-02-13-235211 True False False 9h kube-controller-manager 4.13.0-0.nightly-2023-02-13-235211 True False False 9h kube-scheduler 4.13.0-0.nightly-2023-02-13-235211 True False False 9h kube-storage-version-migrator 4.13.0-0.nightly-2023-02-13-235211 True False False 3h22m machine-api 4.13.0-0.nightly-2023-02-13-235211 True False False 9h machine-approver 4.13.0-0.nightly-2023-02-13-235211 True False False 9h machine-config 4.13.0-0.nightly-2023-02-13-235211 True False False 9h marketplace 4.13.0-0.nightly-2023-02-13-235211 True False False 9h monitoring 4.13.0-0.nightly-2023-02-13-235211 True False False 8h network 4.13.0-0.nightly-2023-02-13-235211 True False False 9h node-tuning 4.13.0-0.nightly-2023-02-13-235211 True False False 9h openshift-apiserver 4.13.0-0.nightly-2023-02-13-235211 True False False 9h openshift-controller-manager 4.13.0-0.nightly-2023-02-13-235211 True False False 9h openshift-samples 4.13.0-0.nightly-2023-02-13-235211 True False False 9h operator-lifecycle-manager 4.13.0-0.nightly-2023-02-13-235211 True False False 9h operator-lifecycle-manager-catalog 4.13.0-0.nightly-2023-02-13-235211 True False False 9h operator-lifecycle-manager-packageserver 4.13.0-0.nightly-2023-02-13-235211 True False False 46m service-ca 4.13.0-0.nightly-2023-02-13-235211 True False False 9h storage 4.13.0-0.nightly-2023-02-13-235211 True False False 77m liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs15a-kjm6h-master-55s4l-1 Running 84m huliu-vs15a-kjm6h-master-ppc55-2 Running 3h4m 
huliu-vs15a-kjm6h-master-rqb52-0 Running 53m huliu-vs15a-kjm6h-worker-6nbz7 Running 9h huliu-vs15a-kjm6h-worker-g84xg Running 9h liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-vs15a-kjm6h-master-ppc55-2 machine.machine.openshift.io "huliu-vs15a-kjm6h-master-ppc55-2" deleted ^C liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs15a-kjm6h-master-55s4l-1 Running 85m huliu-vs15a-kjm6h-master-cvwzz-2 Provisioning 27s huliu-vs15a-kjm6h-master-ppc55-2 Deleting 3h5m huliu-vs15a-kjm6h-master-qp9m5-2 Provisioning 27s huliu-vs15a-kjm6h-master-rqb52-0 Running 54m huliu-vs15a-kjm6h-worker-6nbz7 Running 9h huliu-vs15a-kjm6h-worker-g84xg Running 9h liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-vs15a-kjm6h-master-55s4l-1 Running 163m huliu-vs15a-kjm6h-master-cvwzz-2 Running 79m huliu-vs15a-kjm6h-master-qp9m5-2 Running 79m huliu-vs15a-kjm6h-master-rqb52-0 Running 133m huliu-vs15a-kjm6h-worker-6nbz7 Running 10h huliu-vs15a-kjm6h-worker-g84xg Running 10h liuhuali@Lius-MacBook-Pro huali-test %
Actual results:
CPMS creates two replacement machines when deleting a master machine, and the two replacement machines remain there for a long time.
Expected results:
CPMS should create only one replacement machine when deleting a master machine, or quickly delete the redundant machine.
Additional info:
Must-gather: https://drive.google.com/file/d/1aCyFn9okNxRz7nE3Yt_8g6Kx7sPSGCg2/view?usp=sharing for ipi-on-vsphere/versioned-installer-vmc7-ovn-winc-thin_pvc-ci template cluster https://drive.google.com/file/d/1i0fWSP0-HqfdV5E0wcNevognLUQKecvl/view?usp=sharing for ipi-on-vsphere/versioned-installer-vmc7-ovn template cluster
This is a clone of issue OCPBUGS-19494. The following is the description of the original issue:
—
Description of problem:
The ipsec container kills pluto even if it was started by systemd.
Version-Release number of selected component (if applicable):
on any 4.14 nightly
How reproducible:
every time
Steps to Reproduce:
1. enable N-S ipsec 2. enable E-W IPsec 3. kill/stop/delete one of the ipsec-host pods
Actual results:
pluto is killed on that host
Expected results:
pluto keeps running
Additional info:
https://github.com/yuvalk/cluster-network-operator/blob/37d1cc72f4f6cd999046bd487a705e6da31301a5/bindata/network/ovn-kubernetes/common/ipsec-host.yaml#L235 this should be removed
Description of problem:
According to PR https://github.com/openshift/cluster-monitoring-operator/pull/1824, the startupProbe for both the UWM Prometheus and the platform Prometheus should be 1 hour, but the startupProbe for the UWM Prometheus is still 15 minutes after enabling UWM (failureThreshold 60 x periodSeconds 15s = 15 minutes). The platform Prometheus does not have this issue; its startupProbe is increased to 1 hour (failureThreshold 240 x 15s = 1 hour).
$ oc -n openshift-user-workload-monitoring get pod prometheus-user-workload-0 -oyaml | grep startupProbe -A20
startupProbe:
  exec:
    command:
    - sh
    - -c
    - if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi
  failureThreshold: 60
  periodSeconds: 15
  successThreshold: 1
  timeoutSeconds: 3
...
$ oc -n openshift-monitoring get pod prometheus-k8s-0 -oyaml | grep startupProbe -A20
startupProbe:
  exec:
    command:
    - sh
    - -c
    - if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi
  failureThreshold: 240
  periodSeconds: 15
  successThreshold: 1
  timeoutSeconds: 3
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-19-052243
How reproducible:
always
Steps to Reproduce:
1. enable UWM, check startupProbe for UWM prometheus/platform prometheus 2. 3.
Actual results:
startupProbe for UWM prometheus is still 15m
Expected results:
startupProbe for UWM prometheus should be 1 hour
Additional info:
Since the startupProbe for the platform Prometheus is already increased to 1 hour and there is no similar bug for the UWM Prometheus, resolving this issue as Won't Fix is OK.
When ProjectID is not set, TenantID might be ignored in MAPO.
Context: When setting additional networks in Machine templates, networks can be identified by the means of a filter. The network filter has both TenantID and ProjectID as fields. TenantID was ignored.
Steps to reproduce:
Create a Machine or a MachineSet with a template containing a Network filter that sets a TenantID.
```
networks:
```
One cheap way of testing this could be to pass a valid network ID and set a bogus tenantID. If the machine gets associated with the network, then tenantID has been ignored and the bug is present. If instead MAPO errors, then it means that it has taken tenantID into consideration.
Description of problem:
This Jira is filed to track upstream issue (fix and backport) https://github.com/kubernetes-sigs/azurefile-csi-driver/issues/1308
Version-Release number of selected component (if applicable):
4.14
Description of problem:
[Hypershift] default KAS PSA config should be consistent with OCP enforce: privileged
Version-Release number of selected component (if applicable):
Cluster version is 4.14.0-0.nightly-2023-10-08-220853
How reproducible:
Always
Steps to Reproduce:
1. Install OCP cluster and hypershift operator 2. Create hosted cluster 3. Check the default kas config of the hosted cluster
Actual results:
The hosted cluster default kas PSA config enforce is 'restricted':
$ jq '.admission.pluginConfig.PodSecurity' < `oc extract cm/kas-config -n clusters-9cb7724d8bdd0c16a113 --confirm`
{
  "location": "",
  "configuration": {
    "kind": "PodSecurityConfiguration",
    "apiVersion": "pod-security.admission.config.k8s.io/v1beta1",
    "defaults": {
      "enforce": "restricted",
      "enforce-version": "latest",
      "audit": "restricted",
      "audit-version": "latest",
      "warn": "restricted",
      "warn-version": "latest"
    },
    "exemptions": {
      "usernames": [
        "system:serviceaccount:openshift-infra:build-controller"
      ]
    }
  }
}
Expected results:
The hosted cluster default kas PSA config enforce should be 'privileged' in https://github.com/openshift/hypershift/blob/release-4.13/control-plane-operator/controllers/hostedcontrolplane/kas/config.go#L93
Additional info:
References: OCPBUGS-8710
Description of problem:
oauth user:check-access scoped tokens cannot be used to check access as intended. SelfSubjectAccessReviews from such a scoped token always report allowed: false, denied: true, unless the SelfSubjectAccessReview is checking access for the ability to create SelfSubjectAccessReviews. This does not seem like the intended behavior per the documentation: https://docs.openshift.com/container-platform/4.12/authentication/tokens-scoping.html oauth user:check-access scoped tokens only have authorization for SelfSubjectAccessReview; this is as intended and appears to be enforced by the scope authorizer. However, the authorizer used by SelfSubjectAccessReview includes this filter, meaning the returned response is useless (you can only check access to SelfSubjectAccessReview itself instead of using the token to check the RBAC access of the parent user the token is scoped from). https://github.com/openshift/kubernetes/blob/master/openshift-kube-apiserver/authorization/scopeauthorizer/authorizer.go https://github.com/openshift/kubernetes/blob/master/pkg/registry/authorization/selfsubjectaccessreview/rest.go
Version-Release number of selected component (if applicable):
How reproducible:
Create user:check-access scoped token. Token must not have user:full scope. Use the token to do a SelfSubjectAccessReview.
Steps to Reproduce:
1. Create user:check-access scoped token. Must not have user:full scope.
2. Use the token to do a SelfSubjectAccessReview against a resource the parent user has access to.
3. Observe the status response is allowed: false, denied: true.
Actual results:
Unable to check user access with a user:check-access scoped token.
Expected results:
Ability to check user access with a user:check-access scoped token, without user:full scope which would give the token full access and abilities of the parent user.
Additional info:
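A minimal reproduction sketch using client-go, assuming a kubeconfig whose bearer token carries only the user:check-access scope; the kubeconfig path and the resource attributes are illustrative:
```
package main

import (
	"context"
	"fmt"

	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Kubeconfig whose bearer token only carries the user:check-access scope (illustrative path).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/tmp/check-access-kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	ssar := &authorizationv1.SelfSubjectAccessReview{
		Spec: authorizationv1.SelfSubjectAccessReviewSpec{
			ResourceAttributes: &authorizationv1.ResourceAttributes{
				Namespace: "default",
				Verb:      "get",
				Resource:  "pods",
			},
		},
	}
	resp, err := client.AuthorizationV1().SelfSubjectAccessReviews().Create(context.TODO(), ssar, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	// With the behavior described above this prints allowed=false denied=true,
	// even when the parent user is allowed to get pods in the namespace.
	fmt.Printf("allowed=%v denied=%v reason=%q\n", resp.Status.Allowed, resp.Status.Denied, resp.Status.Reason)
}
```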
Some tests may cause unexpected reboots of nodes. On HA setups this is checked by the "should report ready nodes the entire duration of the test run" test, which ensures the Prometheus metric for node readiness didn't flip.
On SNO however we can't use the metrics, as Prometheus will go down along with the node and the node will become ready again before Prometheus/kube-state-metrics is up again. For SNO we have to check that the node has the expected number of reboots: the number of "rendered-master"/"rendered-worker" MCs + 1.
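A minimal sketch of that check, assuming we can list the rendered MachineConfig names and the observed reboot count separately; the helper and inputs are illustrative, not the actual test code:
```
package main

import (
	"fmt"
	"strings"
)

// expectedReboots returns the number of boots the SNO node should have seen:
// one per rendered-master/rendered-worker MachineConfig, plus the initial one.
func expectedReboots(machineConfigNames []string) int {
	count := 0
	for _, name := range machineConfigNames {
		if strings.HasPrefix(name, "rendered-master") || strings.HasPrefix(name, "rendered-worker") {
			count++
		}
	}
	return count + 1
}

func main() {
	names := []string{"rendered-master-abc123", "rendered-worker-def456", "00-master"}
	fmt.Println(expectedReboots(names)) // 3
}
```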
This is a clone of issue OCPBUGS-18906. The following is the description of the original issue:
—
Using packages from k8s.io/kubernetes is not supported: https://github.com/kubernetes/kubernetes/issues/79384#issuecomment-505627280
This came about in this slack thread: https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1694210392218409?thread_ts=1694207119.447459&cid=C02CZNQHGN8
Description of problem:
The MCDaemon has a codepath for "pivot", used in older versions and then in solution articles to initiate a direct pivot to an ostree version, mostly when things fail. As of 4.12 this codepath should no longer work, because we switched to the new-format OSImage, so we should fully deprecate it. This is likely where it fails: https://github.com/openshift/machine-config-operator/blob/ecc6bf3dc21eb33baf56692ba7d54f9a3b9be1d1/pkg/daemon/rpm-ostree.go#L248
Version-Release number of selected component (if applicable):
4.12+
How reproducible:
Not sure but should be 100%
Steps to Reproduce:
1. Follow https://access.redhat.com/solutions/5598401 2. 3.
Actual results:
fails
Expected results:
MCD telling you pivot is deprecated
Additional info:
Description of problem:
The secret generated by CCO in STS mode is different from the one created by ccoctl on the command line.
ccoctl generates:
[default]
sts_regional_endpoints = regional
role_arn = arn:aws:iam::269733383066:role/jsafrane-1-5h8rm-openshift-cluster-csi-drivers-aws-efs-cloud-cre
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
CCO generates:
sts_regional_endpoints = regional
role_arn = arn:aws:iam::269733383066:role/jsafrane-1-5h8rm-openshift-cluster-csi-drivers-aws-efs-cloud-cre
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
IMO these two should be the same. AWS EFS CSI driver does not work without "[default]" at the beginning.
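A minimal sketch of emitting the credentials content with the "[default]" profile header; the builder function is hypothetical for illustration, not CCO's actual code:
```
package main

import "fmt"

// stsCredentialsFile renders the AWS shared-credentials content with the
// "[default]" profile header that ccoctl emits and the EFS CSI driver expects.
func stsCredentialsFile(roleARN, tokenPath string) string {
	return fmt.Sprintf(
		"[default]\nsts_regional_endpoints = regional\nrole_arn = %s\nweb_identity_token_file = %s\n",
		roleARN, tokenPath)
}

func main() {
	fmt.Print(stsCredentialsFile(
		"arn:aws:iam::123456789012:role/example-role", // illustrative ARN
		"/var/run/secrets/openshift/serviceaccount/token",
	))
}
```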
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-11-092038
How reproducible:
Always
Steps to Reproduce:
1. Create a Manual mode, STS cluster in AWS.
2. Create a CredentialsRequest which provides .spec.cloudTokenPath and .spec.providerSpec.stsIAMRoleARN.
3. Observe that a secret is created by CCO in the target namespace specified by the CredentialsRequest.
Actual results:
The secret does not have [default] in the `data` content.
Expected results:
Background
When we run our agent we set the proxy environment variables as can be seen here
When the user SSHs into the host, the shell does not have those environment variables set.
Issue
This means that when the user is trying to debug network connectivity (for example, in day-2 users often SSH to see why they can't reach the day-1 cluster's API), they will usually try to run curl to see whether they can reach the URL themselves, but it might behave differently than the agent because the shell, by default, doesn't use the proxy settings.
Solution
Set the default environment variables (through .profile) of the core and root shells to include the same proxy environment variables as the agent, so that when the user logs into the host to run commands, they would have the same proxy settings as the ones the agent has.
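A minimal sketch of what the solution could look like, assuming the agent knows its proxy values and writes a profile.d snippet; the file path, function name and values are illustrative:
```
package main

import (
	"fmt"
	"os"
)

// writeProxyProfile drops a profile.d snippet so interactive shells pick up
// the same proxy settings the agent runs with.
func writeProxyProfile(httpProxy, httpsProxy, noProxy string) error {
	content := fmt.Sprintf(
		"export HTTP_PROXY=%q\nexport HTTPS_PROXY=%q\nexport NO_PROXY=%q\n"+
			"export http_proxy=%q\nexport https_proxy=%q\nexport no_proxy=%q\n",
		httpProxy, httpsProxy, noProxy, httpProxy, httpsProxy, noProxy)
	return os.WriteFile("/etc/profile.d/99-agent-proxy.sh", []byte(content), 0o644)
}

func main() {
	// Illustrative values; in practice these come from the agent's own environment.
	if err := writeProxyProfile(
		"http://proxy.example.com:3128",
		"http://proxy.example.com:3128",
		".cluster.local,localhost,127.0.0.1",
	); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```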
Example
One example where we ran into this issue is when a customer forgot to set the correct noProxy settings in the UI during day-2, and so the agent was complaining about not being able to reach the day-1 API server (as the API server is unreachable through the proxy), but when we SSHd into the host and tried to curl, everything seemed to be working fine. Only after we ran tcpdump to see the difference in requests that we noticed the agent was routing requests through the proxy but curl wasn't, because the shell didn't have the proxy settings by default. If the shell had the correct proxy settings, it would've been easier to troubleshoot the problem.
Description of problem:
The NS autolabeler should adjust the PSS namespace labels such that a previously permitted workload (based on the SCCs it has access to) can still run.
The autolabeler requires the RoleBinding's .subjects[].namespace to be set when .subjects[].kind is ServiceAccount even though this is not required by the RBAC system to successfully bind the SA to a Role
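A minimal sketch of the namespace defaulting the autolabeler could apply, mirroring how RBAC resolves a ServiceAccount subject with an empty namespace; the helper is illustrative, not the controller's actual code:
```
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
)

// serviceAccountNamespace resolves the namespace of a ServiceAccount subject
// the same way RBAC does: an empty namespace means the RoleBinding's own.
func serviceAccountNamespace(rb rbacv1.RoleBinding, s rbacv1.Subject) string {
	if s.Kind != rbacv1.ServiceAccountKind {
		return s.Namespace
	}
	if s.Namespace != "" {
		return s.Namespace
	}
	return rb.Namespace
}

func main() {
	rb := rbacv1.RoleBinding{}
	rb.Namespace = "test"
	subj := rbacv1.Subject{Kind: rbacv1.ServiceAccountKind, Name: "mysa"}
	fmt.Println(serviceAccountNamespace(rb, subj)) // "test"
}
```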
Version-Release number of selected component (if applicable):
$ oc version
Client Version: 4.7.0-0.ci-2021-05-21-142747
Server Version: 4.12.0-0.nightly-2022-08-15-150248
Kubernetes Version: v1.24.0+da80cd0
How reproducible: 100%
Steps to Reproduce:
---
apiVersion: v1
kind: Namespace
metadata:
  name: test
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mysa
  namespace: test
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: myrole
  namespace: test
rules:
- apiGroups:
  - security.openshift.io
  resourceNames:
  - privileged
  resources:
  - securitycontextconstraints
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: myrb
  namespace: test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: myrole
subjects:
- kind: ServiceAccount
  name: mysa
  #namespace: test # This is required for the autolabeler
---
kind: Job
apiVersion: batch/v1
metadata:
  name: myjob
  namespace: test
spec:
  template:
    spec:
      containers:
      - name: ubi
        image: registry.access.redhat.com/ubi8
        command: ["/bin/bash", "-c"]
        args: ["whoami; sleep infinity"]
      restartPolicy: Never
      securityContext:
        runAsUser: 0
      serviceAccount: mysa
      terminationGracePeriodSeconds: 2
Actual results:
Applying the manifest above, the Job's pod will not start:
$ kubectl -n test describe job/myjob...Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 20s job-controller Error creating: pods "myjob-zxcvv" is forbidden: violates PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "ubi" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "ubi" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "ubi" must set securityContext.runAsNonRoot=true), runAsUser=0 (pod must not set runAsUser=0), seccompProfile (pod or container "ubi" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Warning FailedCreate 20s job-controller Error creating: pods "myjob-fkb9x" is forbidden: violates PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "ubi" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "ubi" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "ubi" must set securityContext.runAsNonRoot=true), runAsUser=0 (pod must not set runAsUser=0), seccompProfile (pod or container "ubi" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Warning FailedCreate 10s job-controller Error creating: pods "myjob-5klpc" is forbidden: violates PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "ubi" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "ubi" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "ubi" must set securityContext.runAsNonRoot=true), runAsUser=0 (pod must not set runAsUser=0), seccompProfile (pod or container "ubi" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Uncommenting the "namespace" field in the RoleBinding will allow it to start as the autolabeler will adjust the Namespace labels.
However, the namespace field isn't actually required by the RBAC system. Instead of using the autolabeler, the pod can be allowed to run by (w/o uncommenting the field):
$ kubectl label ns/test security.openshift.io/scc.podSecurityLabelSync=false
namespace/test labeled
$ kubectl label ns/test pod-security.kubernetes.io/enforce=privileged --overwrite
namespace/test labeled
We now see that the pod is running as root and has access to the privileged scc:
$ kubectl -n test get po -oyaml
apiVersion: v1
items:
- apiVersion: v1
kind: Pod
metadata:
annotations:
k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.129.2.18/23"],"mac_address":"0a:58:0a:81:02:12","gateway_ips":["10.129.2.1"],"ip_address":"10.129.2.18/23","gateway_ip":"10.129.2.1"}}'
k8s.v1.cni.cncf.io/network-status: |-
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.129.2.18"
],
"mac": "0a:58:0a:81:02:12",
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status: |-
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.129.2.18"
],
"mac": "0a:58:0a:81:02:12",
"default": true,
"dns": {}
}]
openshift.io/scc: privileged
creationTimestamp: "2022-08-16T13:08:24Z"
generateName: myjob-
labels:
controller-uid: 1867dbe6-73b2-44ea-a324-45c9273107b8
job-name: myjob
name: myjob-rwjmv
namespace: test
ownerReferences:
- apiVersion: batch/v1
blockOwnerDeletion: true
controller: true
kind: Job
name: myjob
uid: 1867dbe6-73b2-44ea-a324-45c9273107b8
resourceVersion: "36418"
uid: 39f18dea-31d4-4783-85b5-8ae6a8bec1f4
spec:
containers:
- args:
- whoami; sleep infinity
command:
- /bin/bash
- -c
image: registry.access.redhat.com/ubi8
imagePullPolicy: Always
name: ubi
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-6f2h6
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
imagePullSecrets:
- name: mysa-dockercfg-mvmtn
nodeName: ip-10-0-140-172.ec2.internal
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Never
schedulerName: default-scheduler
securityContext:
runAsUser: 0
serviceAccount: mysa
serviceAccountName: mysa
terminationGracePeriodSeconds: 2
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: kube-api-access-6f2h6
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
- configMap:
items:
- key: service-ca.crt
path: service-ca.crt
name: openshift-service-ca.crt
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2022-08-16T13:08:24Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2022-08-16T13:08:28Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2022-08-16T13:08:28Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2022-08-16T13:08:24Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: cri-o://8fd1c3a5ee565a1089e4e6032bd04bceabb5ab3946c34a2bb55d3ee696baa007
image: registry.access.redhat.com/ubi8:latest
imageID: registry.access.redhat.com/ubi8@sha256:08e221b041a95e6840b208c618ae56c27e3429c3dad637ece01c9b471cc8fac6
lastState: {}
name: ubi
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2022-08-16T13:08:28Z"
hostIP: 10.0.140.172
phase: Running
podIP: 10.129.2.18
podIPs:
- ip: 10.129.2.18
qosClass: BestEffort
startTime: "2022-08-16T13:08:24Z"
kind: List
metadata:
resourceVersion: ""
$ kubectl -n test logs job/myjob
root
Expected results:
The autolabeler should properly follow the RoleBinding back to the SCC
Additional info:
Description of problem:
While updating a cluster to 4.12.11, which contains the fix for [OCPBUGS-7999|https://issues.redhat.com/browse/OCPBUGS-7999] (the 4.12.z backport of [OCPBUGS-2783|https://issues.redhat.com/browse/OCPBUGS-2783]), the older {{{Custom|Default}RouteSync{Degraded|Progressing}}} conditions are not cleaned up as they should be per the [OCPBUGS-2783|https://issues.redhat.com/browse/OCPBUGS-2783] resolution, while the newer ones are added. Because of this, an upgrade to 4.12.11 (or higher, until this bug is fixed) can hit a problem very similar to the one that led to [OCPBUGS-2783|https://issues.redhat.com/browse/OCPBUGS-2783] in the first place. So we need to do a proper cleanup of the older conditions.
Version-Release number of selected component (if applicable):
4.12.11 and higher
How reproducible:
Always, as far as the wrong conditions are concerned. It only leads to issues if one of the wrong conditions was in an unhealthy state.
Steps to Reproduce:
1. Upgrade 2. 3.
Actual results:
Both new (and correct) conditions plus older (and wrong) conditions.
Expected results:
Both new (and correct) conditions only.
Additional info:
The problem seems to be that the stale conditions controller is created[1] with a list containing {{CustomRouteSync}} and {{DefaultRouteSync}}, while that list should contain {{CustomRouteSyncDegraded}}, {{CustomRouteSyncProgressing}}, {{DefaultRouteSyncDegraded}} and {{DefaultRouteSyncProgressing}}. From reading the controller source code a bit, it does not match prefixes but performs a literal comparison. [1] - https://github.com/openshift/console-operator/blob/0b54727/pkg/console/starter/starter.go#L403-L404
Description of problem:
During the creation of a new HostedCluster, the control-plane-operator reports several lines of logs like
{"level":"error","ts":"2023-05-04T05:24:03Z","msg":"failed to remove service ca annotation and secret: %w","controller":"hostedcontrolplane","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedControlPlane","hostedControlPlane":{"name":"demo-02","namespace":"clusters-demo-02"},"namespace":"clusters-demo-02","name":"demo-02","reconcileID":"5ffe0a7f-94ce-4745-b89d-4d5168cabe8d","error":"failed to get service: Service \"node-tuning-operator\" not found","stacktrace":"github.com/openshift/hypershift/control-plane-operator/controllers/hostedcontrolplane.(*HostedControlPlaneReconciler).reconcile\n\t/hypershift/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:929\ngithub.com/openshift/hypershift/control-plane-operator/controllers/hostedcontrolplane.(*HostedControlPlaneReconciler).update\n\t/hypershift/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:830\ngithub.com/openshift/hypershift/control-plane-operator/controllers/hostedcontrolplane.(*HostedControlPlaneReconciler).Reconcile\n\t/hypershift/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:677\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}
Until the Service / Secret are created.
Version-Release number of selected component (if applicable):
Management cluster: 4.14.0-nightly Hosted Cluster: 4.13.0 or 4.14.0-nightly
How reproducible:
Always
Steps to Reproduce:
1. Create a hosted cluster
Actual results:
HostedCluster is created but there are several unnecessary "error" logs in the control-plane-operator
Expected results:
No error logs from control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:removeServiceCAAnnotationAndSecret() during normal cluster creation
Additional info:
Marko Luksa reported that Multus is missing the '/etc/cni/multus/net.d' mount in OCP 4.14; here are the reproduction steps (verified by the Multus team):
Our original reproducer would be too complex, so I had to write a simple one for you:
Use a 4.14 OpenShift cluster
Create the CNI plugin installer DaemonSet in namespace test:
oc apply -f https://gist.githubusercontent.com/luksa/c4d444e918124604839c424339c29a62/raw/1454bd389138980ea3f93bcfaf6026d4821e3543/noop-cni-plugin-installer.yaml
Create the test Deployment:
oc apply -f https://gist.githubusercontent.com/luksa/4c7c144ef88b1b0d8f772d6eacdeec14/raw/06b161fdb8c71406f4531d35550bd507a6a25200/test-deployment.yaml
Describe the test pod:
oc -n test describe po test
The last event shows the following:
ERRORED: error configuring pod [test/test-6cf67dcfb6-hgszq] networking: Multus: [test/test-6cf67dcfb6-hgszq/3e8a6f0d-ce84-4885-a7a7-43506669339f]: error loading k8s delegates k8s args: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: GetCNIConfig: err in GetCNIConfigFromFile: No networks found in /etc/cni/multus/net.d
The same reproducer runs fine on OCP 4.13
Description of problem:
The current version of openshift/router vendors Kubernetes 1.26 packages. OpenShift 4.14 is based on Kubernetes 1.27.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Check https://github.com/openshift/router/blob/release-4.14/go.mod
Actual results:
Kubernetes packages (k8s.io/api, k8s.io/apimachinery, k8s.io/apiserver, and k8s.io/client-go) are at version v0.26
Expected results:
Kubernetes packages are at version v0.27.0 or later.
Additional info:
Using old Kubernetes API and client packages brings risk of API compatibility issues.
Description of problem:
I attempted to install a BM SNO with the agent based installer. In the install_config, I disabled all supported capabilities except marketplace.
Install_config snippet:
capabilities:
  baselineCapabilitySet: None
  additionalEnabledCapabilities:
  - marketplace
The system installed fine but the capabilities config was not passed down to the cluster.
clusterversion:
status:
  availableUpdates: null
  capabilities:
    enabledCapabilities:
    - CSISnapshot
    - Console
    - Insights
    - Storage
    - baremetal
    - marketplace
    - openshift-samples
    knownCapabilities:
    - CSISnapshot
    - Console
    - Insights
    - Storage
    - baremetal
    - marketplace
    - openshift-samples
oc -n kube-system get configmap cluster-config-v1 -o yaml
apiVersion: v1
data:
  install-config: |
    additionalTrustBundlePolicy: Proxyonly
    apiVersion: v1
    baseDomain: ptp.lab.eng.bos.redhat.com
    bootstrapInPlace:
      installationDisk: /dev/disk/by-id/wwn-0x62cea7f04d10350026c6f2ec315557a0
    compute:
    - architecture: amd64
      hyperthreading: Enabled
      name: worker
      platform: {}
      replicas: 0
    controlPlane:
      architecture: amd64
      hyperthreading: Enabled
      name: master
      platform: {}
      replicas: 1
    metadata:
      creationTimestamp: null
      name: cnfde8
    networking:
      clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      machineNetwork:
      - cidr: 10.16.231.0/24
      networkType: OVNKubernetes
      serviceNetwork:
      - 172.30.0.0/16
    platform:
      none: {}
    publish: External
    pullSecret: ""
Version-Release number of selected component (if applicable):
4.12.0-rc.5
How reproducible:
100%
Steps to Reproduce:
1. Install SNO with agent based installer as described above 2. 3.
Actual results:
Capabilities installed
Expected results:
Capabilities not installed
Additional info:
Description of problem:
When trying to import the Helm chart "httpd-imagestreams", the "Create Helm Release" page shows an info alert that the form isn't available because there isn't a schema for this Helm chart. However, the YAML view is also not visible.
Info Alert:
Form view is disabled for this chart because the schema is not available
Version-Release number of selected component (if applicable):
4.9-4.14 (current master)
How reproducible:
Always
Steps to Reproduce:
Actual results:
Expected results:
Additional info:
The chart yaml is available here and doesn't contain a schema (at the moment).
Description of problem:
machine-config-operator will fail on clusters deployed with IPI on Power Virtual Server with the following error: Cluster not available for []: ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: spec.infra.status.platformStatus.powervs.resourceGroup: Invalid value: "": spec.infra.status.platformStatus.powervs.resourceGroup in body should match '^[a-zA-Z0-9-_
Version-Release number of selected component (if applicable):
4.14 and 4.13
How reproducible:
100%
Steps to Reproduce:
1. Deploy with openshift-installer to Power VS 2. Wait for masters to start deploying 3. Error will appear for the machine-config CO
Actual results:
MCO fails
Expected results:
MCO should come up
Additional info:
Fix has been identified
Description of problem:
Pipelines Creation YAML form is not allowing v1beta1 YAMLs get created
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Open the Pipelines Creation YAML form 2. Paste the following YAML 3. Submit the form
Actual results:
The form does not submit, stating a version mismatch: expects v1, got v1beta1.
Expected results:
The YAML form must support creating both versions.
Additional info:
The issue is not observed when the "Import from YAML" Form is used.
Attachment: https://drive.google.com/file/d/1B_sAuGREgmX800JXGmrL30iByowfHzs7/view?usp=sharing
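For illustration, a minimal v1beta1 Pipeline like the following (a hypothetical example, not the attached YAML) is enough to hit the version check in the creation form:
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: example-pipeline
spec:
  tasks:
  - name: echo
    taskSpec:
      steps:
      - name: echo
        image: registry.access.redhat.com/ubi8/ubi-minimal
        script: |
          echo "hello from v1beta1"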
https://github.com/kubernetes/klog is the favored fork of glog; it resolves a number of issues that glog has in containerized environments.
Description of problem:
The TRT ComponentReadiness tool shows what looks like a regression (https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2023-05-16%2023%3A59%3A59&baseRelease=4.13&baseStartTime=2023-04-16%2000%3A00%3A00&capability=Other&component=Monitoring&confidence=95&environment=ovn%20no-upgrade%20amd64%20aws%20hypershift&excludeArches=heterogeneous%2Carm64%2Cppc64le%2Cs390x&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&pity=5&platform=aws&sampleEndTime=2023-07-20%2023%3A59%3A59&sampleRelease=4.14&sampleStartTime=2023-07-13%2000%3A00%3A00&testId=openshift-tests%3A79898d2e28b78374d89e10b38f88107b&testName=%5Bsig-instrumentation%5D%20Prometheus%20%5Bapigroup%3Aimage.openshift.io%5D%20when%20installed%20on%20the%20cluster%20should%20report%20telemetry%20%5BLate%5D%20%5BSkipped%3ADisconnected%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D&upgrade=no-upgrade&variant=hypershift) in the "[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster should report telemetry [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" test. In the ComponentReadiness link above, you can see the sample runs (linked with red "F").
Version-Release number of selected component (if applicable):
4.14
How reproducible:
The pass rate in 4.13 is 100% vs. 81% in 4.14
Steps to Reproduce:
1. The query above focuses on "periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance" jobs and the specific test mentioned. You can see the failures by clicking on the red "F"s 2. 3.
Actual results:
The failures look like: { fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:365]: Unexpected error: <errors.aggregate | len:2, cap:2>: [promQL query returned unexpected results: metricsclient_request_send{client="federate_to",job="telemeter-client",status_code="200"} >= 1 [], promQL query returned unexpected results: federate_samples{job="telemeter-client"} >= 10 []] [ <*errors.errorString | 0xc0017611b0>{ s: "promQL query returned unexpected results:\nmetricsclient_request_send{client=\"federate_to\",job=\"telemeter-client\",status_code=\"200\"} >= 1\n[]", }, <*errors.errorString | 0xc00203d380>{ s: "promQL query returned unexpected results:\nfederate_samples{job=\"telemeter-client\"} >= 10\n[]", }, ]
Expected results:
Query should succeed
Additional info:
I set the severity to Major because this looks like a regression from where it was in the 5 weeks before 4.13 went GA.
Description of the problem:
When an ICSP is provided in the install config for caching images locally while also using the SaaS, the cluster fails to prepare for installation because oc adm release extract tries to use the ICSP from the install config.
How reproducible:
100% on a fresh deploy, but 0% if the installer cache is already warmed up.
Steps to reproduce:
1. Deploy fresh replicas to the SaaS environment
2. Create a cluster
3. Override the install config and add ICSP content for a registry that is inaccessible from the SaaS (see the sketch after these steps)
4. Install cluster
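The override in step 3 amounts to adding an imageContentSources stanza along these lines to the install config (a sketch; the mirror registry host is a placeholder that is unreachable from the SaaS):
imageContentSources:
- mirrors:
  - registry.example.internal:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - registry.example.internal:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev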
Actual results:
Cluster fails to prepare with an error like:
Failed to prepare the installation due to an unexpected error: failed generating install config for cluster f3e55b14-297d-453b-8ef4-953caebefc67: failed to get installer path: command 'oc adm release extract --command=openshift-install --to=/data/install-config-generate/installercache/quay.io/openshift-release-dev/ocp-release:4.13.0-x86_64 --insecure=false --icsp-file=/tmp/icsp-file1525063401 quay.io/openshift-release-dev/ocp-release:4.13.0-x86_64 --registry-config=/tmp/registry-config882468533' exited with non-zero exit code 1: warning: --icsp-file only applies to images referenced by digest and will be ignored for tags error: unable to read image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:81be8aec46465412abbef5f1ec252ee4a17b043e82d31feac13d25a8a215a2c9: unauthorized: access to the requested resource is not authorized . Please retry later
Expected results:
Installer image is pulled successfully.
Additional Information
This seems to have been introduced in https://github.com/openshift/assisted-service/pull/4115 when we started pulling ICSP information from the install config.
Description of problem:
Cluster Network Operator managed component multus-admission-controller does not conform to Hypershift control plane expectations. When CNO is managed by Hypershift, multus-admission-controller and other CNO-managed deployments should run with a non-root security context. If Hypershift runs the control plane on a Kubernetes (as opposed to OpenShift) management cluster, it adds a pod security context with a runAsUser element to its managed deployments, including CNO. In that case CNO should do the same and set the security context for its managed deployments, such as multus-admission-controller, to meet Hypershift security rules.
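Concretely, the expectation is that CNO sets something like the following on the pod template of its managed deployments when running under Hypershift on a Kubernetes management cluster (a minimal sketch; the UID value is illustrative):
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001   # illustrative UID; the actual value comes from the management cluster's policy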
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create an OCP cluster using Hypershift with a Kubernetes management cluster 2. Check the pod security context of multus-admission-controller
Actual results:
no pod security context is set on multus-admission-controller
Expected results:
pod security context is set with runAsUser: xxxx
Additional info:
Corresponding CNO change
Description of problem:
Component Readiness is showing a regression in 4.14 compared to 4.13 in the rt variant of test Cluster resource quota should control resource limits across namespaces. Example
{ fail [github.com/openshift/origin/test/extended/quota/clusterquota.go:107]: unexpected error: timed out waiting for the condition Ginkgo exit error 1: exit with code 1}
Looker studio graph (scroll down to see) shows the regression started around May 24th.
Version-Release number of selected component (if applicable):
How reproducible:
4.13 Sippy shows 100% success rate vs. 4.14 which is down to about 91%
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Historical pass rate was 100%
Additional info:
Description of problem:
Same for OCP 4.14.
In OCP 4.13, when trying to reach the Prometheus UI via port-forward, e.g. `oc port-forward prometheus-k8s-0`, the UI URL ($HOST:9090/graph) returns `Error opening React index.html: open web/ui/static/react/index.html: no such file or directory`.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-01-24-061922
How reproducible:
100%
Steps to Reproduce:
1. oc -n openshift-monitoring port-forward prometheus-k8s-0 9090:9090 --address='0.0.0.0' 2. curl http://localhost:9090/graph
Actual results:
Error opening React index.html: open web/ui/static/react/index.html: no such file or directory
Expected results:
Prometheus UI is loaded
Additional info:
The UI loads fine when following the same steps in 4.12.
Removes the version check on reconciling the image content type policy since that is not needed in release image versions greater than 4.13.
Description of problem:
Visiting the global configurations page returns an error after 'Red Hat OpenShift Serverless' is installed; the error persists even after the operator is uninstalled.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-06-212044
How reproducible:
Always
Steps to Reproduce:
1. Subscribe 'Red Hat OpenShift Serverless' from OperatorHub, wait for the operator to be successfully installed 2. Visit Administration -> Cluster Settings -> Configurations tab
Actual results:
react_devtools_backend_compact.js:2367 unhandled promise rejection: TypeError: Cannot read properties of undefined (reading 'apiGroup') at r (main-chunk-e70ea3b3d562514df486.min.js:1:1) at main-chunk-e70ea3b3d562514df486.min.js:1:1 at Array.map (<anonymous>) at main-chunk-e70ea3b3d562514df486.min.js:1:1 overrideMethod @ react_devtools_backend_compact.js:2367 window.onunhandledrejection @ main-chunk-e70ea3b3d562514df486.min.js:1 main-chunk-e70ea3b3d562514df486.min.js:1 Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'apiGroup') at r (main-chunk-e70ea3b3d562514df486.min.js:1:1) at main-chunk-e70ea3b3d562514df486.min.js:1:1 at Array.map (<anonymous>) at main-chunk-e70ea3b3d562514df486.min.js:1:1
Expected results:
no errors
Additional info:
This is a clone of issue OCPBUGS-19512. The following is the description of the original issue:
—
OCPBUGS-5469 and backports began prioritizing later target releases, but we still wait 10m between different PromQL evaluations while evaluating conditional update risks. This ticket is tracking work to speed up cache warming, and allows changes that are too invasive to be worth backporting.
Definition of done:
Acceptance Criteria:
Description of problem:
In an STS cluster with the TechPreviewNoUpgrade featureset enabled, CCO ignores CRs whose .spec.providerSpec.stsIAMRoleARN is unset. While the CR controller does not provision a Secret for the aforementioned type of CRs, it still sets .status.provisioned to true for them.
Steps to Reproduce:
1. Create an STS cluster, enable the feature set. 2. Create a dummy CR like the following: fxie-mac:cloud-credential-operator fxie$ cat cr2.yaml apiVersion: cloudcredential.openshift.io/v1 kind: CredentialsRequest metadata: name: test-cr-2 namespace: openshift-cloud-credential-operator spec: providerSpec: apiVersion: cloudcredential.openshift.io/v1 kind: AWSProviderSpec statementEntries: - action: - ec2:CreateTags effect: Allow resource: '*' secretRef: name: test-secret-2 namespace: default serviceAccountNames: - default 3. Check CR.status fxie-mac:cloud-credential-operator fxie$ oc get credentialsrequest test-cr-2 -n openshift-cloud-credential-operator -o yaml apiVersion: cloudcredential.openshift.io/v1 kind: CredentialsRequest metadata: creationTimestamp: "2023-07-24T09:21:44Z" finalizers: - cloudcredential.openshift.io/deprovision generation: 1 name: test-cr-2 namespace: openshift-cloud-credential-operator resourceVersion: "180154" uid: 34b36cac-3fca-4fa5-a003-a9b64c5fbf00 spec: providerSpec: apiVersion: cloudcredential.openshift.io/v1 kind: AWSProviderSpec statementEntries: - action: - ec2:CreateTags effect: Allow resource: '*' secretRef: name: test-secret-2 namespace: default serviceAccountNames: - default status: lastSyncGeneration: 0 lastSyncTimestamp: "2023-07-24T09:39:40Z" provisioned: true
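For contrast, a CredentialsRequest intended for STS mode carries the role ARN in its provider spec, roughly like this (a minimal sketch; the ARN and secret name are placeholders):
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: test-cr-sts
  namespace: openshift-cloud-credential-operator
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    stsIAMRoleARN: arn:aws:iam::123456789012:role/example-role   # placeholder
    statementEntries:
    - action:
      - ec2:CreateTags
      effect: Allow
      resource: '*'
  secretRef:
    name: test-secret-sts
    namespace: default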
Description of problem:
After the private cluster was destroyed, the cluster's DNS records were left behind.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2023-02-26-022418 4.13.0-0.nightly-2023-02-26-081527
How reproducible:
always
Steps to Reproduce:
1.create a private cluster 2.destroy the cluster 3.check the dns record $ibmcloud dns zones | grep private-ibmcloud.qe.devcluster.openshift.com (base_domain) 3c7af30d-cc2c-4abc-94e1-3bcb36e01a9b private-ibmcloud.qe.devcluster.openshift.com PENDING_NETWORK_ADD $zone_id=3c7af30d-cc2c-4abc-94e1-3bcb36e01a9b $ibmcloud dns resource-records $zone_id CNAME:520c532f-ca61-40eb-a04e-1a2569c14a0b api-int.ci-op-wkb4fgd6-eef7e.private-ibmcloud.qe.devcluster.openshift.com CNAME 60 10a7a6c7-jp-tok.lb.appdomain.cloud CNAME:751cf3ce-06fc-4daf-8a44-bf1a8540dc60 api.ci-op-wkb4fgd6-eef7e.private-ibmcloud.qe.devcluster.openshift.com CNAME 60 10a7a6c7-jp-tok.lb.appdomain.cloud CNAME:dea469e3-01cd-462f-85e3-0c1e6423b107 *.apps.ci-op-wkb4fgd6-eef7e.private-ibmcloud.qe.devcluster.openshift.com CNAME 120 395ec2b3-jp-tok.lb.appdomain.cloud
Actual results:
The DNS records of the cluster were left behind.
Expected results:
All DNS records created by the installer are deleted after the cluster is destroyed.
Additional info:
This blocks creating private clusters later, because the maximum limit of 5 wildcard records is easily reached (a QE account limitation). Checking the ingress-operator log of the failed cluster shows the error: "createOrUpdateDNSRecord: failed to create the dns record: Reached the maximum limit of 5 wildcard records."
It is caused by the power off routine, which initialises last_error to None. The field is later restored, but BMO manages to observe and record the wrong value.
This issue is not trivial to reproduce in the product. You need OCPBUGS-2471 to land first, then you need to trigger the cleaning failure several times. I used direct access to Ironic via CLI to abort cleaning (`baremetal node abort <node name>`) during deprovisioning. After a few attempts you can observe the following in the BMH's status:
status:
errorCount: 2
errorMessage: 'Cleaning failed: '
errorType: provisioning error
The empty message after the colon is a sign of this bug.
Description of the problem:
If an interface name is over 15 characters long, NetworkManager refuses to bring the interface up.
How reproducible:
Depends on the system interface names
Steps to reproduce:
1. Create a cluster with static networking (a vlan with a large id works best)
2. Boot a host with the discovery ISO
Actual results:
Host interface does not come up if the resulting interface name is over 15 characters
Expected results:
Interfaces should always come up
Additional info:
Slack thread: https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1689956128746919?thread_ts=1689774706.220319&cid=CUPJTHQ5P
Attached a screenshot of the log stating the connection name is too long.
This happens because our script to apply static networking on a host uses the host interface name and appends the extension nmstate added for the interface.
In this case the interface name was enp94s0f0np0 with a vlan id of 2507. This meant that the resulting interface name was enp94s0f0np0.2507 (17 characters).
When configuring this interface manually as a workaround the user stated that the interface name (not the vlan id) was truncated to accommodate the length limit.
So in this case the valid interface created by nmcli was "enp94s0f0n.2507"; we should attempt to replicate this behavior.
Also attached a screenshot of the working interface.
Description of problem:
A 'Show tooltips' toggle has been added to the resource YAML page, but the checkbox icon is not aligned with the other icons.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-05-23-103225
How reproducible:
Always
Steps to Reproduce:
1. goes to any resource YAML page, check 'Show tooltips' icon position 2. 3.
Actual results:
1. the checkbox is a little above other icons, see screenshot https://drive.google.com/file/d/10wKeRaaE76GBXBph93wAkFCWYGrEKcA9/view?usp=share_link
Expected results:
1. all icons should be aligned
Additional info:
Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-13960.
1. Proposed title of this feature request
Support new Azure LoadBalancer 100min idle TCP timeout
2. What is the nature and description of the request ?
When provisioning a service of type LoadBalancer for an OCP cluster on Azure, it is possible to customize the TCP idle timeout in minutes using the LoadBalancer annotation 'service.beta.kubernetes.io/azure-load-balancer-tcp-idle-timeout'.
Currently, the min and max values are hardcoded to 4 and 30 respectively, in both the legacy Azure cloud provider implementation and cloud-provider-azure.
Recently Azure upgraded its implementation to support a maximum idle timeout of 100 minutes; the corresponding documentation ("Configure TCP reset and idle timeout for Azure Load Balancer") should be updated soon. It is now possible to set an idle timeout of more than 30 minutes manually in the Azure portal or with the Azure CLI, but not from a Kubernetes load balancer, as the max value is still 30 minutes in the Kubernetes code.
Error message returned is
`Warning SyncLoadBalancerFailed 2s (x3 over 18s) service-controller Error syncing load balancer: failed to ensure load balancer: idle timeout value must be a whole number representing minutes between 4 and 30`
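For reference, the annotation in question is set on the Service like this (a minimal sketch; the names and the 60-minute value are illustrative, and anything above 30 is currently rejected as shown above):
apiVersion: v1
kind: Service
metadata:
  name: example-lb
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-tcp-idle-timeout: "60"
spec:
  type: LoadBalancer
  selector:
    app: example
  ports:
  - port: 443
    targetPort: 8443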
3. Why does the customer need this? (List the business requirements here)
The customer is migrating workloads from an on-premise datacenter to Azure. Using an idle timeout of more than 30 minutes is critical for migrating some of the customer's links to Azure, and this is blocking the migration until it is supported by OpenShift.
4. List any affected packages or components.
Azure cloud controller
Seeing segfault failures related to HAProxy on multiple platforms that begin around the same time as the [HAProxy bump|http://example.com] like:
{ nodes/ci-op-5s09hi2q-0dd98-rwds8-worker-centralus1-8nkx5/journal.gz:Apr 10 06:21:54.317971 ci-op-5s09hi2q-0dd98-rwds8-worker-centralus1-8nkx5 kernel: haproxy[302399]: segfault at 0 ip 0000556eadddafd0 sp 00007fff0cceed50 error 4 in haproxy[556eadc00000+2a3000]}
release-master-ci-4.14-upgrade-from-stable-4.13-e2e-azure-sdn-upgrade/1645265104259780608
periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-ovn-upgrade/1645265114720374784
Description of problem:
The IPv6 VIP does not seem to be present in the keepalived.conf.
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd65:10:128::/56
    hostPrefix: 64
  machineNetwork:
  - cidr: 192.168.110.0/23
  - cidr: fd65:a1a8:60ad::/112
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
  - fd65:172:16::/112
platform:
  vsphere:
    apiVIPs:
    - 192.168.110.116
    - fd65:a1a8:60ad:271c::1116
    ingressVIPs:
    - 192.168.110.117
    - fd65:a1a8:60ad:271c::1117
    vcenters:
    - datacenters:
      - IBMCloud
      server: ibmvcenter.vmc-ci.devcluster.openshift.com
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-04-21-084440
How reproducible:
Frequently.
2 failures out of 3 attempts.
Steps to Reproduce:
1. Install vSphere dual-stack with dual VIPs, see above config
2. Check keepalived.conf:
for f in $(oc get pods -n openshift-vsphere-infra -l app=vsphere-infra-vrrp --no-headers -o custom-columns=N:.metadata.name ) ; do oc -n openshift-vsphere-infra exec -c keepalived $f -- cat /etc/keepalived/keepalived.conf | tee $f-keepalived.conf ; done
Actual results:
IPv6 VIP is not in keepalived.conf
Expected results:
vrrp_instance rbrattai_INGRESS_1 {
    state BACKUP
    interface br-ex
    virtual_router_id 129
    priority 20
    advert_int 1
    unicast_src_ip fd65:a1a8:60ad:271c::cc
    unicast_peer {
        fd65:a1a8:60ad:271c:9af:16a9:cb4f:d75c
        fd65:a1a8:60ad:271c:86ec:8104:1bc2:ab12
        fd65:a1a8:60ad:271c:5f93:c9cf:95f:9a6d
        fd65:a1a8:60ad:271c:bb4:de9e:6d58:89e7
        fd65:a1a8:60ad:271c:3072:2921:890:9263
    }
    ...
    virtual_ipaddress {
        fd65:a1a8:60ad:271c::1117/128
    }
    ...
}
Additional info:
See OPNET-207
TestAlertmanagerUWMSecrets is one of the tests that time out; see https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-monitoring-operator/1971/pull-ci-openshift-cluster-monitoring-operator-master-e2e-agnostic-operator/1661649123104788480. Apparently it takes longer for the UWM Alertmanager to become ready.
Description of problem:
It seems that we don't correctly update the network data secret version in the PreprovisioningImage, resulting in BMO assuming that the image is still stale, while the image-customization-controller assumes it's done. As a result, the host is stuck in inspecting.
How reproducible:
What I think I did was add a network data secret to a host that already had a PreprovisioningImage created. I need to check if I can repeat it.
Actual results:
Host in inspecting, BMO logs show
{"level":"info","ts":"2023-05-11T11:52:52.348Z","logger":"controllers.BareMetalHost","msg":"network data in pre-provisioning image is out of date","baremetalhost":"openshift-machine-api/oste st-extraworker-0","provisioningState":"inspecting","latestVersion":"9055823","currentVersion":"9055820"}
Indeed, the image has the old version:
status:
  architecture: x86_64
  conditions:
  - lastTransitionTime: "2023-05-11T11:27:51Z"
    message: Generated image
    observedGeneration: 1
    reason: ImageSuccess
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-05-11T11:27:51Z"
    message: ""
    observedGeneration: 1
    reason: ImageSuccess
    status: "False"
    type: Error
  format: iso
  imageUrl: http://metal3-image-customization-service.openshift-machine-api.svc.cluster.local/231b39d5-1b83-484c-9096-aa87c56a222a
  networkData:
    name: ostest-extraworker-0-network-config-secret
    version: "9055820"
What I find puzzling is that we even have two versions of the secret. I only created it once.
Description of problem:
Unable to set protectKernelDefaults from "true" to "false" in kubelet.conf on the nodes in RHOCP4.13 although this was possible in RHOCP4.12.
Version-Release number of selected component (if applicable):
Red Hat OpenShift Container Platform Version Number: 4 Release Number: 13 Kubernetes Version: v1.26.3+b404935 Docker Version: N/A Related Package Version: - cri-o-1.26.3-3.rhaos4.13.git641290e.el9.x86_64 Related Middleware/Application: none Underlying RHEL Release Number: Red Hat Enterprise Linux CoreOS release 4.13 Underlying RHEL Architecture: x86_64 Underlying RHEL Kernel Version: 5.14.0-284.13.1.el9_2.x86_64 Drivers or hardware or architecture dependency: none
How reproducible:
always
Steps to Reproduce:
1. Deploy OCP cluster using RHCOS 2. Set protectKernelDefaults as true using the document [1]
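The field is typically set through a KubeletConfig like the following (a minimal sketch; the pool selector label is an assumption):
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-protect-kernel-defaults
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""   # assumed worker pool label
  kubeletConfig:
    protectKernelDefaults: false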
Actual results:
protectKernelDefaults can't be set.
Expected results:
protectKernelDefaults can be set.
Additional info:
protectKernelDefaults in NOT set in kubelet.conf --- # oc debug node/ocp4-worker1 # chroot /host # cat /etc/kubernetes/kubelet.conf ... "protectKernelDefaults": true, <- NOT modified. Moreover, the format is changed to json. ... --- Also "protectKernelDefaults: false" does not seem to be set into the machineConfig created by kubeletConfig Kind. See below: --- # oc get mc 99-worker-generated-kubelet -o yaml ... storage: files: - contents: compression: "" source: data:text/plain;charset=utf-8;base64, [The contents of kubelet.conf encoded with base64] mode: 420 overwrite: true path: /etc/kubernetes/kubelet.conf // Write [The contents of kubelet.conf encoded with base64] to the file. # vim kubelet.conf // Decode [The contents of kubelet.conf encoded with base64] # cat kubelet.conf | base64 -d ... "protectKernelDefaults": true, <- "protectKernelDefaults: false" is not set. ---- [1] https://access.redhat.com/solutions/6974438
Sanitize OWNERS/OWNER_ALIASES:
1) OWNERS must have:
component: "Storage / Kubernetes External Components"
2) OWNER_ALIASES must have all team members of Storage team.
Description of the problem:
After successfully creating a hosted cluster using the CAPI agent provider with 6 worker nodes (on two different subnets), I attempted to scale down the nodepool to 0 replicas.
2 agents returned to the InfraEnv in "known-unbound" state, but the other 4 are still bound to the cluster, and their related Machine CRs are stuck in the Deleting phase.
$ oc get machines.cluster.x-k8s.io -n clusters-hosted-1
NAME                        CLUSTER          NODENAME            PROVIDERID                                     PHASE      AGE   VERSION
hosted-1-6655884866-dr4mv   hosted-1-vhc4f   hosted-rwn-1-1      agent://4cc93549-45cd-42a9-8c61-5d72b802ebe5   Deleting   94m   4.14.0-ec.3
hosted-1-6655884866-fkfjf   hosted-1-vhc4f   hosted-worker-1-0   agent://324afeeb-1af1-45d9-a2ba-f1101ffb6a6b   Deleting   94m   4.14.0-ec.3
hosted-1-6655884866-nzflz   hosted-1-vhc4f   hosted-rwn-1-2      agent://50b12199-7e95-4b3a-a5ce-d4aa0fa7909e   Deleting   94m   4.14.0-ec.3
hosted-1-6655884866-pc67l   hosted-1-vhc4f   hosted-worker-1-2   agent://284eb9e6-4375-4e59-9a11-a0a3131aa08b   Deleting   94m   4.14.0-ec.3
In the capi-provider pod logs I have the following:
time="2023-07-25T15:23:27Z" level=error msg="failed to add finalizer agentmachine.agent-install.openshift.io/deprovision to resource hosted-1-2ntnh clusters-hosted-1" func="github.com/openshift/cluster-api-provider-agent/controllers.(*AgentMachineReconciler).handleDeletionHook" file="/remote-source/app/controllers/agentmachine_controller.go:206" agent_machine=hosted-1-2ntnh agent_machine_namespace=clusters-hosted-1 error="Operation cannot be fulfilled on agentmachines.capi-provider.agent-install.openshift.io \"hosted-1-2ntnh\": StorageError: invalid object, Code: 4, Key: /kubernetes.io/capi-provider.agent-install.openshift.io/agentmachines/clusters-hosted-1/hosted-1-2ntnh, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 75febba6-8e98-4fca-861f-e83c467a3368, UID in object meta: "
and
time="2023-07-25T15:23:50Z" level=error msg="Failed to get agentMachine clusters-hosted-1/hosted-1-l4pp7" func="github.com/openshift/cluster-api-provider-agent/controllers.(*AgentMachineReconciler).Reconcile" file="/remote-source/app/controllers/agentmachine_controller.go:95" agent_machine=hosted-1-l4pp7 agent_machine_namespace=clusters-hosted-1 error="AgentMachine.capi-provider.agent-install.openshift.io \"hosted-1-l4pp7\" not found"
Actual results:
4 out of 6 agents are still bound to cluster
Expected results:
The nodepool is scaled to 0 replicas
Description of problem:
After customizing the routes for Console and Downloads, the `Downloads` route is not updated within `https://custom-console-route/command-line-tools` and still points to the old/default downloads route.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Customize Console and Downloads routes. 2. Access the web-console using custom console route. 3. Go to Command-line-tools. 4. Try to access the downloads urls.
Actual results:
When accessing the download URLs, they point to the default/old downloads route.
Expected results:
When accessing the download URLs, they should point to the custom downloads route.
Additional info:
Description of problem:
As discovered in https://bugzilla.redhat.com/show_bug.cgi?id=2111632 the dispatcher scripts don't have permission to set the hostname directly. We need to use systemd-run to get them into an appropriate SELinux context.
I doubt the static DHCP scripts are still being used intentionally since we have proper static IP support now, but the fix is pretty trivial, so we should go ahead and do it, as the feature is technically still supported.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
CR.status.lastSyncGeneration is not updated in STS mode (AWS).
Steps to Reproduce:
See https://issues.redhat.com/browse/OCPBUGS-16684.
Description of problem:
On Azure, when the vmSize or location field is dropped from the CPMS providerSpec, a master machine ends up in a creating/deleting loop.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2022-10-25-210451
How reproducible:
always
Steps to Reproduce:
1. Create an Azure cluster with a CPMS 2. Activate the CPMS 3. Drop the vmsize field from the providerSpec
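The fields in question live under the CPMS provider spec, roughly here (a trimmed sketch of the relevant part of the cluster CPMS):
spec:
  template:
    machines_v1beta1_machine_openshift_io:
      spec:
        providerSpec:
          value:
            apiVersion: machine.openshift.io/v1beta1
            kind: AzureMachineProviderSpec
            location: eastus          # dropping this field, or vmSize below, triggers the loop
            vmSize: Standard_D8s_v3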
Actual results:
New machine is created, deleted, created, deleted ... $ oc get machine NAME PHASE TYPE REGION ZONE AGE zhsuncpms1-7svhz-master-0 Running Standard_D8s_v3 eastus 2 3h21m zhsuncpms1-7svhz-master-1 Running Standard_D8s_v3 eastus 3 3h21m zhsuncpms1-7svhz-master-2 Running Standard_D8s_v3 eastus 1 3h21m zhsuncpms1-7svhz-master-l489k-0 Deleting 0s zhsuncpms1-7svhz-worker-eastus1-6vsl4 Running Standard_D4s_v3 eastus 1 3h16m zhsuncpms1-7svhz-worker-eastus2-dpvp9 Running Standard_D4s_v3 eastus 2 3h16m zhsuncpms1-7svhz-worker-eastus3-sg7dx Running Standard_D4s_v3 eastus 3 19m $ oc get machine NAME PHASE TYPE REGION ZONE AGE zhsuncpms1-7svhz-master-0 Running Standard_D8s_v3 eastus 2 3h26m zhsuncpms1-7svhz-master-1 Running Standard_D8s_v3 eastus 3 3h26m zhsuncpms1-7svhz-master-2 Running Standard_D8s_v3 eastus 1 3h26m zhsuncpms1-7svhz-master-wmnfq-0 1s zhsuncpms1-7svhz-worker-eastus1-6vsl4 Running Standard_D4s_v3 eastus 1 3h21m zhsuncpms1-7svhz-worker-eastus2-dpvp9 Running Standard_D4s_v3 eastus 2 3h21m zhsuncpms1-7svhz-worker-eastus3-sg7dx Running Standard_D4s_v3 eastus 3 24m $ oc get controlplanemachineset NAME DESIRED CURRENT READY UPDATED UNAVAILABLE STATE AGE cluster 3 4 3 Active 25m $ oc get co control-plane-machine-set NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE control-plane-machine-set 4.12.0-0.nightly-2022-10-25-210451 True True False 4h38m Observed 3 replica(s) in need of update
Expected results:
Errors are logged and no machine is created, or the new machine is created successfully.
Additional info:
Drop vmSize, we can create new machine, seems default value is Standard_D4s_v3, but don't allow update. $ oc get machine NAME PHASE TYPE REGION ZONE AGE zhsunazure11-cdbs8-master-0 Running Standard_D8s_v3 eastus 2 4h7m zhsunazure11-cdbs8-master-000 Provisioned Standard_D4s_v3 eastus 2 48s zhsunazure11-cdbs8-master-1 Running Standard_D8s_v3 eastus 3 4h7m zhsunazure11-cdbs8-master-2 Running Standard_D8s_v3 eastus 1 4h7m zhsunazure11-cdbs8-worker-eastus1-5v66l Running Standard_D4s_v3 eastus 1 4h1m zhsunazure11-cdbs8-worker-eastus1-test Running Standard_D4s_v3 eastus 1 7m45s zhsunazure11-cdbs8-worker-eastus2-hm9bm Running Standard_D4s_v3 eastus 2 4h1m zhsunazure11-cdbs8-worker-eastus3-7j9kf Running Standard_D4s_v3 eastus 3 4h1m $ oc edit machineset zhsuncpms1-7svhz-worker-eastus3 error: machinesets.machine.openshift.io "zhsuncpms1-7svhz-worker-eastus3" could not be patched: admission webhook "validation.machineset.machine.openshift.io" denied the request: providerSpec.vmSize: Required value: vmSize should be set to one of the supported Azure VM sizes
Description of problem:
A leftover comment in CPMSO tests is causing a linting issue.
Version-Release number of selected component (if applicable):
4.13.z, 4.14.0
How reproducible:
Always
Steps to Reproduce:
1. make lint 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
When using a disconnected env and the OPENSHIFT_INSTALL_RELEASE_IMAGE_MIRROR env var is specified, the create-cluster-and-infraenv service fails[*]. The issue seems to happen due to a missing registries.conf in the assisted-service container, which is required for pulling the image. [*] create-cluster-and-infraenv[2784]: level=fatal msg="Failed to register cluster with assisted-service: command 'oc adm release info -o template --template '{{.metadata.version}}' --insecure=true quay.io/openshift-release-dev/ocp-release@sha256:3c050cb52fdd3e65c518d4999d238ec026ef724503f275377fee6bf0d33093ab --registry-config=/tmp/registry-config1560177852' exited with non-zero exit code 1: \nerror: unable to read image quay.io/openshift-release-dev/ocp-release@sha256:3c050cb52fdd3e65c518d4999d238ec026ef724503f275377fee6bf0d33093ab: Get "http://quay.io/v2/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\n"
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. Add registries.conf with a mirror config set to a local registry (e.g. use imageContentSources in the install-config) 2. Ensure that a custom release image mirror that refers to the registry is set in the OPENSHIFT_INSTALL_RELEASE_IMAGE_MIRROR env var. 3. Boot the machine in a disconnected env.
Actual results:
The create-cluster-and-infraenv service fails to pull the release image.
Expected results:
create-cluster-and-infraenv service should finish successfully.
Additional info:
Pushed a PR to the installer for propagating registries.conf: https://github.com/openshift/installer/pull/7332 We have a workaround in the appliance by overriding the service: https://github.com/openshift/appliance/pull/94/
Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/470
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Updating the availability requirement is disabled on the Edit PDB page; also, when the user tries to edit it, the current value is cleared, so the user has no idea what the current setting is.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-04-03-211601
How reproducible:
Always
Steps to Reproduce:
1. Goes to deployment page -> Actions -> Add PodDisruptionBudget 2. on 'Create PodDisruptionBudge' page, set following fields and hit 'Create' Name: example-pdb Availability requirement: maxUnavailable: 2 3. Make sure pdb/example-pdb is successfully created $ oc get pdb NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE example-pdb N/A 2 2 99s 4. Goes to deployment page again, Actions -> Edit PodDisruptionBudget
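The PDB created in step 2 corresponds to roughly this object (a sketch; the selector is whatever the form derives from the deployment and is assumed here):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app: example   # assumed; the form fills this in from the deployment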
Actual results:
The 'Availability requirement' value is disabled from editing by default; when the user clicks 'maxUnavailable', the value is set to empty (the user has no idea what the original value was).
Expected results:
When editing a PDB, the form should be loaded with the current value, and the user should have permission to update the values by default.
Additional info:
Description of problem:
[AWS EBS CSI Driver Operator] should not update the default storageclass annotation back after customers remove the default storageclass annotation
Version-Release number of selected component (if applicable):
Server Version: 4.14.0-0.nightly-2023-06-08-102710
How reproducible:
Always
Steps to Reproduce:
1. Install an AWS OpenShift cluster 2. Create 6 extra storage classes (any SC is OK) 3. Overwrite all the SCs with storageclass.kubernetes.io/is-default-class=false and check that all the SCs are set as non-default 4. Overwrite all the SCs with storageclass.kubernetes.io/is-default-class=true 5. Loop steps 4-5 several times
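The annotation flip in steps 3 and 4 can be done with a patch like this (a sketch; gp3-csi is the usual AWS EBS CSI default storage class name and is an assumption here):
$ oc patch storageclass gp3-csi -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'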
Actual results:
Overwriting all the SCs with storageclass.kubernetes.io/is-default-class=false is sometimes reverted by the driver operator
Expected results:
Overwriting all the SCs with storageclass.kubernetes.io/is-default-class=false should always succeed
Additional info:
Description of problem:
This is a clone of the doc issue OCPBUGS-9162.
The Import JAR files option doesn't work if the Cluster Samples Operator is not installed. This is a common issue in disconnected clusters, where the Cluster Samples Operator is disabled by default. Users should not see the JAR import option if it's not working correctly.
Version-Release number of selected component (if applicable):
4.9+
How reproducible:
Always, when the samples operator is not installed
Steps to Reproduce:
Actual results:
Import doesn't work
Expected results:
The Import JAR file option should not be shown (or should be disabled) if no "Java" builder image (ImageStream in the openshift namespace) is available
Additional info:
This is a clone of issue OCPBUGS-18485. The following is the description of the original issue:
—
Description of problem:
In the developer console, go to "Observe -> openshift-monitoring -> Alerts" and silence the Watchdog alert. At first the alert state is Silenced in the Alerts tab, but it quickly changes to Firing (the alert is actually silenced); see the attached screenshot.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-02-132842
How reproducible:
always
Steps to Reproduce:
1. silence alert in the dev console, and check alert state in Alerts tab 2. 3.
Actual results:
alert state is changed from Silenced to Firing quickly
Expected results:
state should be Silenced
This is a clone of issue OCPBUGS-18788. The following is the description of the original issue:
—
Description of problem:
metal3-baremetal-operator-7ccb58f44b-xlnnd pod failed to start on the SNO baremetal dualstack cluster: Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 34m default-scheduler Successfully assigned openshift-machine-api/metal3-baremetal-operator-7ccb58f44b-xlnnd to sno.ecoresno.lab.eng.tlv2.redha t.com Warning FailedScheduling 34m default-scheduler 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are availabl e: 1 node(s) didn't have free ports for the requested pod ports.. Warning FailedCreatePodSandBox 34m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to add hostport mapping for sandbox k8s_metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0(c4a8b353e3ec105d2bff2eb1670b82a0f226ac1088b739a256deb9dfae6ebe54): cannot open hostport 60000 for pod k8s _metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0_: listen tcp4 :60000: bind: address already in use Warning FailedCreatePodSandBox 34m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to add hostport mapping for sandbox k8s_metal3-bare metal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0(9e6960899533109b02fbb569c53d7deffd1ac8185cef3d8677254f9ccf9387ff): cannot open hostport 60000 for pod k8s _metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0_: listen tcp4 :60000: bind: address already in use
Version-Release number of selected component (if applicable):
4.14.0-rc.0
How reproducible:
so far once
Steps to Reproduce:
1. Deploy disconnected baremetal SNO node with dualstack networking with agent-based installer 2. 3.
Actual results:
metal3-baremetal-operator pod fails to start
Expected results:
metal3-baremetal-operator pod is running
Additional info:
Checking the ports on the node showed it was the `kube-apiserver` process bound to the port:
tcp ESTAB 0 0 [::1]:60000 [::1]:2379 users:(("kube-apiserver",pid=43687,fd=455))
After rebooting the node, all pods started as expected.
Description of problem:
Critical Alert Rules do not have runbook url
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
This bug is being raised by the OpenShift Monitoring team as part of an effort to detect invalid alert rules in OCP. 1. Check the details of the KubeSchedulerDown alert rule 2. 3.
Actual results:
The Alert Rule KubeSchedulerDown has Critical Severity, but does not have runbook_url annotation.
Expected results:
All critical alert rules must have a runbook_url annotation
Additional info:
Critical alerts must have a runbook; please refer to the style guide at https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide. The runbooks are located at github.com/openshift/runbooks. To resolve the bug:
- Add runbooks for the relevant alerts at github.com/openshift/runbooks
- Add the link to the runbook in the alert annotation 'runbook_url' (see the sketch below)
- Remove the exception in the origin test, added in PR https://github.com/openshift/origin/pull/27933
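In the rule definition this ends up as an extra annotation along these lines (a minimal sketch; the runbook path is a placeholder for the file actually added to github.com/openshift/runbooks):
- alert: KubeSchedulerDown
  labels:
    severity: critical
  annotations:
    # placeholder path; point this at the runbook committed to github.com/openshift/runbooks
    runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/KubeSchedulerDown.md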
Description of problem:
The reconciler removes the overlappingrangeipreservations.whereabouts.cni.cncf.io resources whether the pod is alive or not.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create pods and check the overlappingrangeipreservations.whereabouts.cni.cncf.io resources:
$ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -A NAMESPACE NAME AGE openshift-multus 2001-1b70-820d-4b04--13 4m53s openshift-multus 2001-1b70-820d-4b05--13 4m49s
2. Verify that the ip-reconciler cronjob removes the overlappingrangeipreservations.whereabouts.cni.cncf.io resources when it runs:
$ oc get cronjob -n openshift-multus NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE ip-reconciler */15 * * * * False 0 14m 4d13h $ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -A No resources found $ oc get cronjob -n openshift-multus NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE ip-reconciler */15 * * * * False 0 5s 4d13h
Actual results:
The overlappingrangeipreservations.whereabouts.cni.cncf.io resources are removed for each created pod by the ip-reconciler cronjob. The "overlapping ranges" are not used.
Expected results:
The overlappingrangeipreservations.whereabouts.cni.cncf.io resources should not be removed, regardless of whether a pod has used an IP in the overlapping ranges.
Additional info:
Description of problem:
A user defined taints in a machineset and then scaled up the machineset. The instance can join the cluster and the Node becomes Ready, but pods couldn't be deployed; checking the node YAML file shows the uninitialized taint was not removed.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-20-215234
How reproducible:
Always
Steps to Reproduce:
1.Setup a cluster on Azure 2.Create a machineset with taint taints: - effect: NoSchedule key: mapi value: mapi_test 3.Check node yaml file
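The taint from step 2 sits under the MachineSet's machine template, roughly like this (a trimmed sketch of the relevant part of the MachineSet):
spec:
  template:
    spec:
      taints:
      - effect: NoSchedule
        key: mapi
        value: mapi_test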
Actual results:
uninitialized taint still in node, but no providerID in node. $ oc get node NAME STATUS ROLES AGE VERSION zhsun724-mh4dt-master-0 Ready control-plane,master 9h v1.27.3+4aaeaec zhsun724-mh4dt-master-1 Ready control-plane,master 9h v1.27.3+4aaeaec zhsun724-mh4dt-master-2 Ready control-plane,master 9h v1.27.3+4aaeaec zhsun724-mh4dt-worker-westus21-8rzqw Ready worker 21m v1.27.3+4aaeaec zhsun724-mh4dt-worker-westus21-additional-q58zp Ready worker 9h v1.27.3+4aaeaec zhsun724-mh4dt-worker-westus21-additional-vwwhh Ready worker 9h v1.27.3+4aaeaec zhsun724-mh4dt-worker-westus21-v7k7s Ready worker 9h v1.27.3+4aaeaec zhsun724-mh4dt-worker-westus22-ggxql Ready worker 9h v1.27.3+4aaeaec zhsun724-mh4dt-worker-westus23-zf8l5 Ready worker 9h v1.27.3+4aaeaec $ oc edit node zhsun724-mh4dt-worker-westus21-8rzqw spec: taints: - effect: NoSchedule key: node.cloudprovider.kubernetes.io/uninitialized value: "true" - effect: NoSchedule key: mapi value: mapi_test
Expected results:
The uninitialized taint is removed and the providerID is set on the node.
Additional info:
must-gather: https://drive.google.com/file/d/12ypYmHN98j9lyWCS9Dgaqq5MLpftqEkS/view?usp=sharing
It seems the e2e-metal-ipi-ovn-dualstack job is permafailing the last couple of days.
sippy link
one common symptom seems to be that some nodes are not being fully provisioned.
here is an example from this job
you can see the clusteroperators are not happy; specifically, machine-api is stuck in init
Description of problem:
OCP 4.14 installation fails. Waiting for the UPI installation to complete using wait-for ends with a CO error:
```
$ openshift-install wait-for install-complete --log-level=debug
level=error msg=failed to initialize the cluster: Cluster operator control-plane-machine-set is not available
```
```
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          122m    Unable to apply 4.14.0-0.nightly-2023-07-18-085740: the cluster operator control-plane-machine-set is not available
```
```
$ oc get co | grep control-plane-machine-set
control-plane-machine-set   4.14.0-0.nightly-2023-07-18-085740   False   False   True   6h47m   Missing 3 available replica(s)
```
Version-Release number of selected component (if applicable):
Openshift on Openstack OCP 4.14.0-0.nightly-2023-07-18-085740 RHOS-16.2-RHEL-8-20230413.n.1 UPI installation
How reproducible:
Always
Steps to Reproduce:
Run the UPI openshift installation
Actual results:
UPI installation fail
Expected results:
UPI installation pass
Additional info:
$ oc logs -n openshift-machine-api control-plane-machine-set-operator-5cbb7f68cc-h5f4p | tail E0719 14:20:52.645504 1 controller.go:649] "msg"="Observed unmanaged control plane nodes" "error"="found unmanaged control plane nodes, the following node(s) do not have associated machines: ostest-c2drn-master-0, ostest-c2drn-master-1, ostest-c2drn-master-2" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="1984ddf9-506f-4d10-88e5-0787b305484e" "unmanagedNodes"="ostest-c2drn-master-0,ostest-c2drn-master-1,ostest-c2drn-master-2" I0719 14:20:52.645530 1 controller.go:268] "msg"="Cluster state is degraded. The control plane machine set will not take any action until issues have been resolved." "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="1984ddf9-506f-4d10-88e5-0787b305484e" I0719 14:20:52.667462 1 controller.go:212] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="1984ddf9-506f-4d10-88e5-0787b305484e" I0719 14:20:52.668013 1 controller.go:156] "msg"="Reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="3f095b75-21af-4475-b0fd-25052e8c3bce" I0719 14:20:52.668718 1 controller.go:121] "msg"="Reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="e80d898c-9a8d-4774-8f22-fb464be45758" I0719 14:20:52.668780 1 controller.go:142] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="e80d898c-9a8d-4774-8f22-fb464be45758" I0719 14:20:52.669005 1 status.go:119] "msg"="Observed Machine Configuration" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "observedGeneration"=1 "readyReplicas"=0 "reconcileID"="3f095b75-21af-4475-b0fd-25052e8c3bce" "replicas"=0 "unavailableReplicas"=3 "updatedReplicas"=0 E0719 14:20:52.669237 1 controller.go:649] "msg"="Observed unmanaged control plane nodes" "error"="found unmanaged control plane nodes, the following node(s) do not have associated machines: ostest-c2drn-master-0, ostest-c2drn-master-1, ostest-c2drn-master-2" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="3f095b75-21af-4475-b0fd-25052e8c3bce" "unmanagedNodes"="ostest-c2drn-master-0,ostest-c2drn-master-1,ostest-c2drn-master-2" I0719 14:20:52.669267 1 controller.go:268] "msg"="Cluster state is degraded. The control plane machine set will not take any action until issues have been resolved." "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="3f095b75-21af-4475-b0fd-25052e8c3bce" I0719 14:20:52.669842 1 controller.go:212] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="3f095b75-21af-4475-b0fd-25052e8c3bce"
[cloud-user@installer-host ~]$ oc get nodes
NAME STATUS ROLES AGE VERSION
ostest-c2drn-master-0 Ready control-plane,master 6h55m v1.27.3+4aaeaec
ostest-c2drn-master-1 Ready control-plane,master 6h55m v1.27.3+4aaeaec
ostest-c2drn-master-2 Ready control-plane,master 6h55m v1.27.3+4aaeaec
ostest-c2drn-worker-0 Ready worker 6h36m v1.27.3+4aaeaec
ostest-c2drn-worker-1 Ready worker 6h35m v1.27.3+4aaeaec
ostest-c2drn-worker-2 Ready worker 6h36m v1.27.3+4aaeaec
Description of problem:
On the command-line-tools page, the title is "Command line tools" instead of "Command Line Tools"
Version-Release number of selected component (if applicable):
How reproducible:
1/1
Steps to Reproduce:
1. Go to the command-line-tools page
2. Check the title
Actual results:
the title is "Command line tools"
Expected results:
the title should be "Command Line Tools"
Additional info:
When implementing support for IPv6-primary dual-stack clusters, we have extended the available IP families to
const (
    IPFamiliesIPv4 IPFamiliesType = "IPv4"
    IPFamiliesIPv6 IPFamiliesType = "IPv6"
    IPFamiliesDualStack IPFamiliesType = "DualStack"
    IPFamiliesDualStackIPv6Primary IPFamiliesType = "DualStackIPv6Primary"
)
At the same time definitions of kubelet.service systemd unit still contain the code
{{- if eq .IPFamilies "DualStack"}}
--node-ip=${KUBELET_NODE_IPS} \
{{- else}}
--node-ip=${KUBELET_NODE_IP} \
{{- end}}
which only matches the "old" dual-stack family. Because of this, an IPv6-primary dual-stack renders node-ip param with only 1 IP address instead of 2 as required in dual-stack.
Description of problem:
The ACM dropdown has a filter and a "Clusters" title even though there are only ever two items in the dropdown: local cluster and all clusters. A customer has reported this as confusing, since it suggests that many clusters can be added to the dropdown.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
1. Install the ACM dynamic plugin to the cluster
2. Open the cluster dropdown
Actual results:
Expected results:
Additional info:
Description of problem:
RHCOS is being published to new AWS regions (https://github.com/openshift/installer/pull/6861), but aws-sdk-go needs to be bumped to recognize those regions.
Version-Release number of selected component (if applicable):
master/4.14
How reproducible:
always
Steps to Reproduce:
1. openshift-install create install-config
2. Try to select ap-south-2 as a region
Actual results:
New regions are not found. New regions are: ap-south-2, ap-southeast-4, eu-central-2, eu-south-2, me-central-1.
Expected results:
Installer supports and displays the new regions in the Survey
Additional info:
See https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/aws/regions.go#L13-L23
Description of problem:
oc patch project command is failing to annotate the project
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. Run the below patch command to update the annotation on an existing project:
~~~
oc patch project <PROJECT_NAME> --type merge --patch '{"metadata":{"annotations":{"openshift.io/display-name": "null","openshift.io/description": "This is a new project"}}}'
~~~
Actual results:
It produces the error output below:
~~~
The Project "<PROJECT_NAME>" is invalid:
* metadata.namespace: Invalid value: "<PROJECT_NAME>": field is immutable
* metadata.namespace: Forbidden: not allowed on this type
~~~
Expected results:
The `oc patch project` command should patch the project with specified annotation.
Additional info:
Tried to patch the project with OCP 4.11.26, and it worked as expected:
~~~
oc patch project <PROJECT_NAME> --type merge --patch '{"metadata":{"annotations":{"openshift.io/display-name": "null","openshift.io/description": "New project"}}}'
project.project.openshift.io/<PROJECT_NAME> patched
~~~
The issue is with OCP 4.12, where it is not working.
Description of problem:
When we rebased to 1.26, the rebase picked up https://github.com/kubernetes-sigs/cloud-provider-azure/pull/2653/, which made the Azure cloud node manager stop applying beta topology labels such as failure-domain.beta.kubernetes.io/zone. Since we haven't completed the removal cycle for these labels, we still need the node manager to apply them. In the future we must ensure that these labels remain available until users are no longer relying on them.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Create a TP cluster on 4.13
2. Observe no beta label for zone or region
Actual results:
Beta labels are not present
Expected results:
Beta labels are present and should match GA labels
Additional info:
Created https://github.com/kubernetes-sigs/cloud-provider-azure/pull/3685 to try and make upstream allow this to be flagged
Description of problem:
When the configuration is installed with the config-image, the kubeadmin password is not accepted when logging into the console.
Version-Release number of selected component (if applicable):
How reproducible:
Every time
Steps to Reproduce:
1. Build and install unconfigured ignition
2. Build and install config-image
3. When able to ssh into host0, attempt to log into the console using the core user and generated kubeadmin-password.
Actual results:
The login fails.
Expected results:
The login should succeed.
Additional info:
Description of problem:
When creating an OCP cluster with Nutanix infrastructure and using DHCP instead of IPAM network config, the hostname of the VM is not set by DHCP. In this case we need to inject the desired hostname through cloud-init for both control-plane and worker nodes.
Version-Release number of selected component (if applicable):
How reproducible:
Reproducible when creating an OCP cluster with Nutanix infrastructure and using DHCP instead of IPAM network config.
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The aforementioned test in the e2e origin test suite sometimes fails because it can't connect to the API endpoint.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Sometimes
Steps to Reproduce:
1. See https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade/1673703516675248128 2. 3.
Actual results:
The test failed.
Expected results:
The test should retry a couple of times with a delay when it didn't get an HTTP response from the endpoint (e.g. connection issue).
Additional info:
This is a clone of issue OCPBUGS-18137. The following is the description of the original issue:
—
Description of problem:
When a workload includes a node selector term on the label kubernetes.io/arch and the allowed values do not include amd64, the autoscaler does not trigger the scale-out of a valid non-amd64 machine set if its current replicas are 0 and (for 4.14+) no architecture capacity annotation is set (ref MIXEDARCH-129).
The issue is due to https://github.com/openshift/kubernetes-autoscaler/blob/f0ceeacfca57014d07f53211a034641d52d85cfd/cluster-autoscaler/cloudprovider/utils.go#L33
This bug should be considered at first on clusters having the same architecture for the control plane and the data plane.
In the case of multi-arch compute clusters, there is probably no alternative to letting the capacity annotation be properly set on the machine set, either manually or by the cloud provider actuator, as already discussed in the MIXEDARCH-129 work; otherwise, fall back to the control plane architecture.
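The failure mode can be illustrated with a deliberately simplified Go sketch (not the actual autoscaler code linked above): when scaling from zero with no capacity annotation, the template node is labelled with a hard-coded default architecture, so an arm64-only node selector never matches and no scale-up is triggered.
~~~
package main

import "fmt"

// buildTemplateNodeLabels is a simplified stand-in (not the real autoscaler
// code) for how a template node is labelled when a machine set scales from
// zero: with no running node and no capacity annotation, the architecture
// falls back to a hard-coded default.
func buildTemplateNodeLabels(annotatedArch string) map[string]string {
	arch := "amd64" // hard-coded fallback; the root cause described above
	if annotatedArch != "" {
		arch = annotatedArch
	}
	return map[string]string{"kubernetes.io/arch": arch}
}

// matchesArchSelector mimics the pod's required node affinity on kubernetes.io/arch.
func matchesArchSelector(labels map[string]string, allowed []string) bool {
	for _, v := range allowed {
		if labels["kubernetes.io/arch"] == v {
			return true
		}
	}
	return false
}

func main() {
	tmpl := buildTemplateNodeLabels("") // no architecture capacity annotation set
	fmt.Println(matchesArchSelector(tmpl, []string{"arm64"})) // false -> no scale-up is triggered
}
~~~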
Version-Release number of selected component (if applicable):
- ARM64 IPI on GCP 4.14
- ARM64 IPI on AWS and Azure <=4.13
- In general, non-amd64 single-arch clusters supporting autoscale from 0
How reproducible:
Always
Steps to Reproduce:
1. Create an arm64 IPI cluster on GCP
2. Set one of the machinesets to have 0 replicas: oc scale -n openshift-machine-api machineset/adistefa-a1-zn8pg-worker-f
3. Deploy the default autoscaler
4. Deploy the machine autoscaler for the given machineset
5. Deploy a workload with node affinity to arm64 only nodes, large resource requests and enough number of replicas.
Actual results:
From the pod events: pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector
Expected results:
The cluster autoscaler scales the machineset with 0 replicas in order to provide resources for the pending pods.
Additional info:
---
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec: {}
---
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-east-1a
  namespace: openshift-machine-api
spec:
  minReplicas: 0
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: adistefa-a1-zn8pg-worker-f
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: openshift-machine-api
  name: 'my-deployment'
  annotations: {}
spec:
  selector:
    matchLabels:
      app: name
  replicas: 3
  template:
    metadata:
      labels:
        app: name
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - "arm64"
      containers:
        - name: container
          image: >-
            image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest
          ports:
            - containerPort: 8080
              protocol: TCP
          env: []
          resources:
            requests:
              cpu: "2"
      imagePullSecrets: []
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  paused: false
Description of problem:
Dev sandbox - CronJobs table/details UI doesn't have Suspend indication
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Always
Steps to Reproduce:
1. Create a sample CronJob with either @daily or @hourly as the schedule
2. Navigate to the Administrator/Workloads/CronJobs area
3. Observe that the table with CronJobs contains your created entry, but there is no column with a Suspend True/False indication
4. Navigate into that same cron job's details - still no presence of the Suspend state
5. Then invoke the 'oc get cj' command; example output could be:
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
example @hourly True 0 24m 34m
where you can see a separate SUSPEND column
Actual results:
Expected results:
Additional info:
Make SNO dev-preview on 4.13 for P and Z
As a HyperShift developer, I would like a config file created to control the creation frequency of RHTAP PRs so that the HyperShift repo & CI is not inundated with RHTAP PRs.
Description of problem:
At moment we are using an alpha version of controller-runtime on the machine-api-operator. Now that controller-runtime v0.15.0 is out, we want to bump to it.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The oc adm node-logs feature has been upstreamed and is part of k8s 1.27. This resulted in the addition of the kubelet configuration field enableSystemLogQuery to enable the feature. This feature has been enabled in the base kubelet configs in MCO. However, in situations where TechPreview is enabled, MCO generates a kubelet configuration that overwrites the default, and during that unmarshal/marshal cycle it drops the field it is not aware of. This is because MCO currently vendors k8s.io/kubelet at v0.25.1; it can be fixed by vendoring v0.27.1.
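The mechanism is the usual lossy round-trip through an older typed struct. The tiny stand-alone Go example below (the struct and field set are illustrative stand-ins, not MCO's real types) shows how a field unknown to the vendored type silently disappears.
~~~
package main

import (
	"encoding/json"
	"fmt"
)

// oldKubeletConfig stands in for the vendored v0.25 kubelet config type, which
// has no field for enableSystemLogQuery.
type oldKubeletConfig struct {
	LogFormat string `json:"logFormat,omitempty"`
}

func main() {
	// Rendered default config that includes the newer field.
	in := []byte(`{"logFormat":"json","enableSystemLogQuery":true}`)

	var cfg oldKubeletConfig
	_ = json.Unmarshal(in, &cfg) // the unknown field is silently dropped here

	out, _ := json.Marshal(cfg)
	fmt.Println(string(out)) // {"logFormat":"json"} -- enableSystemLogQuery is gone
}
~~~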
How reproducible:
Always
Steps to Reproduce:
1. Bring up a 4.14 cluster with TechPreview enabled
2. Run oc adm node-logs
Actual results:
Command returns "<a href="ec274df5b608cc7a149ece1ce673306c/">ec274df5b608cc7a149ece1ce673306c/</a>" which is the contents of /var/log/journal
Expected results:
Should return journal logs from the node
I took a quick cut of updating the OpenShift and k8s APIs to 1.27. Running into the following during make verify:
cmd/machine-config-controller/start.go:18:2: could not import github.com/openshift/machine-config-operator/pkg/controller/template (-: # github.com/openshift/machine-config-operator/pkg/controller/template
pkg/controller/template/render.go:396:91: cannot use cfg.FeatureGate (variable of type *"github.com/openshift/api/config/v1".FeatureGate) as featuregates.FeatureGateAccess value in argument to cloudprovider.IsCloudProviderExternal: *"github.com/openshift/api/config/v1".FeatureGate does not implement featuregates.FeatureGateAccess (missing method AreInitialFeatureGatesObserved)
pkg/controller/template/render.go:441:90: cannot use cfg.FeatureGate (variable of type *"github.com/openshift/api/config/v1".FeatureGate) as featuregates.FeatureGateAccess value in argument to cloudprovider.IsCloudProviderExternal: *"github.com/openshift/api/config/v1".FeatureGate does not implement featuregates.FeatureGateAccess (missing method AreInitialFeatureGatesObserved)) (typecheck)
"github.com/openshift/machine-config-operator/pkg/controller/template"
^
Here are some examples of how other operators have handled this.
This is a critical bug as oc adm node-logs runs as part of must-gather and debugging node issues with TechPreview jobs in CI is impossible without this working.
Description of problem:
When searching InstallPlans with a specific project selected, all InstallPlans are still listed; the selected project is not applied as a filter.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-05-112833
How reproducible:
Always
Steps to Reproduce:
1. Install some operators to a specific namespace and to all namespaces
$ oc get ip -A
NAMESPACE NAME CSV APPROVAL APPROVED
default install-tftg4 etcdoperator.v0.9.4 Automatic true
openshift-operators install-5g2l4 3scale-community-operator.v0.10.1 Automatic true
$ oc get sub -A
NAMESPACE NAME PACKAGE SOURCE CHANNEL
default etcd etcd community-operators singlenamespace-alpha
openshift-operators 3scale-community-operator 3scale-community-operator community-operators threescale-2.13
2. Navigate to the Home -> Search page, select project 'default' in the project dropdown, choose the 'InstallPlan' resource
3. Check the filtered lists
Actual results:
3. InstallPlans in all namespaces are listed
Expected results:
3. only the InstallPlan in 'default' project should be listed
Additional info:
This is a clone of issue OCPBUGS-18720. The following is the description of the original issue:
—
Description of problem:
Catalog pods in hypershift control plane in ImagePullBackOff
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create a cluster in 4.14 HO + OCP 4.14.0-0.ci-2023-09-07-120503
2. Check controlplane pods; catalog pods in the control plane namespace are in ImagePullBackOff
Actual results:
jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jie-test | grep catalog
catalog-operator-64fd787d9c-98wx5 2/2 Running 0 2m43s
certified-operators-catalog-7766fc5b8-4s66z 0/1 ImagePullBackOff 0 2m43s
community-operators-catalog-847cdbff6-wsf74 0/1 ImagePullBackOff 0 2m43s
redhat-marketplace-catalog-fccc6bbb5-2d5x4 0/1 ImagePullBackOff 0 2m43s
redhat-operators-catalog-86b6f66d5d-mpdsc 0/1 ImagePullBackOff 0 2m43s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 65m default-scheduler Successfully assigned clusters-jie-test/certified-operators-catalog-7766fc5b8-4s66z to ip-10-0-64-135.us-east-2.compute.internal
Normal AddedInterface 65m multus Add eth0 [10.128.2.141/23] from openshift-sdn
Normal Pulling 63m (x4 over 65m) kubelet Pulling image "from:imagestream"
Warning Failed 63m (x4 over 65m) kubelet Failed to pull image "from:imagestream": rpc error: code = Unknown desc = reading manifest imagestream in docker.io/library/from: requested access to the resource is denied
Warning Failed 63m (x4 over 65m) kubelet Error: ErrImagePull
Warning Failed 63m (x6 over 65m) kubelet Error: ImagePullBackOff
Normal BackOff 9s (x280 over 65m) kubelet Back-off pulling image "from:imagestream"
jiezhao-mac:hypershift jiezhao$
Expected results:
catalog pods are running
Additional info:
slack: https://redhat-internal.slack.com/archives/C01C8502FMM/p1694170060144859
Description of problem:
Running the following tests using OpenShift on OpenStack with Kuryr:
"[sig-cli] oc idle [apigroup:apps.openshift.io][apigroup:route.openshift.io][apigroup:project.openshift.io][apigroup:image.openshift.io] by all [Suite:openshift/conformance/parallel]"
"[sig-cli] oc idle [apigroup:apps.openshift.io][apigroup:route.openshift.io][apigroup:project.openshift.io][apigroup:image.openshift.io] by checking previous scale [Suite:openshift/conformance/parallel]"
"[sig-cli] oc idle [apigroup:apps.openshift.io][apigroup:route.openshift.io][apigroup:project.openshift.io][apigroup:image.openshift.io] by label [Suite:openshift/conformance/parallel]"
"[sig-cli] oc idle [apigroup:apps.openshift.io][apigroup:route.openshift.io][apigroup:project.openshift.io][apigroup:image.openshift.io] by name [Suite:openshift/conformance/parallel]"
fails while waiting for endpoints:
STEP: wait until endpoint addresses are scaled to 2 01/21/23 01:16:42.024
Jan 21 01:16:42.025: INFO: Running 'oc --namespace=e2e-test-oc-idle-h2mvt --kubeconfig=/tmp/configfile3007731725 get endpoints idling-echo --template={{ len (index .subsets 0).addresses }} --output=go-template'
Jan 21 01:16:42.158: INFO: Error running /usr/local/bin/oc --namespace=e2e-test-oc-idle-h2mvt --kubeconfig=/tmp/configfile3007731725 get endpoints idling-echo --template={{ len (index .subsets 0).addresses }} --output=go-template:
StdOut> Error executing template: template: output:1:8: executing "output" at <index .subsets 0>: error calling index: index of untyped nil. Printing more information for debugging the template:
template was: {{ len (index .subsets 0).addresses }}
raw data was: {"apiVersion":"v1","kind":"Endpoints","metadata":{"annotations":{"endpoints.kubernetes.io/last-change-trigger-time":"2023-01-21T01:16:40Z"},"creationTimestamp":"2023-01-21T01:16:40Z","labels":{"app":"idling-echo"},"managedFields":[{"apiVersion":"v1","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:endpoints.kubernetes.io/last-change-trigger-time":{}},"f:labels":{".":{},"f:app":{}}}},"manager":"kube-controller-manager","operation":"Update","time":"2023-01-21T01:16:40Z"}],"name":"idling-echo","namespace":"e2e-test-oc-idle-h2mvt","resourceVersion":"409973","uid":"91cd122e-b418-4e29-98c6-2ff757c74a15"}}
object given to template engine was: map[apiVersion:v1 kind:Endpoints metadata:map[annotations:map[endpoints.kubernetes.io/last-change-trigger-time:2023-01-21T01:16:40Z] creationTimestamp:2023-01-21T01:16:40Z labels:map[app:idling-echo] managedFields:[map[apiVersion:v1 fieldsType:FieldsV1 fieldsV1:map[f:metadata:map[f:annotations:map[.:map[] f:endpoints.kubernetes.io/last-change-trigger-time:map[]] f:labels:map[.:map[] f:app:map[]]]] manager:kube-controller-manager operation:Update time:2023-01-21T01:16:40Z]] name:idling-echo namespace:e2e-test-oc-idle-h2mvt resourceVersion:409973 uid:91cd122e-b418-4e29-98c6-2ff757c74a15]]
When using 60 seconds in PollImmediate instead of 30, the tests pass.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2023-01-19-110743
How reproducible:
Consistently
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
DoD:
Currently we return early if we fail to apply a resource during installation: https://github.com/openshift/hypershift/blob/main/cmd/install/install.go#L248
There's no reason not to keep going, aggregate the errors, and return them at the end.
It would help in scenarios where one broken CR prevents everything else from being installed, e.g.
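A minimal sketch of that aggregation approach using the upstream apimachinery errors utility; the applyAll helper and its signature are assumptions for illustration, not the actual install code.
~~~
package main

import (
	"fmt"

	utilerrors "k8s.io/apimachinery/pkg/util/errors"
)

// applyAll applies every object and collects failures instead of returning on
// the first error, so one broken CR does not block the remaining resources.
func applyAll(apply func(o interface{}) error, objects []interface{}) error {
	var errs []error
	for _, o := range objects {
		if err := apply(o); err != nil {
			errs = append(errs, fmt.Errorf("failed to apply %v: %w", o, err))
		}
	}
	// A single aggregated error (nil if everything applied cleanly).
	return utilerrors.NewAggregate(errs)
}

func main() {
	objects := []interface{}{"crd-a", "crd-b", "deployment-c"}
	err := applyAll(func(o interface{}) error {
		if o == "crd-a" { // pretend one resource is broken
			return fmt.Errorf("admission webhook rejected it")
		}
		return nil
	}, objects)
	fmt.Println(err) // reports crd-a, while crd-b and deployment-c were still applied
}
~~~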
Description of problem:
We need to update the operator to be in sync with the K8s API version used by OCP 4.13. We also need to sync our samples libraries with the latest available libraries. Any deprecated libraries should be removed as well.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Kubernetes and other associated dependencies need to be updated to protect against potential vulnerabilities.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of the problem:
Events search should not be case sensitive
How reproducible:
100%
Steps to reproduce:
1. On UI View Cluster Events
2. Enter text on "Filter by text" field. (eg. "success" or "Success" )
Actual results:
Events filter is case sensitive.
See screenshots enclosed
Expected results:
Events filter should not be case sensitive
Description of problem:
The CRL list is capped at 1MB due to the configmap max size. If multiple public CRLs are needed for the ingress controller, the CRL PEM file will exceed 1MB.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Create a CRL configmap with the following distribution points:
Issuer: C=US, O=DigiCert Inc, CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1
Subject: SOME SIGNED CERT
X509v3 CRL Distribution Points:
Full Name:
URI:http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-2.cr
# curl -o DigiCertGlobalG2TLSRSASHA2562020CA1-2.crl http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-2.crl
# openssl crl -in DigiCertGlobalG2TLSRSASHA2562020CA1-2.crl -inform DER -out DigiCertGlobalG2TLSRSASHA2562020CA1-2.pem
# du -bsh DigiCertGlobalG2TLSRSASHA2562020CA1-2.pem
604K DigiCertGlobalG2TLSRSASHA2562020CA1-2.pem
I still need to find more intermediate CRLs to grow this.
Actual results:
2023-01-25T13:45:01.443Z ERROR operator.init controller/controller.go:273 Reconciler error {"controller": "crl", "object": {"name":"custom","namespace":"openshift-ingress-operator"}, "namespace": "openshift-ingress-operator", "name": "custom", "reconcileID": "d49d9b96-d509-4562-b3d9-d4fc315226c0", "error": "failed to ensure client CA CRL configmap for ingresscontroller openshift-ingress-operator/custom: failed to update configmap: ConfigMap \"router-client-ca-crl-custom\" is invalid: []: Too long: must have at most 1048576 bytes"}
Expected results:
First, be able to create a configmap where only the data counts toward the 1MB max (see additional info below for more details); second, provide some way to compress or otherwise allow a CRL list larger than 1MB.
Additional info:
Using only this CRL, at about 600K, still causes the issue, which could be due to the `last-applied-configuration` annotation on the configmap. This is added because we do an apply operation (update) on the configmap, and I am not sure if it counts towards the 1MB max. https://github.com/openshift/cluster-ingress-operator/blob/release-4.10/pkg/operator/controller/crl/crl_configmap.go#L295 Not sure if we could just replace the configmap instead.
Description of problem:
node-driver-registrar and hostpath containers in pod shared-resource-csi-driver-node-xxxxx under openshift-cluster-csi-drivers namespace are not pinned to reserved management cores.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Deploy SNO via ZTP with workload partitioning enabled
2. Check mgmt pods affinity
Actual results:
pods do not have workload partitioning annotation, and are not pinned to mgmt cores
Expected results:
All management pods should be pinned to reserved cores. Pods should be annotated with: target.workload.openshift.io/management: '{"effect":"PreferredDuringScheduling"}'
Additional info:
pod metadata metadata: annotations: k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["fd01:0:0:1::5f/64"],"mac_address":"0a:58:97:51:ad:31","gateway_ips":["fd01:0:0:1::1"],"ip_address":"fd01:0:0:1::5f/64","gateway_ip":"fd01:0:0:1::1"}}' k8s.v1.cni.cncf.io/network-status: |- [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "fd01:0:0:1::5f" ], "mac": "0a:58:97:51:ad:31", "default": true, "dns": {} }] k8s.v1.cni.cncf.io/networks-status: |- [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "fd01:0:0:1::5f" ], "mac": "0a:58:97:51:ad:31", "default": true, "dns": {} }] openshift.io/scc: privileged /var/lib/jenkins/workspace/ocp-far-edge-vran-deployment/cnf-gotests/test/ran/workloadpartitioning/tests/workload_partitioning.go:113 SNO management workload partitioning [It] should have management pods pinned to reserved cpus /var/lib/jenkins/workspace/ocp-far-edge-vran-deployment/cnf-gotests/test/ran/workloadpartitioning/tests/workload_partitioning.go:113 [FAILED] Expected <[]ranwphelper.ContainerInfo | len:3, cap:4>: [ { Name: "hostpath", Cpus: "2-55,58-111", Namespace: "openshift-cluster-csi-drivers", PodName: "shared-resource-csi-driver-node-vzvtc", Shares: 10, Pid: 41650, }, { Name: "cluster-proxy-service-proxy", Cpus: "2-55,58-111", Namespace: "open-cluster-management-agent-addon", PodName: "cluster-proxy-service-proxy-66599b78bf-k2dvr", Shares: 2, Pid: 35093, }, { Name: "node-driver-registrar", Cpus: "2-55,58-111", Namespace: "openshift-cluster-csi-drivers", PodName: "shared-resource-csi-driver-node-vzvtc", Shares: 10, Pid: 34782, }, ] to be empty In [It] at: /var/lib/jenkins/workspace/ocp-far-edge-vran-deployment/cnf-gotests/test/ran/workloadpartitioning/ranwphelper/ranwphelper.go:172 @ 02/22/23 01:05:00.268 cluster-proxy-service-proxy is reported in https://issues.redhat.com/browse/OCPBUGS-7652
The X-CSRF token is currently added automatically for any request using the `coFetch` functions. In some cases, plugins would like to use their own functions/libs, like axios. Console should enable retrieving the X-CSRF token.
Acceptance Criteria:
Description of problem:
The current version of openshift/cluster-dns-operator vendors Kubernetes 1.26 packages. OpenShift 4.14 is based on Kubernetes 1.27.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Check https://github.com/openshift/cluster-dns-operator/blob/release-4.14/go.mod
Actual results:
Kubernetes packages (k8s.io/api, k8s.io/apimachinery, and k8s.io/client-go) are at version v0.26
Expected results:
Kubernetes packages are at version v0.27.0 or later.
Additional info:
Using old Kubernetes API and client packages brings risk of API compatibility issues. controller-runtime will need to be bumped to v0.15.0 as well
Description of problem:
accidentally merged before fully reviewed
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The CPO does not currently respect the CVO runlevels as standalone OCP does.
The CPO reconciles everything all at once during upgrades which is resulting in FeatureSet aware components trying to start because the FeatureSet status is set for that version, leading to pod restarts.
It should roll things out in the following order for both initial install and upgrade, waiting between stages until rollout is complete:
In many cases, the /dev/disk/by-path symlink is the only way to stably identify a disk without having prior knowledge of the hardware from some external source (e.g. a spreadsheet of disk serial numbers). It should be possible to specify this path in the root device hints.
This is fixed by the first commit in the upstream Metal³ PR https://github.com/metal3-io/baremetal-operator/pull/1264
Description of problem:
The usage of "compute.platform.gcp.serviceAccount" needs to be clarified, and also the installation failure.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-16-230237
How reproducible:
Always
Steps to Reproduce:
1. "openshift-install explain installconfig.compute.platform.gcp.serviceAccount" 2. "create cluster" with an existing install-config having the field configured
Actual results:
1. It tells "The provided service account will be attached to control-plane nodes...", although the field is under compute.platform.gcp. 2. The installation failed on creating install config, with error "service accounts only valid for master nodes, provided for worker nodes".
Expected results:
1. shall "explain" command tell the field "serviceAccount" under "installconfig.compute.platform.gcp"? 2. please clarify how "compute.platform.gcp.serviceAccount" should be used
Additional info:
FYI the corresponding PR: https://github.com/openshift/installer/pull/7308
$ openshift-install version
openshift-install 4.14.0-0.nightly-2023-07-16-230237
built from commit c2d7db9d4eedf7b79fcf975f3cbd8042542982ca
release image registry.ci.openshift.org/ocp/release@sha256:e31716b6f12a81066c78362c2f36b9f18ad51c9768bdc894d596cf5b0f689681
release architecture amd64
$ openshift-install explain installconfig.compute.platform.gcp.serviceAccount
KIND: InstallConfig
VERSION: v1
RESOURCE: <string>
ServiceAccount is the email of a gcp service account to be used for shared vpn installations. The provided service account will be attached to control-plane nodes in order to provide the permissions required by the cloud provider in the host project.
$ openshift-install explain installconfig.controlPlane.platform.gcp.serviceAccount
KIND: InstallConfig
VERSION: v1
RESOURCE: <string>
ServiceAccount is the email of a gcp service account to be used for shared vpn installations. The provided service account will be attached to control-plane nodes in order to provide the permissions required by the cloud provider in the host project.
$ yq-3.3.0 r test2/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
  computeSubnet: installer-shared-vpc-subnet-2
  controlPlaneSubnet: installer-shared-vpc-subnet-1
  network: installer-shared-vpc
  networkProjectID: openshift-qe-shared-vpc
$ yq-3.3.0 r test2/install-config.yaml credentialsMode
Passthrough
$ yq-3.3.0 r test2/install-config.yaml baseDomain
qe1.gcp.devcluster.openshift.com
$ yq-3.3.0 r test2/install-config.yaml metadata
creationTimestamp: null
name: jiwei-0718b
$ yq-3.3.0 r test2/install-config.yaml compute
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    gcp:
      ServiceAccount: ipi-xpn-minpt-permissions@openshift-qe.iam.gserviceaccount.com
      tags:
      - preserved-ipi-xpn-compute
  replicas: 2
$ yq-3.3.0 r test2/install-config.yaml controlPlane
architecture: amd64
hyperthreading: Enabled
name: master
platform:
  gcp:
    ServiceAccount: ipi-xpn-minpt-permissions@openshift-qe.iam.gserviceaccount.com
    tags:
    - preserved-ipi-xpn-control-plane
replicas: 3
$ openshift-install create cluster --dir test2
ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: compute[0].platform.gcp.serviceAccount: Invalid value: "ipi-xpn-minpt-permissions@openshift-qe.iam.gserviceaccount.com": service accounts only valid for master nodes, provided for worker nodes
$
due to
kubeconfig didn't become available: timed out waiting for the condition
Description of problem:
When listing installed operators, we attempt to list subscriptions in all namespaces in order to associate subscriptions with CSVs. This prevents users without cluster-scope list privileges from seeing subscriptions on this page, which makes the uninstall action unavailable.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Install a namespaced operator
2. Log in as a user with project admin permissions where the operator was installed
3. Visit the installed operators page
4. Click the kebab menu for the operator from step 1
Actual results:
The only action available is to delete the CSV
Expected results:
The "Uninstall Operator" and "Edit Subscriptions" actions should show since the user has permission to view, edit, delete Subscription resources in this namespace.
Additional info:
Description of problem:
Remove changing the image name for a MachineSet if ClusterOSImage is set. Terraform has already created an image bucket based on OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE for us, so worker nodes should not use OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE directly and should instead use the image bucket.
Version-Release number of selected component (if applicable):
current master branch
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
When creating a pod controller (e.g. deployment) with pod spec that will be mutated by SCCs, the users might still get a warning about the pod not meeting given namespace pod security level.
Version-Release number of selected component (if applicable):
4.11
How reproducible:
100%
Steps to Reproduce:
1. create a namespace with restricted PSa warning level (the default)
2. create a deployment with a pod with an empty security context
Actual results:
You get a warning about the deployment's pod not meeting the NS's pod security admission requirements.
Expected results:
No warning if the pod for the deployment would be properly mutated by SCCs in order to fulfill the NS's pod security requirements.
Additional info:
originally implemented as a part of https://issues.redhat.com/browse/AUTH-337
The agent integration tests are failing with different errors when run multiple times locally:
Local Run 1:
level=fatal msg=failed to fetch Agent Installer PXE Files: failed to fetch dependency of "Agent Installer PXE Files": failed to generate asset "Agent Installer Artifacts": lstat /home/rwsu/.cache/agent/files_cache/libnmstate.so.2: no such file or directory [exit status 1] FAIL: testdata/agent/pxe/configurations/sno.txt:3: unexpected command failure
Local Run 2:
level=fatal msg=failed to fetch Agent Installer PXE Files: failed to fetch dependency of "Agent Installer PXE Files": failed to generate asset "Agent Installer Artifacts": file /usr/bin/agent-tui was not found [exit status 1] FAIL: testdata/agent/pxe/configurations/sno.txt:3: unexpected command failure
In CI (https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/7299/pull-ci-openshift-installer-master-agent-integration-tests/1677347591739674624), it has failed in this PR multiple times with this error:
level=fatal msg=failed to fetch Agent Installer PXE Files: failed to fetch dependency of "Agent Installer PXE Files": failed to generate asset "Agent Installer Artifacts": lstat /.cache/agent/files_cache/agent-tui: no such file or directory
[exit status 1]
FAIL: testdata/agent/pxe/configurations/sno.txt:3: unexpected command failure
I believe the issue is that the integration tests are running in parallel, and the extractFileFromImage function in pkg/asset/agent/image/oc.go is problematic because the cache is being cleared and then files are extracted to the same path. When the tests run in parallel, another test can clear the cached files, and when the current test tries to read a file from the cache directory, it has disappeared.
Adding
-parallel 1
to ./hack/go-integration-test.sh eliminates the errors, which is why I think it is a concurrency issue.
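One way to address that kind of race is to serialize access to the shared cache. The sketch below is a hypothetical illustration (the type and function names are not the installer's actual code) of holding a lock for the full extract-and-read window so another test cannot clear the file in between; per-test temporary cache directories would be another way to avoid the shared state entirely.
~~~
package cache

import (
	"path/filepath"
	"sync"
)

// fileCache is a hypothetical illustration (not the installer's actual type in
// pkg/asset/agent/image/oc.go) of serializing access to a shared on-disk cache
// so one test cannot clear the directory while another is still reading an
// extracted file from it.
type fileCache struct {
	mu  sync.Mutex
	dir string
}

// WithExtractedFile extracts (or reuses) a cached file and runs use while the
// lock is held, so concurrent callers cannot delete the file underneath each
// other.
func (c *fileCache) WithExtractedFile(name string, extract func(dst string) error, use func(path string) error) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	dst := filepath.Join(c.dir, name)
	if err := extract(dst); err != nil {
		return err
	}
	return use(dst)
}
~~~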
If the cluster enters the installing-pending-user-action state in assisted-service, it will not recover absent user action.
One way to reproduce this is to have the wrong boot order set in the host, so that it reboots into the agent ISO again instead of the installed CoreOS on disk. (I managed this in dev-scripts by setting a root device hint that pointed to a secondary disk, and only creating that disk once the VM was up. This does not add the new disk to the boot order list, and even if you set it manually it does not take effect until after a full shutdown of the VM - the soft reboot doesn't count.)
Currently we report:
cluster has stopped installing... working to recover installation
in a loop. This is not accurate (unlike in e.g. the install-failed state) - it cannot be recovered automatically.
Also we should only report this, or any other, status once when the status changes, and not continuously in a loop.
Description of problem:
Install failed with External platform type
Version-Release number of selected component (if applicable):
4.14.0-0.ci-2023-03-07-170635 (there is no available 4.14 nightly build, so the CI build is used)
How reproducible:
Always
Steps to Reproduce:
1.Set up a UPI vsphere cluster with platform set to External 2.Install failed liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version False True 141m Unable to apply 4.14.0-0.ci-2023-03-07-170635: the cluster operator cloud-controller-manager is not available liuhuali@Lius-MacBook-Pro huali-test % oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.14.0-0.ci-2023-03-07-170635 True False False 118m baremetal 4.14.0-0.ci-2023-03-07-170635 True False False 137m cloud-controller-manager 4.14.0-0.ci-2023-03-07-170635 cloud-credential 4.14.0-0.ci-2023-03-07-170635 True False False 140m cluster-autoscaler 4.14.0-0.ci-2023-03-07-170635 True False False 137m config-operator 4.14.0-0.ci-2023-03-07-170635 True False False 139m console 4.14.0-0.ci-2023-03-07-170635 True False False 124m control-plane-machine-set 4.14.0-0.ci-2023-03-07-170635 True False False 137m csi-snapshot-controller 4.14.0-0.ci-2023-03-07-170635 True False False 138m dns 4.14.0-0.ci-2023-03-07-170635 True False False 137m etcd 4.14.0-0.ci-2023-03-07-170635 True False False 137m image-registry 4.14.0-0.ci-2023-03-07-170635 True False False 127m ingress 4.14.0-0.ci-2023-03-07-170635 True False False 126m insights 4.14.0-0.ci-2023-03-07-170635 True False False 132m kube-apiserver 4.14.0-0.ci-2023-03-07-170635 True False False 134m kube-controller-manager 4.14.0-0.ci-2023-03-07-170635 True False False 136m kube-scheduler 4.14.0-0.ci-2023-03-07-170635 True False False 135m kube-storage-version-migrator 4.14.0-0.ci-2023-03-07-170635 True False False 138m machine-api 4.14.0-0.ci-2023-03-07-170635 True False False 137m machine-approver 4.14.0-0.ci-2023-03-07-170635 True False False 138m machine-config 4.14.0-0.ci-2023-03-07-170635 True False False 136m marketplace 4.14.0-0.ci-2023-03-07-170635 True False False 137m monitoring 4.14.0-0.ci-2023-03-07-170635 True False False 124m network 4.14.0-0.ci-2023-03-07-170635 True False False 139m node-tuning 4.14.0-0.ci-2023-03-07-170635 True False False 137m openshift-apiserver 4.14.0-0.ci-2023-03-07-170635 True False False 132m openshift-controller-manager 4.14.0-0.ci-2023-03-07-170635 True False False 138m openshift-samples 4.14.0-0.ci-2023-03-07-170635 True False False 131m operator-lifecycle-manager 4.14.0-0.ci-2023-03-07-170635 True False False 138m operator-lifecycle-manager-catalog 4.14.0-0.ci-2023-03-07-170635 True False False 138m operator-lifecycle-manager-packageserver 4.14.0-0.ci-2023-03-07-170635 True False False 132m service-ca 4.14.0-0.ci-2023-03-07-170635 True False False 138m storage 4.14.0-0.ci-2023-03-07-170635 True False False 138m liuhuali@Lius-MacBook-Pro huali-test % oc get infrastructure cluster -oyaml apiVersion: config.openshift.io/v1 kind: Infrastructure metadata: creationTimestamp: "2023-03-08T07:46:07Z" generation: 1 name: cluster resourceVersion: "527" uid: 096a54bc-8a35-4071-b750-cfac439c1916 spec: cloudConfig: name: "" platformSpec: external: platformName: vSphere type: External status: apiServerInternalURI: https://api-int.huliu-vs8x.qe.devcluster.openshift.com:6443 apiServerURL: https://api.huliu-vs8x.qe.devcluster.openshift.com:6443 controlPlaneTopology: HighlyAvailable etcdDiscoveryDomain: "" infrastructureName: huliu-vs8x-fk79b infrastructureTopology: HighlyAvailable platform: External platformStatus: external: {} type: External liuhuali@Lius-MacBook-Pro huali-test %
Actual results:
Install failed. the cluster operator cloud-controller-manager is not available
Expected results:
Install successfully
Additional info:
This if for testing https://issues.redhat.com/browse/OCPCLOUD-1772
Currently assisted installer doesn't verify that etcd is ok before reboot on the bootstrap node as wait_for_ceo in bootkube does nothing.
In 4.13 and backported to 4.12 etcd team had added status that we can check in assisted installer in order to decide if it is safe to reboot bootstrap or not. We should check it before running shutdown command.
We want to parametrize envoy configmap name: with that, we can configure a private envoy configuration that would bring the following advantages:
Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver/pull/36
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
If a JSON schema used by a chart contains an unknown value format (non-standard in JSON Schema but valid in the OpenAPI spec, for example), the Helm form view hangs on validation and stays in the "submitting" state.
As per the JSON Schema standard, the "format" keyword should only take an advisory role (like an annotation) and should not affect validation.
https://json-schema.org/understanding-json-schema/reference/string.html#format
Verified against 4.13, but probably applies to others.
100%
1. Go to Helm tab.
2. Click create in top right and select Repository
3. Paste following into YAML view and click Create:
apiVersion: helm.openshift.io/v1beta1
kind: ProjectHelmChartRepository
metadata:
  name: reproducer
spec:
  connectionConfig:
    url: 'https://raw.githubusercontent.com/tumido/helm-backstage/repo-multi-schema2'
4. Go to the Helm tab again (if redirected elsewhere)
5. Click create in top right and select Helm Release
6. In catalog filter select Chart repositories: Reproducer
7. Click on the single tile available (Backstage) and click Create
8. Switch to Form view
9. Leave default values and click Create
10. Stare at the always loading screen that never proceeds further.
Actual results:
The form stays in the submitting state and never finishes or displays any error in the UI.
Unknown format should not result in rejected validation. JSON Schema standard says that formats should not be used for validation.
This is not a schema violation by itself since Helm itself is happy about it and doesn't complain. The same chart can be successfully deployed via the YAML view.
See this component readiness page.
test=[sig-cluster-lifecycle] cluster upgrade should complete in 105.00 minutes
Appears to indicate we're now taking longer than 105 minutes about 7% of the time, previously never.
Slack thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1694547497553699
wking points out it may be a one time ovn IC thing. Find out what's up and route to appropriate team.
Description of problem:
Multiple instances of tabs under the ODF dashboard are seen, and sometimes a 404 error is also shown when each such tab is selected and the page is re-loaded: https://bugzilla.redhat.com/show_bug.cgi?id=2124829
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
We faced an issue where the quota was reached for VPCE. This is visible in the status of AWSEndpointService
- lastTransitionTime: "2023-03-01T10:23:08Z" message: 'failed to create vpc endpoint: VpcEndpointLimitExceeded' reason: AWSError status: "False" type: EndpointAvailable
but it should be propagated to the HC, since it blocks worker creation (ignition was not working), and for better visibility.
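A minimal sketch of that propagation using the apimachinery condition utilities; the helper and the condition type name are assumptions for illustration, not HyperShift's actual controller code.
~~~
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// propagateEndpointCondition copies the AWSEndpointService condition onto the
// HostedCluster's condition list so quota errors such as
// VpcEndpointLimitExceeded are visible where users look first.
func propagateEndpointCondition(hcConditions *[]metav1.Condition, src metav1.Condition) {
	meta.SetStatusCondition(hcConditions, metav1.Condition{
		Type:    "AWSEndpointAvailable", // assumed condition type for illustration
		Status:  src.Status,
		Reason:  src.Reason,
		Message: src.Message,
	})
}

func main() {
	var conds []metav1.Condition
	propagateEndpointCondition(&conds, metav1.Condition{
		Type:    "EndpointAvailable",
		Status:  metav1.ConditionFalse,
		Reason:  "AWSError",
		Message: "failed to create vpc endpoint: VpcEndpointLimitExceeded",
	})
	fmt.Println(conds[0].Type, conds[0].Status, conds[0].Message)
}
~~~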
Description of problem:
This is a follow-up on https://bugzilla.redhat.com/show_bug.cgi?id=2083087 and https://github.com/openshift/console/pull/12390
When creating a Deployment, DeploymentConfig, or Knative Service with enabled Pipeline, and then deleting it again with the enabled option "Delete other resources created by console" (only available on 4.13+ with the PR above) the automatically created Pipeline is not deleted.
When the user tries to create the same resource with a Pipeline again this fails with an error:
An error occurred
secrets "nodeinfo-generic-webhook-secret" already exists
Version-Release number of selected component (if applicable):
4.13
(we might want to backport this together with https://github.com/openshift/console/pull/12390 and OCPBUGS-5547)
How reproducible:
Always
Steps to Reproduce:
Actual results:
Case 1: Delete resources:
Case 2: Delete application:
Expected results:
Case 1: Delete resource:
Case 2: Delete application:
Additional info:
Description of problem:
For HOSTEDCP-1062, components without the label `hypershift.openshift.io/need-management-kas-access: "true"` cannot access the management cluster KAS resources. But the `kube-apiserver` in the HCP does not have the target label `hypershift.openshift.io/need-management-kas-access: "true"`, yet it can access the management KAS:
jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jie-test | grep kube-apiserver
kube-apiserver-6799b6cfd8-wk8pv 3/3 Running 0 178m
jiezhao-mac:hypershift jiezhao$
jiezhao-mac:hypershift jiezhao$ oc get pods kube-apiserver-6799b6cfd8-wk8pv -n clusters-jie-test -o yaml | grep hypershift.openshift.io/need-management-kas-access
jiezhao-mac:hypershift jiezhao$
jiezhao-mac:hypershift jiezhao$ oc -n clusters-jie-test rsh pod/kube-apiserver-6799b6cfd8-wk8pv curl --connect-timeout 2 -Iks https://10.0.142.255:6443 -v
Defaulted container "apply-bootstrap" out of: apply-bootstrap, kube-apiserver, audit-logs, init-bootstrap (init), wait-for-etcd (init)
* Rebuilt URL to: https://10.0.142.255:6443/
..
< HTTP/2 403
HTTP/2 403
...
<
* Connection #0 to host 10.0.142.255 left intact
How reproducible:
refer test case: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-65141
Steps to Reproduce:
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-65141
Additional info:
router pod has the label and can access mgmt KAS. My expectation is that router pod shouldn't have the label and shouldn't access mgmt KAS.
$ oc get pods router-667cb7f844-lx8mv -n clusters-jie-test -o yaml | grep hypershift.openshift.io/need-management-kas-access
hypershift.openshift.io/need-management-kas-access: "true"
jiezhao-mac:hypershift jiezhao$ oc -n clusters-jie-test rsh pod/router-667cb7f844-lx8mv curl --connect-timeout 2 -Iks https://10.0.142.255:6443 -v
Rebuilt URL to: https://10.0.142.255:6443/
Trying 10.0.142.255...
...
< HTTP/2 403
HTTP/2 403
> Actually, router doesn't need it anymore after https://github.com/openshift/hypershift/pull/2778
Description of the problem:
Adding an invalid label (key or value) to a node returns error code 500 "Internal Server Error" instead of 400 "Bad Request".
How reproducible:
100%
Steps to reproduce:
1. Create a cluster
2. Boot node from ISO
3. Add invalid label, invalid key or value
e.g:
curl -s -H 'Content-Type: application/json' -X PATCH -d '{"node_labels": [{"key": "Label-1", "value": "Label1*1"},{"key": "worker.label2", "value": "Label-2"}]}' https://api.stage.openshift.com/api/assisted-install/v2/infra-envs/8603fe29-e67f-49ad-8ba7-7a256bcb3923/hosts/af629f1e-da67-4211-97f0-f27cb10471ff --header "Authorization: Bearer $(ocm token)"
Actual results:
Action failed with error code 500
{"code":"500","href":"","id":500,"kind":"Error","reason":"node_labels: Invalid value: \"Label1*1\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')"}
Expected results:
Action failed with error code 400
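A minimal sketch of validating the labels up front so malformed input maps to a 400 response; the handler shape is an assumption for illustration, but the validation helpers are the standard apimachinery ones.
~~~
package main

import (
	"fmt"
	"net/http"

	"k8s.io/apimachinery/pkg/util/validation"
)

// validateNodeLabel is a sketch (not assisted-service's actual handler) of
// validating a label up front so bad input maps to 400 Bad Request instead of
// surfacing later as a 500.
func validateNodeLabel(key, value string) (int, error) {
	if errs := validation.IsQualifiedName(key); len(errs) > 0 {
		return http.StatusBadRequest, fmt.Errorf("invalid label key %q: %v", key, errs)
	}
	if errs := validation.IsValidLabelValue(value); len(errs) > 0 {
		return http.StatusBadRequest, fmt.Errorf("invalid label value %q: %v", value, errs)
	}
	return http.StatusOK, nil
}

func main() {
	code, err := validateNodeLabel("Label-1", "Label1*1")
	fmt.Println(code, err) // 400 invalid label value "Label1*1": ...
}
~~~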
Description of problem:
Noticed an issue with the ignition server when testing some of the latest HO updates on our older control planes:
❯ oc logs ignition-server-5fd4c89764-bddss -n master-roks-dev-4-9
Defaulted container "ignition-server" out of: ignition-server, fetch-feature-gate (init)
Error: unknown flag: --feature-gate-manifest
This seems to be thrown because the flag doesn't exist in the ignition server source code for previous control plane versions; we're specifically only seeing this in 4.9 and 4.10, where the ignition server was not managed by the CPO.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Install HO off main
2. Bring up 4.9/4.10 hosted control planes
3. Ignition server crashes
Actual results:
Ignition server crashes
Expected results:
Ignition server to run without issues
Additional info:
This is a clone of issue OCPBUGS-18246. The following is the description of the original issue:
—
Description of problem:
Role assignment for Azure AD Workload Identity performed by ccoctl does not provide an option to scope role assignments to a resource group containing customer vnet in a byo vnet installation workflow. https://docs.openshift.com/container-platform/4.13/installing/installing_azure/installing-azure-vnet.html
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
100%
Steps to Reproduce:
1. Create Azure resource group and vnet for OpenShift within that resource group.
2. Create Azure AD Workload Identity infrastructure with ccoctl.
3. Follow steps to configure existing vnet for installation setting networkResourceGroupName within the install config.
4. Attempt cluster installation.
Actual results:
Cluster installation fails.
Expected results:
Cluster installation succeeds.
Additional info:
ccoctl must be extended to accept a parameter specifying the network resource group name and scope relevant component role assignments to the network resource group in addition to the installation resource group.
Kubernetes 1.27 changes validation of CSR for non-RSA kubelet client/serving CSRs, see https://github.com/kubernetes/kubernetes/issues/109077 and the PR changing https://github.com/kubernetes/kubernetes/pull/111660.
For that reason our cluster-machine-approver needs to relax the validation in https://github.com/openshift/cluster-machine-approver/blob/d74f42bb37c4130ae1e91819d90ad40a51ec472b/pkg/controller/csr_check.go#L84-L86 such that it appropriately expects the necessary key usages.
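A simplified sketch of what a relaxed check could look like, accepting either usage set; this is an assumption for illustration, not the approver's actual code (the real validation lives in csr_check.go).
~~~
package approver

import certificatesv1 "k8s.io/api/certificates/v1"

// allowedUsageSets is a simplified sketch (not the cluster-machine-approver's
// actual code) of the relaxed check: accept CSRs whose usages match either the
// RSA-style set (with key encipherment) or the non-RSA set, since Kubernetes
// 1.27 no longer includes "key encipherment" for non-RSA client keys.
var allowedUsageSets = [][]certificatesv1.KeyUsage{
	{certificatesv1.UsageDigitalSignature, certificatesv1.UsageKeyEncipherment, certificatesv1.UsageClientAuth},
	{certificatesv1.UsageDigitalSignature, certificatesv1.UsageClientAuth},
}

func usagesAllowed(got []certificatesv1.KeyUsage) bool {
	for _, want := range allowedUsageSets {
		if sameUsageSet(got, want) {
			return true
		}
	}
	return false
}

func sameUsageSet(a, b []certificatesv1.KeyUsage) bool {
	if len(a) != len(b) {
		return false
	}
	seen := map[certificatesv1.KeyUsage]bool{}
	for _, u := range a {
		seen[u] = true
	}
	for _, u := range b {
		if !seen[u] {
			return false
		}
	}
	return true
}
~~~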
Description of problem:
When installing a HyperShift cluster into ap-southeast-3 (currently only availble in the production environment), the install will never succeed due to the hosted KCM pods stuck in CrashLoopBackoff
Version-Release number of selected component (if applicable):
4.12.18
How reproducible:
100%
Steps to Reproduce:
1. Install a HyperShift Cluster in ap-southeast-3 on AWS
Actual results:
kube-controller-manager-54fc4fff7d-2t55x 1/2 CrashLoopBackOff 7 (2m49s ago) 16m
kube-controller-manager-54fc4fff7d-dxldc 1/2 CrashLoopBackOff 7 (93s ago) 16m
kube-controller-manager-54fc4fff7d-ww4kv 1/2 CrashLoopBackOff 7 (21s ago) 15m
With selected "important" logs:
I0606 15:16:25.711483 1 event.go:294] "Event occurred" object="kube-system/kube-controller-manager" fieldPath="" kind="ConfigMap" apiVersion="v1" type="Normal" reason="LeaderElection" message="kube-controller-manager-54fc4fff7d-ww4kv_6dbab916-b4bf-447f-bbb2-5037864e7f78 became leader"
I0606 15:16:25.711498 1 event.go:294] "Event occurred" object="kube-system/kube-controller-manager" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="kube-controller-manager-54fc4fff7d-ww4kv_6dbab916-b4bf-447f-bbb2-5037864e7f78 became leader"
W0606 15:16:25.741417 1 plugins.go:132] WARNING: aws built-in cloud provider is now deprecated. The AWS provider is deprecated and will be removed in a future release. Please use https://github.com/kubernetes/cloud-provider-aws
I0606 15:16:25.741763 1 aws.go:1279] Building AWS cloudprovider
F0606 15:16:25.742096 1 controllermanager.go:245] error building controller context: cloud provider could not be initialized: could not init cloud provider "aws": not a valid AWS zone (unknown region): ap-southeast-3a
Expected results:
The KCM pods are Running
Description of problem:
The CredentialsRequest for a credentials secret generated by CCO on an STS Manual Mode cluster does not have status set
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
4.14.0
Steps to Reproduce:
1. Create a Manual mode, STS cluster in AWS.
2. Create a CredentialsRequest which provides .spec.cloudTokenPath and .spec.providerSpec.stsIAMRoleARN.
3. Observe that secret is created by CCO in the target namespace specified by the CredentialsRequest.
4. Observe that the CredentialsRequest does not set status once the secret is generated. Specifically, the CredentialsRequest does not set .status.provisioned == true.
Actual results:
Status is not set on CredentialsRequest with provisioned secret.
Expected results:
Status is set on CredentialsRequest with provisioned secret.
Additional info:
Reported by Jan Safranek when testing integration with the aws-efs-csi-driver-operator.
Description of problem: When running in development mode [1], the Loaded enabled plugin count numbers in the Cluster Dashboard Dynamic Plugins popover may be incorrect. In order to make the experience less confusing for users working with the console in development mode, we need to:
Note there is additional work planned in https://issues.redhat.com/browse/CONSOLE-3185. This bug is intended to only capture improving the experience for development mode.
In the assisted pod I see data collection is enabled:
sh-4.4$ env | grep DATA
DATA_UPLOAD_ENDPOINT=https://console.redhat.com/api/ingress/v1/upload
ENABLE_DATA_COLLECTION=True
On https://issues.redhat.com/browse/RFE-2273 the customer analyzed quite correctly:
I have re-reviewed all of the provided data from the attached cases (DHL and ANZ) and have documented my findings below:
1) It looks like the request mentioned by the customer is sent to the Console API. Specifically `api/prometheus-tenancy/api/v1/*`
2) This is then forwarded to Cluster Monitoring (Thanos Querier) [0]
3) Thanos is configured to set the CORS headers to `*` due to the absence of the `--web.disable-cors` argument.[1]
4) The Thanos deployment is managed by the Cluster Monitoring Operator directly [2]
5) When using Postman, we can see the endpoint respond with a `access-control-allow-origin: *` [see image 1]
6) Manually setting the `--web.disable-cors` argument inside the Thanos Querier deployment, the `access-control-allow-origin: *` is removed.
7) Changing the Cluster Monitoring Operator deployment template[4] to include the flag and push the custom image into an OCP 4.10.31 cluster [3]
8) Seems like everything is working and the endpoint is not longer returning the CORS header. [see image 2]
We should set `--web.disable-cors` for our Thanos deployment. We don't load any cross-origin resources through the console>thanos querier path, so this should just work.
Description of the problem:
A base domain containing a double hyphen (`--`), like cat--rahul.com, is allowed by the UI and BE, and when the node is discovered, network validation fails.
The current domain is a particular case of using `--`, but note that the UI and BE allow sending many `-` characters as part of the domain name.
from agent logs:
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Creating execution step for ntp-synchronizer ntp-synchronizer-70565cf4 args <[{\"ntp_source\":\"\"}]>" file="step_processor.go:123" request_id=5467e025-2683-4119-a55a-976bb7787279 Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Creating execution step for domain-resolution domain-resolution-f3917dea args <[{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]>" file="step_processor.go:123" request_id=5467e025-2683-4119-a55a-976bb7787279 Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating domain resolution with args [{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]" file="action.go:29" Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating inventory with args [fea3d7b9-a990-48a6-9a46-4417915072b0]" file="action.go:29" Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=error msg="Failed to validate domain resolution: data, {\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}" file="action.go:42" error="validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'" Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating ntp synchronizer with args [{\"ntp_source\":\"\"}]" file="action.go:29" Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating free addresses with args [[\"192.168.123.0/24\"]]" file="action.go:29" Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- sh -c cp /etc/mtab /root/mtab-fea3d7b9-a990-48a6-9a46-4417915072b0 && podman run --privileged --pid=host --net=host --rm --quiet -v /var/log:/var/log -v /run/udev:/run/udev -v /dev/disk:/dev/disk -v /run/systemd/journal/socket:/run/systemd/journal/socket -v /var/log:/host/var/log:ro -v /proc/meminfo:/host/proc/meminfo:ro -v /sys/kernel/mm/hugepages:/host/sys/kernel/mm/hugepages:ro -v /proc/cpuinfo:/host/proc/cpuinfo:ro -v /root/mtab-fea3d7b9-a990-48a6-9a46-4417915072b0:/host/etc/mtab:ro -v /sys/block:/host/sys/block:ro -v /sys/devices:/host/sys/devices:ro -v /sys/bus:/host/sys/bus:ro -v /sys/class:/host/sys/class:ro -v /run/udev:/host/run/udev:ro -v /dev/disk:/host/dev/disk:ro 
registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-agent-rhel8:v1.0.0-279 inventory]" file="execute.go:39" Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=error msg="Unable to create runner for step <domain-resolution-f3917dea>, args <[{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]>" file="step_processor.go:126" error="validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'" request_id=5467e025-2683-4119-a55a-976bb7787279 Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- findmnt --raw --noheadings --output SOURCE,TARGET --target /run/media/iso]" file="execute.go:39" Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- sh -c podman ps --format '{{.Names}}' | grep -q '^free_addresses_scanner$' || podman run --privileged --net=host --rm --quiet --name free_addresses_scanner -v /var/log:/var/log -v /run/systemd/journal/socket:/run/systemd/journal/socket registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-agent-rhel8:v1.0.0-279 free_addresses '[\"192.168.123.0/24\"]']" file="execute.go:39" Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- timeout 30 chronyc -n sources]" file="execute.go:39" Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=warning msg="Sending step <domain-resolution-f3917dea> reply output <> error <validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'> exit-code <-1>" file="step_processor.go:76" request_id=5467e025-2683-4119-a55a-976bb7787279
How reproducible:
Create a cluster with the domain cat--rahul.com, using the UI fix that allows it.
Once the node is discovered, network validation fails with the error shown above.
Steps to reproduce:
see above
Actual results:
Unable to install cluster due to network validation failure
Expected results:
The domain should be accepted by the domain-name validation regex.
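For illustration, the validation pattern from the agent logs only allows single hyphens between alphanumeric runs, which is why cat--rahul.com is rejected. A quick local check with the same pattern (hypothetical inputs, not part of the original report):
$ echo "api.dummy---dummy.cat--rahul.com" | grep -E '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'
# no output: consecutive hyphens do not match
$ echo "api.example.cat-rahul.com" | grep -E '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'
api.example.cat-rahul.com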
Description of problem:
When modifying a secret in the Management Console that has a binary file included (such as a keystore), the keystore gets corrupted after the modification and therefore impacts application functionality (as the keystore can no longer be read).

$ openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365
$ cat cert.pem key.pem > file.crt.txt
$ openssl pkcs12 -export -in file.crt.txt -out mykeystore.pkcs12 -name myAlias -noiter -nomaciter
$ oc create secret generic keystore --from-file=mykeystore.pkcs12 --from-file=cert.pem --from-file=key.pem -n project-300

apiVersion: v1
kind: Pod
metadata:
  name: mypod
  namespace: project-300
spec:
  containers:
  - name: mypod
    image: quay.io/rhn_support_sreber/curl:latest
    volumeMounts:
    - name: foo
      mountPath: "/keystore"
      readOnly: true
  volumes:
  - name: foo
    secret:
      secretName: keystore
      optional: true

# Getting the md5sum of the file on the local laptop to compare with what is available in the pod
$ md5sum mykeystore.pkcs12
c189536854e59ab444720efaaa76a34a  mykeystore.pkcs12

sh-5.2# ls -al /keystore/..data/
total 16
drwxr-xr-x. 2 root root  100 Mar 24 11:19 .
drwxrwxrwt. 3 root root  140 Mar 24 11:19 ..
-rw-r--r--. 1 root root 1992 Mar 24 11:19 cert.pem
-rw-r--r--. 1 root root 3414 Mar 24 11:19 key.pem
-rw-r--r--. 1 root root 4380 Mar 24 11:19 mykeystore.pkcs12
sh-5.2# md5sum /keystore/..data/mykeystore.pkcs12
c189536854e59ab444720efaaa76a34a  /keystore/..data/mykeystore.pkcs12

Edit cert.pem in the secret using the Management Console, then recreate the pod with the same manifest as above:

$ oc delete pod mypod -n project-300

sh-5.2# ls -al /keystore/..data/
total 20
drwxr-xr-x. 2 root root   100 Mar 24 12:52 .
drwxrwxrwt. 3 root root   140 Mar 24 12:52 ..
-rw-r--r--. 1 root root  1992 Mar 24 12:52 cert.pem
-rw-r--r--. 1 root root  3414 Mar 24 12:52 key.pem
-rw-r--r--. 1 root root 10782 Mar 24 12:52 mykeystore.pkcs12
sh-5.2# md5sum /keystore/..data/mykeystore.pkcs12
56f04fa8059471896ed5a3c54ade707c  /keystore/..data/mykeystore.pkcs12

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-03-23-204038   True        False         91m     Cluster version is 4.13.0-0.nightly-2023-03-23-204038

The modification was done in the Management Console by selecting the secret and then using: Actions -> Edit Secret -> modifying the value of cert.pem and submitting via the Save button.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.13.0-0.nightly-2023-03-23-204038 and 4.12.6
How reproducible:
Always
Steps to Reproduce:
1. See above the details steps
Actual results:
# md5sum on the laptop for the file
$ md5sum mykeystore.pkcs12
c189536854e59ab444720efaaa76a34a  mykeystore.pkcs12

# md5sum of the file in the pod after the modification in the Management Console
sh-5.2# md5sum /keystore/..data/mykeystore.pkcs12
56f04fa8059471896ed5a3c54ade707c  /keystore/..data/mykeystore.pkcs12

The file got corrupted and is not usable anymore. The binary file should not be modified when editing the secret in the Management Console if no changes were made to its value.
Expected results:
The binary file should not be modified when editing the secret in the Management Console if no changes were made to its value.
Additional info:
A similar problem was already fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1879638, but that was for the case where the binary file was uploaded. Possibly the secret edit functionality is also missing binary file support.
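One way to confirm whether the stored bytes actually changed after a console edit, using the secret and namespace names from the reproducer above (a hedged sketch, not part of the original report):
$ oc get secret keystore -n project-300 -o jsonpath='{.data.mykeystore\.pkcs12}' | base64 -d | md5sum
# compare the result against the checksum taken on the laptop (c189536854e59ab444720efaaa76a34a)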
Improve the logging format of the KNI haproxy logs to display tcplog output plus the frontend IP and frontend port.
The current logging format is not very verbose:
<134>Jun 2 22:54:02 haproxy[11]: Connect from ::1:42424 to ::1:9445 (main/TCP)
<134>Jun 2 22:54:04 haproxy[11]: Connect from ::1:42436 to ::1:9445 (main/TCP)
<134>Jun 2 22:54:04 haproxy[11]: Connect from ::1:42446 to ::1:9445 (main/TCP)
It lacks critical information for troubleshooting, such as load-balancing destination and timestamps.
https://www.haproxy.com/blog/introduction-to-haproxy-logging recommends the following for tcp mode:
When in TCP mode, which is set by adding mode tcp, you should also add [option tcplog](https://www.haproxy.com/documentation/hapee/1-8r1/onepage/#option%20tcplog).
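A minimal sketch of what that change would look like in haproxy.cfg; the section name and bind address are placeholders rather than the actual KNI-rendered config, and the existing log target is assumed to stay as already configured:
frontend main
    bind :::9445 v4v6
    mode tcp
    # option tcplog switches to the TCP log format: timestamps, frontend name,
    # chosen backend/server, timers and byte counts per connection
    option tcplog
    default_backend masters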
This fix contains the following changes, coming from the updated version of Kubernetes up to v1.27.6:
Changelog:
v1.27.6: https://github.com/kubernetes/kubernetes/blob/release-1.27/CHANGELOG/CHANGELOG-1.27.md#changelog-since-v1275
v1.27.5: https://github.com/kubernetes/kubernetes/blob/release-1.27/CHANGELOG/CHANGELOG-1.27.md#changelog-since-v1274
Please review the following PR: https://github.com/operator-framework/operator-marketplace/pull/535
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
CSI storage capacity tracking is GA since Kubernetes 1.24, yet must-gather does not collect CSIStorageCapacity objects. It would be useful for single node clusters with LVMO, but other clusters could benefit from it too.
Version-Release number of selected component (if applicable):
4.11.0
How reproducible:
always
Steps to Reproduce:
1. oc adm must-gather
Actual results:
Output does not contain CSIStorageCapacity objects
Expected results:
Output contains CSIStorageCapacity objects
Additional info:
We should go through all new additions to the storage APIs (storage.k8s.io/v1) and add any missing items.
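For comparison, the objects are visible in-cluster with a plain client call (assuming a CSI driver with storage capacity tracking is installed); they are simply absent from the must-gather output:
$ oc get csistoragecapacities.storage.k8s.io -A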
Description of problem:
CNO panics with net/http: abort Handler while installing an SNO cluster with OpenShiftSDN:
network   4.14.0-0.nightly-2023-07-05-191022   True   False   True   9h   Panic detected: net/http: abort Handler
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-05-191022
How reproducible:
sometimes
Steps to Reproduce:
1. Install an OpenShiftSDN cluster on SNO
Actual results:
Cluster (CNO) reports errors
Expected results:
Cluster should be installed fine
Additional info:
SOS: http://shell.lab.bos.redhat.com/~anusaxen/sosreport-rg-0707-tl6fd-master-0-2023-07-07-pyaruar.tar.xz MG: http://shell.lab.bos.redhat.com/~anusaxen/must-gather.local.4340060474822893433/
Hypershift needs to be able to specify a different release payload for control plane components without redeploying anything in the hosted cluster.
The csi-driver-node DaemonSet pods in the hosted cluster and the csi-driver-controller Deployment that runs in the control plane both use the same AWS_EBS_DRIVER_IMAGE and LIVENESS_PROBE_IMAGE values.
We need a way to specify these images separately for csi-driver-node and csi-driver-controller.
Description of problem:
Even in environments where container images are manually loaded into the container store, services will fail because they are written to always pull images prior to starting the container, instead of first checking (for example with podman image exists) whether the image is already present. A minimal sketch of such a guard is included under Additional info below.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
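A minimal sketch of the guard described above; the image name is a placeholder, and the real services would substitute their configured image:
$ podman image exists registry.example.com/some/image:latest || podman pull registry.example.com/some/image:latest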
Description of problem:
Business Automation Operands fail to load in the uninstall operator modal, with the alert message "Cannot load Operands. There was an error loading operands for this operator. Operands will need to be deleted manually...". The "Delete all operand instances for this operator" checkbox is not shown, so the test fails. https://search.ci.openshift.org/?search=Testing+uninstall+of+Business+Automation+Operator&maxAge=168h&context=1&type=junit&name=pull-ci-openshift-console-master-e2e-gcp-console&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The kube-controller-manager container cluster-policy-controller shows unusual error logs, such as:
I0214 10:49:34.698154 1 interface.go:71] Couldn't find informer for template.openshift.io/v1, Resource=templateinstances
I0214 10:49:34.698159 1 resource_quota_monitor.go:185] QuotaMonitor unable to use a shared informer for resource "template.openshift.io/v1, Resource=templateinstances": no informer found for template.openshift.io/v1, Resource=templateinstances
Version-Release number of selected component (if applicable):
How reproducible:
When the cluster-policy-controller restarts, you will see these logs.
Steps to Reproduce:
1.oc logs kube-controller-manager-master0 -n openshift-kube-controller-manager -c cluster-policy-controller
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cluster-etcd-operator/pull/1042
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
The e2e test "TestDNSLogging" from https://github.com/openshift/cluster-dns-operator/tree/master/test/e2e fails intermittently.
Recently seen in:
Description of problem:
nmstate packages > 2.2.9 will cause MCD firstboot to fail. For now, let's pin the nmstate version and fix properly via https://github.com/openshift/machine-config-operator/pull/3720
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
No datapoints found for Long Running Requests by Resource and Long Running Requests by Instance of "API Performance" dashboard on web-console UI
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-13-223353
How reproducible:
always
Steps to Reproduce:
1. Install an OCP cluster with a 4.14 nightly payload. 2. Open the web console and view the "API Performance" dashboard.
Actual results:
1. The Long Running Requests by Resource and Long Running Requests by Instance panels show "No datapoints found".
Expected results:
2. Data should be shown on the Long Running Requests by Resource and Long Running Requests by Instance panels.
Additional info:
1. Got the same results on 4.13.
2. Did not find apiserver_longrunning_gauge in the Prometheus data, only apiserver_longrunning_requests:
$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep apiserver_longrunning_gauge
no result
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep apiserver_long
"apiserver_longrunning_requests",
Description of problem:
In the assisted-installer flow, the bootkube service is started on the Live ISO, so the root FS is read-only. The OKD installer attempts to pivot the booted OS to machine-os-content via `rpm-ostree rebase`. This is not necessary since we're already using SCOS on the Live ISO.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Print preview of Topology presents incorrect layout
Version-Release number of selected component (if applicable):
4.12.0
How reproducible:
Always
Steps to Reproduce:
1. Have 2 Knative/Serverless Functions deployed (in my case one is Quarkus and another is Spring Boot).
2. In the Topology UI, observe that you see their snippets properly within the Graph view.
3. Now switch to List view.
4. In my case the items I see in List view are this short list: Broker default, Operator Backed Service DW terminal-avby87, D workspaceb5975d64dbc54983, Service KSVC caller-function REV caller-function-00002, Service KSVC callme-function REV callme-function-00001.
5. Now, using the Chrome browser, press Ctrl+P, i.e. Print preview.
6. Observe that even in Landscape mode only the items up to the workspace item are displayed, with no further pages/info.
Actual results:
Incomplete Topology info from List view in Print Preview
Expected results:
Full and accurate Topology info from List view in Print Preview
Additional info:
Description of problem: Multus should implement per node certificates via integration in the CNO
Description of problem:
When installing a new cluster with TechPreviewNoUpgrade featureSet, Nodes never become Ready. Logs from control-plane components indicate that a resource associated with the DynamicResourceAllocation feature can't be found:
E0804 15:48:51.094383 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1alpha2.PodSchedulingContext: failed to list *v1alpha2.PodSchedulingContext: the server could not find the requested resource (get podschedulingcontexts.resource.k8s.io)
It turns out we either need to:
1. Enable the resource.k8s.io/v1alpha2=true API in kube-apiserver.
2. Or disable the DynamicResourceAllocation feature as TP.
For now I added a commit to invalidate this feature in o/k and disable all related tests. Please let me know once this is sorted out so that I can drop that commit from the rebase PR.
Version-Release number of selected component (if applicable):
4.15
How reproducible:
Always when installing a new cluster with TechPreviewNoUpgrade featureSet.
Steps to Reproduce:
1. Install cluster with TechPreviewNoUpgrade featureSet (this can be done passing an install-config.yaml to the installer). 2. Check logs from one the control-plane components.
Actual results:
Nodes are NotReady and ClusterOperators Degraded.
Expected results:
Cluster is installed successfully.
Additional info:
Slack thread: https://redhat-internal.slack.com/archives/C05HQGU8TFF/p1691154653507499 How to enable an API in KAS: https://kubernetes.io/docs/tasks/administer-cluster/enable-disable-api/
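A minimal sketch of option 1, following the linked upstream enable/disable-API docs; this shows only the raw kube-apiserver flag, and how it would be plumbed through the OpenShift API server operator is not covered here:
$ kube-apiserver --runtime-config=resource.k8s.io/v1alpha2=true <other flags unchanged>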
When making a change to the uninstaller for GCP, the linter picked up an error:
pkg/destroy/gcp/gcp.go:42:2: found a struct that contains a context.Context field (containedctx) Context context.Context
Contexts should not be added to structs. Instead the context should be created at the top level of the uninstaller OR a separate context can be used for each stage of the uninstallation process.
Currently this error can be bypassed by adding:
//nolint:containedctx
to the offending line
Description of problem:
We need to update the operator to be synced with the Kubernetes API version used by OCP 4.14. We also need to sync our samples libraries with the latest available libraries. Any deprecated libraries should be removed as well.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The help info for --oci-registries-config should be updated to reference --include-local-oci-catalogs. Current help text:
--oci-registries-config string   Registries config file location (used only with --use-oci-feature flag)
Since `--use-oci-feature` has been deprecated, the help text should reference --include-local-oci-catalogs instead.
Description of problem:
After updating the sysctl config map, the test waits up to 30s for the pod to be in ready state. From the logs, it could be seen that the allowlist controller takes more than 30s to reconcile when multiple tests are running in parallel. The internal logic of the allowlist controller waits up to 60s for the pods of the allowlist DS to be running. Therefore, it is logical to increase the timeout in the test to 60s.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Go to the console 2. Click on "Installed Operators" 3. Add an operator (Node Feature Discovery) 4. Click on "All instances", then on "Create new" (see image)
Actual results:
The drop-down items are empty, but as a user you can still click them and get to the new instance YAML.
Expected results:
For a better user experience, there should at least be some labels or clickable text.
Additional info:
Description of problem:
While installing clusters with the assisted installer, we have lately seen cases where one master joins very quickly and starts all the pods needed for cluster bootstrap to finish, but the second master joins only after that. Keepalived can't start when only one master has joined, as it doesn't have enough data to build its configuration files. In HA mode, cluster bootstrap should wait for at least 2 joined masters before removing the bootstrap control plane; without that, the installation will fail.
Version-Release number of selected component (if applicable):
How reproducible:
Start bm installation and start one master, wait till it starts all required pods and then add others.
Steps to Reproduce:
1. Start bm installation
2. Start one master
3. Wait till it starts all required pods
4. Add the others
Actual results:
no vip, installation fails
Expected results:
installation succeeds, vip moves to master
Additional info:
Description of problem:
After a replace upgrade from OCP 4.14 image to another 4.14 image first node is in NotReady. jiezhao-mac:hypershift jiezhao$ oc get node --kubeconfig=hostedcluster.kubeconfig NAME STATUS ROLES AGE VERSION ip-10-0-128-175.us-east-2.compute.internal Ready worker 72m v1.26.2+06e8c46 ip-10-0-134-164.us-east-2.compute.internal Ready worker 68m v1.26.2+06e8c46 ip-10-0-137-194.us-east-2.compute.internal Ready worker 77m v1.26.2+06e8c46 ip-10-0-141-231.us-east-2.compute.internal NotReady worker 9m54s v1.26.2+06e8c46 - lastHeartbeatTime: "2023-03-21T19:48:46Z" lastTransitionTime: "2023-03-21T19:42:37Z" message: 'container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?' reason: KubeletNotReady status: "False" type: Ready Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Starting 11m kubelet Starting kubelet. Normal NodeHasSufficientMemory 11m (x2 over 11m) kubelet Node ip-10-0-141-231.us-east-2.compute.internal status is now: NodeHasSufficientMemory Normal NodeHasNoDiskPressure 11m (x2 over 11m) kubelet Node ip-10-0-141-231.us-east-2.compute.internal status is now: NodeHasNoDiskPressure Normal NodeHasSufficientPID 11m (x2 over 11m) kubelet Node ip-10-0-141-231.us-east-2.compute.internal status is now: NodeHasSufficientPID Normal NodeAllocatableEnforced 11m kubelet Updated Node Allocatable limit across pods Normal Synced 11m cloud-node-controller Node synced successfully Normal RegisteredNode 11m node-controller Node ip-10-0-141-231.us-east-2.compute.internal event: Registered Node ip-10-0-141-231.us-east-2.compute.internal in Controller Warning ErrorReconcilingNode 17s (x30 over 11m) controlplane nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
ovnkube-master log:
I0321 20:55:16.270197 1 default_network_controller.go:667] Node add failed for ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation I0321 20:55:16.270209 1 obj_retry.go:326] Retry add failed for *v1.Node ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation I0321 20:55:16.270273 1 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-141-231.us-east-2.compute.internal", UID:"621e6289-ca5a-4e17-afff-5b49961cfb38", APIVersion:"v1", ResourceVersion:"52970", FieldPath:""}): type: 'Warning' reason: 'ErrorReconcilingNode' nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation I0321 20:55:17.851497 1 master.go:719] Adding or Updating Node "ip-10-0-137-194.us-east-2.compute.internal" I0321 20:55:25.965132 1 master.go:719] Adding or Updating Node "ip-10-0-128-175.us-east-2.compute.internal" I0321 20:55:45.928694 1 client.go:783] "msg"="transacting operations" "database"="OVN_Northbound" "operations"="[{Op:update Table:NB_Global Row:map[options:{GoMap:map[e2e_timestamp:1679432145 mac_prefix:2e:f9:d8 max_tunid:16711680 northd_internal_version:23.03.1-20.27.0-70.6 northd_probe_interval:5000 svc_monitor_mac:fe:cb:72:cf:f8:5f use_logical_dp_groups:true]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {c8b24290-296e-44a2-a4d0-02db7e312614}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]" I0321 20:55:46.270129 1 obj_retry.go:265] Retry object setup: *v1.Node ip-10-0-141-231.us-east-2.compute.internal I0321 20:55:46.270154 1 obj_retry.go:319] Adding new object: *v1.Node ip-10-0-141-231.us-east-2.compute.internal I0321 20:55:46.270164 1 master.go:719] Adding or Updating Node "ip-10-0-141-231.us-east-2.compute.internal" I0321 20:55:46.270201 1 default_network_controller.go:667] Node add failed for ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation I0321 20:55:46.270209 1 obj_retry.go:326] Retry add failed for *v1.Node ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation I0321 20:55:46.270284 1 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-141-231.us-east-2.compute.internal", UID:"621e6289-ca5a-4e17-afff-5b49961cfb38", APIVersion:"v1", ResourceVersion:"52970", FieldPath:""}): type: 'Warning' reason: 'ErrorReconcilingNode' nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation I0321 20:55:52.916512 1 reflector.go:559] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Namespace total 5 items received I0321 20:56:06.910669 1 reflector.go:559] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Pod total 12 items received I0321 20:56:15.928505 1 client.go:783] "msg"="transacting operations" "database"="OVN_Northbound" "operations"="[{Op:update Table:NB_Global Row:map[options:{GoMap:map[e2e_timestamp:1679432175 mac_prefix:2e:f9:d8 max_tunid:16711680 northd_internal_version:23.03.1-20.27.0-70.6 
northd_probe_interval:5000 svc_monitor_mac:fe:cb:72:cf:f8:5f use_logical_dp_groups:true]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {c8b24290-296e-44a2-a4d0-02db7e312614}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]" I0321 20:56:16.269611 1 obj_retry.go:265] Retry object setup: *v1.Node ip-10-0-141-231.us-east-2.compute.internal I0321 20:56:16.269637 1 obj_retry.go:319] Adding new object: *v1.Node ip-10-0-141-231.us-east-2.compute.internal I0321 20:56:16.269646 1 master.go:719] Adding or Updating Node "ip-10-0-141-231.us-east-2.compute.internal" I0321 20:56:16.269688 1 default_network_controller.go:667] Node add failed for ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation I0321 20:56:16.269697 1 obj_retry.go:326] Retry add failed for *v1.Node ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation I0321 20:56:16.269724 1 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-141-231.us-east-2.compute.internal", UID:"621e6289-ca5a-4e17-afff-5b49961cfb38", APIVersion:"v1", ResourceVersion:"52970", FieldPath:""}): type: 'Warning' reason: 'ErrorReconcilingNode' nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
cluster-network-operator log:
I0321 21:03:38.487602 1 log.go:198] Set operator conditions: - lastTransitionTime: "2023-03-21T17:39:21Z" status: "False" type: ManagementStateDegraded - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: Degraded - lastTransitionTime: "2023-03-21T17:39:21Z" status: "True" type: Upgradeable - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: Progressing - lastTransitionTime: "2023-03-21T17:39:26Z" status: "True" type: Available I0321 21:03:38.488312 1 log.go:198] Skipping reconcile of Network.operator.openshift.io: spec unchanged I0321 21:03:38.499825 1 log.go:198] Set ClusterOperator conditions: - lastTransitionTime: "2023-03-21T17:39:21Z" status: "False" type: ManagementStateDegraded - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: Degraded - lastTransitionTime: "2023-03-21T17:39:21Z" status: "True" type: Upgradeable - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: Progressing - lastTransitionTime: "2023-03-21T17:39:26Z" status: "True" type: Available I0321 21:03:38.571013 1 log.go:198] Set HostedControlPlane conditions: - lastTransitionTime: "2023-03-21T17:38:24Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: ValidAWSIdentityProvider - lastTransitionTime: "2023-03-21T17:37:06Z" message: Configuration passes validation observedGeneration: 3 reason: AsExpected status: "True" type: ValidHostedControlPlaneConfiguration - lastTransitionTime: "2023-03-21T19:24:24Z" message: "" observedGeneration: 3 reason: QuorumAvailable status: "True" type: EtcdAvailable - lastTransitionTime: "2023-03-21T17:38:23Z" message: Kube APIServer deployment is available observedGeneration: 3 reason: AsExpected status: "True" type: KubeAPIServerAvailable - lastTransitionTime: "2023-03-21T20:26:29Z" message: "" observedGeneration: 3 reason: AsExpected status: "False" type: Degraded - lastTransitionTime: "2023-03-21T17:37:11Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: InfrastructureReady - lastTransitionTime: "2023-03-21T17:37:06Z" message: External DNS is not configured observedGeneration: 3 reason: StatusUnknown status: Unknown type: ExternalDNSReachable - lastTransitionTime: "2023-03-21T19:24:24Z" message: "" observedGeneration: 3 reason: AsExpected status: "True" type: Available - lastTransitionTime: "2023-03-21T17:37:06Z" message: Reconciliation active on resource observedGeneration: 3 reason: AsExpected status: "True" type: ReconciliationActive - 
lastTransitionTime: "2023-03-21T17:38:25Z" message: All is well reason: AsExpected status: "True" type: AWSDefaultSecurityGroupCreated - lastTransitionTime: "2023-03-21T19:30:54Z" message: 'Error while reconciling 4.14.0-0.nightly-2023-03-20-201450: the cluster operator network is degraded' observedGeneration: 3 reason: ClusterOperatorDegraded status: "False" type: ClusterVersionProgressing - lastTransitionTime: "2023-03-21T17:39:11Z" message: Condition not found in the CVO. observedGeneration: 3 reason: StatusUnknown status: Unknown type: ClusterVersionUpgradeable - lastTransitionTime: "2023-03-21T17:44:05Z" message: Done applying 4.14.0-0.nightly-2023-03-20-201450 observedGeneration: 3 reason: FromClusterVersion status: "True" type: ClusterVersionAvailable - lastTransitionTime: "2023-03-21T19:55:15Z" message: Cluster operator network is degraded observedGeneration: 3 reason: ClusterOperatorDegraded status: "True" type: ClusterVersionFailing - lastTransitionTime: "2023-03-21T17:39:11Z" message: Payload loaded version="4.14.0-0.nightly-2023-03-20-201450" image="registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-03-20-201450" architecture="amd64" observedGeneration: 3 reason: PayloadLoaded status: "True" type: ClusterVersionReleaseAccepted - lastTransitionTime: "2023-03-21T17:39:21Z" message: "" reason: AsExpected status: "False" type: network.operator.openshift.io/ManagementStateDegraded - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: network.operator.openshift.io/Degraded - lastTransitionTime: "2023-03-21T17:39:21Z" message: "" reason: AsExpected status: "True" type: network.operator.openshift.io/Upgradeable - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: network.operator.openshift.io/Progressing - lastTransitionTime: "2023-03-21T17:39:27Z" message: "" reason: AsExpected status: "True" type: network.operator.openshift.io/Available I0321 21:03:39.450912 1 pod_watcher.go:125] Operand /, Kind= openshift-multus/multus updated, re-generating status I0321 21:03:39.450953 1 pod_watcher.go:125] Operand /, Kind= openshift-multus/multus updated, re-generating status I0321 21:03:39.493206 1 log.go:198] Set operator conditions: - lastTransitionTime: "2023-03-21T17:39:21Z" status: "False" type: ManagementStateDegraded - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: Degraded - lastTransitionTime: "2023-03-21T17:39:21Z" status: "True" type: Upgradeable - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: Progressing - lastTransitionTime: "2023-03-21T17:39:26Z" status: 
"True" type: Available I0321 21:03:39.494050 1 log.go:198] Skipping reconcile of Network.operator.openshift.io: spec unchanged I0321 21:03:39.508538 1 log.go:198] Set ClusterOperator conditions: - lastTransitionTime: "2023-03-21T17:39:21Z" status: "False" type: ManagementStateDegraded - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: Degraded - lastTransitionTime: "2023-03-21T17:39:21Z" status: "True" type: Upgradeable - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: Progressing - lastTransitionTime: "2023-03-21T17:39:26Z" status: "True" type: Available I0321 21:03:39.684429 1 log.go:198] Set HostedControlPlane conditions: - lastTransitionTime: "2023-03-21T17:38:24Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: ValidAWSIdentityProvider - lastTransitionTime: "2023-03-21T17:37:06Z" message: Configuration passes validation observedGeneration: 3 reason: AsExpected status: "True" type: ValidHostedControlPlaneConfiguration - lastTransitionTime: "2023-03-21T19:24:24Z" message: "" observedGeneration: 3 reason: QuorumAvailable status: "True" type: EtcdAvailable - lastTransitionTime: "2023-03-21T17:38:23Z" message: Kube APIServer deployment is available observedGeneration: 3 reason: AsExpected status: "True" type: KubeAPIServerAvailable - lastTransitionTime: "2023-03-21T20:26:29Z" message: "" observedGeneration: 3 reason: AsExpected status: "False" type: Degraded - lastTransitionTime: "2023-03-21T17:37:11Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: InfrastructureReady - lastTransitionTime: "2023-03-21T17:37:06Z" message: External DNS is not configured observedGeneration: 3 reason: StatusUnknown status: Unknown type: ExternalDNSReachable - lastTransitionTime: "2023-03-21T19:24:24Z" message: "" observedGeneration: 3 reason: AsExpected status: "True" type: Available - lastTransitionTime: "2023-03-21T17:37:06Z" message: Reconciliation active on resource observedGeneration: 3 reason: AsExpected status: "True" type: ReconciliationActive - lastTransitionTime: "2023-03-21T17:38:25Z" message: All is well reason: AsExpected status: "True" type: AWSDefaultSecurityGroupCreated - lastTransitionTime: "2023-03-21T19:30:54Z" message: 'Error while reconciling 4.14.0-0.nightly-2023-03-20-201450: the cluster operator network is degraded' observedGeneration: 3 reason: ClusterOperatorDegraded status: "False" type: ClusterVersionProgressing - lastTransitionTime: "2023-03-21T17:39:11Z" message: Condition not found in the CVO. 
observedGeneration: 3 reason: StatusUnknown status: Unknown type: ClusterVersionUpgradeable - lastTransitionTime: "2023-03-21T17:44:05Z" message: Done applying 4.14.0-0.nightly-2023-03-20-201450 observedGeneration: 3 reason: FromClusterVersion status: "True" type: ClusterVersionAvailable - lastTransitionTime: "2023-03-21T19:55:15Z" message: Cluster operator network is degraded observedGeneration: 3 reason: ClusterOperatorDegraded status: "True" type: ClusterVersionFailing - lastTransitionTime: "2023-03-21T17:39:11Z" message: Payload loaded version="4.14.0-0.nightly-2023-03-20-201450" image="registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-03-20-201450" architecture="amd64" observedGeneration: 3 reason: PayloadLoaded status: "True" type: ClusterVersionReleaseAccepted - lastTransitionTime: "2023-03-21T17:39:21Z" message: "" reason: AsExpected status: "False" type: network.operator.openshift.io/ManagementStateDegraded - lastTransitionTime: "2023-03-21T19:53:10Z" message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-03-21T19:42:39Z reason: RolloutHung status: "True" type: network.operator.openshift.io/Degraded - lastTransitionTime: "2023-03-21T17:39:21Z" message: "" reason: AsExpected status: "True" type: network.operator.openshift.io/Upgradeable - lastTransitionTime: "2023-03-21T19:42:39Z" message: |- DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes) DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) reason: Deploying status: "True" type: network.operator.openshift.io/Progressing - lastTransitionTime: "2023-03-21T17:39:27Z" message: "" reason: AsExpected status: "True" type: network.operator.openshift.io/Available
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. management cluster 4.13 2. bring up the hostedcluster and nodepool in 4.14.0-0.nightly-2023-03-19-234132 3. upgrade the hostedcluster to 4.14.0-0.nightly-2023-03-20-201450 4. replace upgrade the nodepool to 4.14.0-0.nightly-2023-03-20-201450
Actual results:
First node is in NotReady
Expected results:
All nodes should be Ready
Additional info:
No issue with replace upgrade from 4.13 to 4.14
Description of problem:
While mirroring nvidia operator with oc-mirror 4.13 version, ImageContentSourcePolicy is not getting created properly
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create the imageset file:

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
archiveSize: 4
storageConfig:
  local:
    path: /home/name/nvidia
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/certified-operator-index:v4.11
    packages:
    - name: nvidia-network-operator

2. Mirror to disk using oc-mirror 4.13:

$ oc-mirror -c imageset.yaml file:///home/name/nvidia/
$ ./oc-mirror version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.13.0-202307242035.p0.gf11a900.assembly.stream-f11a900", GitCommit:"f11a9001caad8fe146c73baf2acc38ddcf3642b5", GitTreeState:"clean", BuildDate:"2023-07-24T21:25:46Z", GoVersion:"go1.19.10 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

3. Now generate the manifests:

$ oc-mirror --from /home/name/nvidia/ docker://registry:8443 --manifests-only

The generated ImageContentSourcePolicy contains:

- mirrors:
  - registry:8443/nvidia/cloud-native
  source: nvcr.io/nvidia

However, the correct mapping should be:

- mirrors:
  - registry/nvidia
  source: nvcr.io/nvidia

4. Perform the same steps with the 4.12.0 version and you will not hit this issue:

$ ./oc-mirror version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.12.0-202304241542.p0.g5fc00fe.assembly.stream-5fc00fe", GitCommit:"5fc00fe735d8fb3b6125f358f5d6b9fe726fad10", GitTreeState:"clean", BuildDate:"2023-04-24T16:01:29Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
Actual results:
Expected results:
Additional info:
`useDeleteModal` example is not formatted correctly on https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs/api.md#example-46 as it is missing the wrapping "```tsx" and "```" markdown.
Description of problem:
Builds navigation item is missing in Developer perspective
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
Always
Steps to Reproduce:
Actual results:
"Builds" is missing as a navigation item below "Search".
Expected results:
"Builds" navigation item should be displayed again when BuildConfigs CRD is available.
Additional info:
Might be dropped with PR https://github.com/openshift/console/pull/13097
Description of problem:
We disabled copies of CSVs in our clusters. The list of installed operators is still visible, but when we go (within the context of some user namespace) to Developer Catalog -> Operator Backed, the list is empty. When we enable copies of CSVs again, the Operator Backed catalog shows the expected items.
Version-Release number of selected component (if applicable):
OpenShift 4.13.1
How reproducible:
every time
Steps to Reproduce:
1. Install the Camel K operator (community version, stable channel)
2. Disable copies of CSVs by setting 'OLMConfig.spec.features.disableCopiedCSVs' to 'true' (a command sketch follows below)
3. Create a new namespace/project
4. Go to Developer Catalog -> Operator Backed
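A minimal command sketch for step 2; the OLMConfig resource is assumed to be the cluster-scoped object named "cluster" that OLM manages, so verify the name before applying:
$ oc patch olmconfig cluster --type merge -p '{"spec":{"features":{"disableCopiedCSVs":true}}}'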
Actual results:
the Operator Backed Catalog is empty
Expected results:
the Operator Backed Catalog should show Camel-K related items
Additional info:
Description of problem:
Dockerfile.fast relies on picking up the `bin` directory built on the host for inclusion in the HyperShift Operator image for development. Containerfile.operator, for RHTAP, relies on .dockerignore to prevent a `/bin` directory from being present in the podman build context with permissions that the user `default` (used by the golang build container) can't write to.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1.make docker-build-fast
Actual results:
COPY bin/* /usr/bin/ fails due to bin not being included in the podman build context
Expected results:
The container builds successfully
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-19311. The following is the description of the original issue:
—
As a user, I would like to use the Import from Git form even if I don't have BC installed in my cluster, but I have installed the Pipelines operator.
No QA needed. Current CNO does not pass with newer linter version 1.53.1.
Description of problem:
Jenkins and Jenkins Agent Base image versions needs to be updated to use the latest images to mitigate known CVEs in plugins and Jenkins versions.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-15999.
The PSA changes introduced in 4.12 meant that we had to figure out a way to ensure that customer workloads (3rd-party or otherwise) wouldn't grind to a halt because pods cannot be scheduled due to PSA. The solution found was to have another controller that can introspect a namespace to determine the best pod security standard to apply to it. This controller ignores payload namespaces (usually named with an openshift- prefix), but will reconcile non-payload openshift-* namespaces that have a special label applied to them. On the OLM side, we had to create a controller that applies the PSA label syncer label to non-payload openshift-* namespaces with operators (CSVs) installed in them.
OLM took a dependency on the cluster-policy-controller in order to get the list of payload namespaces. This dependency introduced a few challenges for our CI:
To avoid these issues, and since the list probably won't update very frequently, we'll make our own copy of the list and maintain it on our side, as this will be less busy work than the alternative.
Duplicate to use automation since original bug is restricted.
https://issues.redhat.com/browse/OCPBUGS-14022
Description of problem:
On attempting to perform an EUS->EUS upgrade from 4.12.z -> 4.14 (CI builds), I am consistently seeing that after upgrading OCP to 4.14, the worker MachineConfigPool goes to a degraded state, complaining about:

message: 'Node c01-dbn-412-tzm44-worker-0-7w6wg is reporting: "failed to run nmstatectl: fork/exec /run/machine-config-daemon-bin/nmstatectl: no such file or directory", Node c01-dbn-412-tzm44-worker-0-cmqsl is reporting: "failed to run nmstatectl: fork/exec /run/machine-config-daemon-bin/nmstatectl: no such file or directory", Node c01-dbn-412-tzm44-worker-0-qrp6v is reporting: "failed to run nmstatectl: fork/exec /run/machine-config-daemon-bin/nmstatectl: no such file or directory"'

And then clusterversion reports an error:

[cloud-user@ocp-psi-executor dbasunag]$ oc get clusterversion
NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.ci-2023-08-14-110508   True        True          125m    Unable to apply 4.14.0-0.ci-2023-08-14-152624: wait has exceeded 40 minutes for these operators: machine-config

This is consistently reproducible in clusters with knmstate installed.
Version-Release number of selected component (if applicable):
4.12.29 -> 4.13.0-0.ci-2023-08-14-110508->4.14.0-0.ci-2023-08-14-152624
How reproducible:
100%
Steps to Reproduce:
1. Perform an EUS upgrade on a cluster with CNV, ODF, knmstate
2. After pausing the worker MCP, upgrade OCP, ODF, CNV, knmstate to 4.13 - everything worked fine
3. After upgrading OCP to 4.14, when the master MCP is updated, the worker MCP went to a degraded state and clusterversion eventually reported an error (all the master nodes were updated)
Actual results:
[cloud-user@ocp-psi-executor dbasunag]$ oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.14.0-0.ci-2023-08-14-152624 True False False 9h baremetal 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h cloud-controller-manager 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h cloud-credential 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h cluster-autoscaler 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h config-operator 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h console 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h control-plane-machine-set 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h csi-snapshot-controller 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h dns 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h etcd 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h image-registry 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h ingress 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h insights 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h kube-apiserver 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h kube-controller-manager 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h kube-scheduler 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h kube-storage-version-migrator 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h machine-api 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h machine-approver 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h machine-config 4.13.0-0.ci-2023-08-14-110508 True True True 2d23h Unable to apply 4.14.0-0.ci-2023-08-14-152624: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool worker is not ready, retrying. 
Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)]] marketplace 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h monitoring 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h network 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h node-tuning 4.14.0-0.ci-2023-08-14-152624 True False False 95m openshift-apiserver 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h openshift-controller-manager 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h openshift-samples 4.14.0-0.ci-2023-08-14-152624 True False False 98m operator-lifecycle-manager 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h operator-lifecycle-manager-catalog 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h operator-lifecycle-manager-packageserver 4.14.0-0.ci-2023-08-14-152624 True False False 2d22h service-ca 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h storage 4.14.0-0.ci-2023-08-14-152624 True False False 2d23h [cloud-user@ocp-psi-executor dbasunag]$ [cloud-user@ocp-psi-executor dbasunag]$ oc get mcp NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE master rendered-master-693b054330417fe5e098b58716603fc8 True False False 3 3 3 0 2d23h worker rendered-worker-b2f5a9084e9919b4c1c491658c73bce5 False False True 3 0 0 3 2d23h [cloud-user@ocp-psi-executor dbasunag]$ [cloud-user@ocp-psi-executor dbasunag]$ oc get node NAME STATUS ROLES AGE VERSION c01-dbn-412-tzm44-master-0 Ready control-plane,master 2d23h v1.27.4+deb2c60 c01-dbn-412-tzm44-master-1 Ready control-plane,master 2d23h v1.27.4+deb2c60 c01-dbn-412-tzm44-master-2 Ready control-plane,master 2d23h v1.27.4+deb2c60 c01-dbn-412-tzm44-worker-0-7w6wg Ready worker 2d22h v1.25.11+1485cc9 c01-dbn-412-tzm44-worker-0-cmqsl Ready worker 2d22h v1.25.11+1485cc9 c01-dbn-412-tzm44-worker-0-qrp6v Ready worker 2d22h v1.25.11+1485cc9 [cloud-user@ocp-psi-executor dbasunag]$
Expected results:
EUS upgrade should work without error
Additional info:
Must-gather can be found here: https://drive.google.com/drive/folders/1SCZoYpGiRpOteTM-sTLmbfgr3hqsICVO?usp=drive_link
Description of problem:
CredentialsRequest for Azure AD Workload Identity missing disk encryption set read permissions. - Microsoft.Compute/diskEncryptionSets/read
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
Every time when creating a machine with a disk encryption set
Steps to Reproduce:
1. Create a workload identity cluster
2. Create a keyvault and a secret within the keyvault
3. Create a disk encryption set and point it to the keyvault; a system-assigned identity can be used
4. Create or modify an existing machineset to include the disk encryption set:
   managedDisk:
     diskEncryptionSet:
       id: /subscriptions/<subscription_id>/resourceGroups/<resource_id>/providers/Microsoft.Compute/diskEncryptionSets/<disk_encryption_set_name>
5. Scale the machineset
Actual results:
'failed to create vm <vm_name>: failure sending request for machine steven-wi-cluster-pzqvm-worker-eastus3-mfk5z: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=403 -- Original Error: Code="LinkedAuthorizationFailed" Message="The client ''55c10ba9-f891-4f42-a697-0ab283b86c63'' with object id ''55c10ba9-f891-4f42-a697-0ab283b86c63'' has permission to perform action ''Microsoft.Compute/virtualMachines/write'' on scope ''/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/steven-wi-cluster-pzqvm-worker-eastus3-mfk5z''; however, it does not have permission to perform action ''read'' on the linked scope(s) ''/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/diskEncryptionSets/test-disk-encryption-set'' or the linked scope(s) are invalid."'
Expected results:
The machine is able to create and join the cluster successfully.
Additional info:
Docs about preparing disk encryption sets on Azure: https://docs.openshift.com/container-platform/4.12/installing/installing_azure/enabling-user-managed-encryption-azure.html
The `kubectl.kubernetes.io/default-container` annotation can be set on a pod to specify the default container for logs and terminal. The console doesn't honor the annotation. For example:
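The example snippet did not survive here; below is a minimal hedged sketch of a pod using the annotation, with hypothetical names and images, where logs and terminal should default to the app container:
$ cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: default-container-example
  annotations:
    kubectl.kubernetes.io/default-container: app
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
  - name: sidecar
    image: registry.example.com/sidecar:latest
EOF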
Description of problem:
Labels added in the Git import flow are not propagated to the pipeline resources when a pipeline is added
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Goto Git Import Form 2. Add Pipeline 3. Add labels 4. Submit the form
Actual results:
The added labels are not propagated to the pipeline resources
Expected results:
The added labels should be added to the pipeline resources
Additional info:
Please review the following PR: https://github.com/openshift/etcd/pull/208
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Cannot list Kepler CSV
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Always
Steps to Reproduce:
1. Install Kepler Community Operator 2. Create Kepler Instance 3. Console gets error and shows "Oops, something went wrong"
Actual results:
Console gets error and shows "Oops, something went wrong"
Expected results:
Should list Kepler Instance
Additional info:
OAuth-Proxy should send an Audit-Id header with its requests to the kube-apiserver so that we can easily track its requests and be able to tell which arrived and which were processed.
This comes from a time when the CI was in disarray and oauth-proxy requests were failing to reach the KAS but we did not know if at least any were processed or if they were just all plainly rejected somewhere in the middle.
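For illustration, the kube-apiserver accepts a client-supplied Audit-Id and echoes it back in the response headers, so requests tagged by oauth-proxy could be correlated with the audit log. A hedged sketch against an arbitrary endpoint (token, host and Audit-Id value are placeholders):
$ curl -sk -H "Authorization: Bearer $TOKEN" -H "Audit-Id: oauth-proxy-debug-0001" -D - -o /dev/null https://api.example.com:6443/version | grep -i audit-id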
Description of the problem:
assisted-service pod crashloops with kube-api enabled without the BMH CRD.
How reproducible:
100%
Steps to reproduce:
1. Deploy assisted-service with kube-api enabled
2. Either don't create or remove the BMH CRD (if removed you will need to restart the assisted-service pod)
3. Observe assisted-service pod
Actual results:
After a few minutes assisted-service will crash with a message like:
time="2023-01-12T14:26:03Z" level=fatal msg="failed to run manager" func=main.main.func1 file="/remote-source/assisted-service/app/cmd/main.go:204" error="failed to wait for baremetal-agent-controller caches to sync: timed out waiting for cache to be synced"
Expected results:
Either assisted service comes up without the BMAC controller and without errors or a clear error stating that the BMH CRD is required and is missing.
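A quick hedged check for whether the BareMetalHost CRD is present on the cluster before (or after) deploying assisted-service with kube-api enabled, using the CRD name as registered by metal3:
$ oc get crd baremetalhosts.metal3.io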
Description of problem:
The test for updating the sysctl whitelist fails to check the error returned when the pod running state is verified, so the test always passes. Because of this, we failed to detect a bug in the cluster network operator's allowlist controller.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cluster-image-registry-operator/pull/855
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
OCP 4.11 ships the alertingrules CRD as a techpreview feature. Before graduating to GA we need to have e2e tests in the CMO repository.
AC:
Description of problem:
When running the nutanix-e2e-windows test from the WMCO PR https://github.com/openshift/windows-machine-config-operator/pull/1398, the MAPI nutanix-controller failed to create the Windows machine VM with the below error logs. It failed to marshal the windows-user-data to struct IgnitionConfig, since the windows-user-data is in powershell script format, but not the ignition data format.
I0424 17:37:43.472054 1 recorder.go:103] events "msg"="ci-op-zhi8pd1k-5c595-dnpj5-e2e-wm-f84vt: reconciler failed to Create machine: failed to get user data: Failed to unmarshal userData to IgnitionConfig. invalid character '<' looking for beginning of value" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-zhi8pd1k-5c595-dnpj5-e2e-wm-f84vt","uid":"d3981cb0-4f98-4424-9252-b100521c2a93","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"31045"} "reason"="FailedCreate" "type"="Warning"
E0424 17:37:43.472923 1 controller.go:329] "msg"="Reconciler error" "error"="ci-op-zhi8pd1k-5c595-dnpj5-e2e-wm-f84vt: reconciler failed to Create machine: failed to get user data: Failed to unmarshal userData to IgnitionConfig. invalid character '<' looking for beginning of value" "controller"="machine-controller" "name"="ci-op-zhi8pd1k-5c595-dnpj5-e2e-wm-f84vt" "namespace"="openshift-machine-api" "object"={"name":"ci-op-zhi8pd1k-5c595-dnpj5-e2e-wm-f84vt","namespace":"openshift-machine-api"} "reconcileID"="16572b5d-2418-4f7c-b7a8-5f08f2659391"
Version-Release number of selected component (if applicable):
How reproducible:
When the Machine is configured to be a Windows node
Steps to Reproduce:
Run the ci/prow/nutanix-e2e-operator test.
Actual results:
The MAPI nutanix-controller failed to create the Windows VM with the error logs shown above.
Expected results:
The Windows VM and node can be successfully created and provisioned.
Additional info:
From deads2k: I think creating pods that should get rejected in the kube-system namespace would ensure it. OCP-classic is still struggling with customers who did naughty things.
Description of problem:
There are several labels used by the Nutanix platform which can vary between instances. If not set as ignore labels on the Cluster Autoscaler, features such as balancing similar node groups will not work predictably. The Cluster Autoscaler Operator should be updated with the following labels on Nutanix:
* nutanix.com/prism-element-name
* nutanix.com/prism-element-uuid
* nutanix.com/prism-host-name
* nutanix.com/prism-host-uuid
For reference see this code: https://github.com/openshift/cluster-autoscaler-operator/blob/release-4.14/pkg/controller/clusterautoscaler/clusterautoscaler.go#L72-L159
Version-Release number of selected component (if applicable):
master, 4.14
How reproducible:
always
Steps to Reproduce:
1. create a ClusterAutoscaler CR on Nutanix platform
2. inspect the deployment for the cluster-autoscaler
3. see that it does not have the ignore labels added as command line flags
Actual results:
labels are not added as flags
Expected results:
labels should be added as flags
Additional info:
This should probably be backported to 4.13 as well since the labels will be applied by the Nutanix CCM. A sketch of the expected flags follows.
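A minimal sketch of the flags the cluster-autoscaler Deployment would be expected to carry on Nutanix, shown as a container args fragment; the upstream --balancing-ignore-label flag is used here illustratively, not as a confirmed operator implementation detail:
containers:
- name: cluster-autoscaler
  args:
  # ignore Nutanix-specific labels when balancing similar node groups (illustrative)
  - --balancing-ignore-label=nutanix.com/prism-element-name
  - --balancing-ignore-label=nutanix.com/prism-element-uuid
  - --balancing-ignore-label=nutanix.com/prism-host-name
  - --balancing-ignore-label=nutanix.com/prism-host-uuid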
Description of problem:
Kubernetes and other associated dependencies need to be updated to protect against potential vulnerabilities.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/kubernetes-autoscaler/pull/255
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
We should log the vCenter version information in plain text.
There are cases where the vCenter version we receive from vCenter can be unparseable. I see errors in the problem-detector while parsing the version, and both the CSI driver and the operator depend on being able to determine the vCenter version.
A clone of https://issues.redhat.com/browse/OCPBUGS-11143 but for the downstream openshift/cloud-provider-azure
Description of problem:
On Azure, after deleting a master, the old machine is stuck in Deleting and some pods in the cluster are in ImagePullBackOff. Checking from the Azure console, the new master was not added to the load balancer backend; this appears to leave the machine without an internet connection.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-02-12-024338
How reproducible:
Always
Steps to Reproduce:
1. Set up a cluster on Azure, networkType ovn
2. Delete a master
3. Check master and pod
Actual results:
Old machine stuck in Deleting, some pods are in ImagePullBackOff.
$ oc get machine
NAME PHASE TYPE REGION ZONE AGE
zhsunaz2132-5ctmh-master-0 Deleting Standard_D8s_v3 westus 160m
zhsunaz2132-5ctmh-master-1 Running Standard_D8s_v3 westus 160m
zhsunaz2132-5ctmh-master-2 Running Standard_D8s_v3 westus 160m
zhsunaz2132-5ctmh-master-flqqr-0 Running Standard_D8s_v3 westus 105m
zhsunaz2132-5ctmh-worker-westus-dhwfz Running Standard_D4s_v3 westus 152m
zhsunaz2132-5ctmh-worker-westus-dw895 Running Standard_D4s_v3 westus 152m
zhsunaz2132-5ctmh-worker-westus-xlsgm Running Standard_D4s_v3 westus 152m
$ oc describe machine zhsunaz2132-5ctmh-master-flqqr-0 -n openshift-machine-api |grep -i "Load Balancer"
Internal Load Balancer: zhsunaz2132-5ctmh-internal
Public Load Balancer: zhsunaz2132-5ctmh
$ oc get node
NAME STATUS ROLES AGE VERSION
zhsunaz2132-5ctmh-master-0 Ready control-plane,master 165m v1.26.0+149fe52
zhsunaz2132-5ctmh-master-1 Ready control-plane,master 165m v1.26.0+149fe52
zhsunaz2132-5ctmh-master-2 Ready control-plane,master 165m v1.26.0+149fe52
zhsunaz2132-5ctmh-master-flqqr-0 NotReady control-plane,master 109m v1.26.0+149fe52
zhsunaz2132-5ctmh-worker-westus-dhwfz Ready worker 152m v1.26.0+149fe52
zhsunaz2132-5ctmh-worker-westus-dw895 Ready worker 152m v1.26.0+149fe52
zhsunaz2132-5ctmh-worker-westus-xlsgm Ready worker 152m v1.26.0+149fe52
$ oc describe node zhsunaz2132-5ctmh-master-flqqr-0
Warning ErrorReconcilingNode 3m5s (x181 over 108m) controlplane [k8s.ovn.org/node-chassis-id annotation not found for node zhsunaz2132-5ctmh-master-flqqr-0, macAddress annotation not found for node "zhsunaz2132-5ctmh-master-flqqr-0" , k8s.ovn.org/l3-gateway-config annotation not found for node "zhsunaz2132-5ctmh-master-flqqr-0"]
$ oc get po --all-namespaces | grep ImagePullBackOf
openshift-cluster-csi-drivers azure-disk-csi-driver-node-l8ng4 0/3 Init:ImagePullBackOff 0 113m
openshift-cluster-csi-drivers azure-file-csi-driver-node-99k82 0/3 Init:ImagePullBackOff 0 113m
openshift-cluster-node-tuning-operator tuned-bvvh7 0/1 ImagePullBackOff 0 113m
openshift-dns node-resolver-2p4zq 0/1 ImagePullBackOff 0 113m
openshift-image-registry node-ca-vxv87 0/1 ImagePullBackOff 0 113m
openshift-machine-config-operator machine-config-daemon-crt5w 1/2 ImagePullBackOff 0 113m
openshift-monitoring node-exporter-mmjsm 0/2 Init:ImagePullBackOff 0 113m
openshift-multus multus-4cg87 0/1 ImagePullBackOff 0 113m
openshift-multus multus-additional-cni-plugins-mc6vx 0/1 Init:ImagePullBackOff 0 113m
openshift-ovn-kubernetes ovnkube-master-qjjsv 0/6 ImagePullBackOff 0 113m
openshift-ovn-kubernetes ovnkube-node-k8w6j 0/6 ImagePullBackOff 0 113m
Expected results:
Replacing the master succeeds.
Additional info:
Tested payload 4.13.0-0.nightly-2023-02-03-145213 with the same result. We had previously tested 4.13.0-0.nightly-2023-01-27-165107, where everything worked well.
The Helm view in the Dev console doesn't allow you to edit Helm repositories through the three-dots menu "Edit" option. It results in a 404.
Tried in 4.13 only, not sure if other versions are affected
1. Create a new Helm chart repository (/ns/<NAMESPACE>/helmchartrepositories/~new/form endpoint)
2. List all the custom Helm repositories ( /helm-releases/ns/<NAMESPACE>/repositories endpoint)
3. Click three dots menu on the right of any chart repository and select "Edit ProjectHelmChartRepository" (leads to /k8s/ns/<NAMESPACE>/helmchartrepositories/<REPO_NAME>/edit)
4. You land on 404 page
404 page, see the attached GIF
Edit view
Always
Observed in OCP 4.13 (Dev sandbox and OpenShift Local)
Follow steps 1 and 2 from the reproducer above
3. Click on Helm repository name
4. Click YAML tab to edit resource (/k8s/ns/<NAMESPACE>/helm.openshift.io~v1beta1~ProjectHelmChartRepository/<REPO_NAME>/yaml endpoint)
Description of the problem:
Since MGMT-13083 merged, disconnected jobs are failing in the ephemeral installer (specifically e2e-agent-sno-ipv6 and e2e-agent-ha-dualstack). Preparing for installation fails because we can't get the installer binary:
Apr 21 10:00:43 master-0 service[2298]: time="2023-04-20T22:00:43Z" level=info msg="Successfully extracted openshift-baremetal-install binary from the release to: /data/install-config-generate/installercache/virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install" func="github.com/openshift/assisted-service/internal/oc.(*release).extractFromRelease" file="/src/internal/oc/release.go:376" cluster_id=a3945e90-44a8-436c-89ad-12d3a5820a26 go-id=18956 request_id= Apr 21 10:00:43 master-0 service[2298]: time="2023-04-20T22:00:43Z" level=error msg="failed generating install config for cluster a3945e90-44a8-436c-89ad-12d3a5820a26" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).generateClusterInstallConfig" file="/src/internal/bminventory/inventory.go:1738" cluster_id=a3945e90-44a8-436c-89ad-12d3a5820a26 error="failed to get installer path: Failed to create hard link to binary /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install: link /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/ln_1682028043_openshift-baremetal-install: no such file or directory" go-id=18956 pkg=Inventory request_id= Apr 21 10:00:43 master-0 service[2298]: time="2023-04-20T22:00:43Z" level=warning msg="Cluster installation initialization failed" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).InstallClusterInternal.func3.1" file="/src/internal/bminventory/inventory.go:1339" cluster_id=a3945e90-44a8-436c-89ad-12d3a5820a26 error="failed generating install config for cluster a3945e90-44a8-436c-89ad-12d3a5820a26: failed to get installer path: Failed to create hard link to binary /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install: link /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/ln_1682028043_openshift-baremetal-install: no such file or directory" go-id=18932 pkg=Inventory request_id=ca799c5a-c798-4a93-9bf8-7f27ed93ca20 Apr 21 10:00:43 master-0 service[2298]: time="2023-04-20T22:00:43Z" level=warning msg="Failed to prepare installation of cluster a3945e90-44a8-436c-89ad-12d3a5820a26" func="github.com/openshift/assisted-service/internal/cluster.(*Manager).HandlePreInstallError" file="/src/internal/cluster/cluster.go:985" cluster_id=a3945e90-44a8-436c-89ad-12d3a5820a26 error="failed generating install config for cluster a3945e90-44a8-436c-89ad-12d3a5820a26: failed to get installer path: Failed to create hard link to binary 
/data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install: link /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/ln_1682028043_openshift-baremetal-install: no such file or directory" go-id=18956 pkg=cluster-state request_id=
The issue appears to be that we extract the binary to a path including the mirror registry (installercache/virthost.ostest.test.metalkube.org:5000/localimages/local-release-image) but then look for it at a path representing the original pullspec (installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release)
How reproducible:
100%
Steps to reproduce:
1. Use the agent-based installer to install using a disconnected mirror registry in the ImageContentSources.
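For reference, a minimal sketch of the relevant install-config.yaml fragment for this scenario (registries taken from the logs above; all other fields omitted):
imageContentSources:
- mirrors:
  # local mirror registry used by the disconnected environment
  - virthost.ostest.test.metalkube.org:5000/localimages/local-release-image
  # original release pullspec being mirrored
  source: registry.build05.ci.openshift.org/ci-op-1w73h6fv/release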
Actual results:
Installation never starts, we just see a loop of:
evel=debug msg=Host worker-0: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation) level=debug msg=Host worker-1: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation) level=debug msg=Host master-0: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation) level=debug msg=Host master-1: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation) level=debug msg=Host master-2: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation) level=debug msg=Host worker-0: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation) level=debug msg=Host worker-1: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation) level=debug msg=Host master-0: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation) level=debug msg=Host master-1: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation) level=debug msg=Host master-2: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation) level=debug msg=Host worker-0: updated status from preparing-successful to known (Host is ready to be installed) level=debug msg=Host worker-1: updated status from preparing-successful to known (Host is ready to be installed) level=debug msg=Host master-0: updated status from preparing-successful to known (Host is ready to be installed) level=debug msg=Host master-1: updated status from preparing-successful to known (Host is ready to be installed) level=debug msg=Host master-2: updated status from preparing-successful to known (Host is ready to be installed)
Expected results:
Cluster is installed.
Description of problem:
IngressVIP is getting attached to two nodes at once.
Version-Release number of selected component (if applicable):
4.11.39
How reproducible:
Always in customer cluster
Actual results:
IngressVIP is getting attached to two nodes at once.
Expected results:
IngressVIP should get attached to only one node.
Additional info:
This is a clone of issue OCPBUGS-18954. The following is the description of the original issue:
—
Description of problem:
While installing 3618 SNOs via ZTP using ACM 2.9, 15 clusters failed to complete install and have failed on the cluster-autoscaler operator. This represents the bulk of all cluster install failures in this testbed for OCP 4.14.0-rc.0. # cat aci.InstallationFailed.autoscaler | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion --no-headers " vm00527 version False True 20h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm00717 version False True 14h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm00881 version False True 19h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm00998 version False True 18h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm01006 version False True 17h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm01059 version False True 15h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm01155 version False True 14h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm01930 version False True 17h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm02407 version False True 16h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm02651 version False True 18h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm03073 version False True 19h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm03258 version False True 20h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm03295 version False True 14h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm03303 version False True 15h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available vm03517 version False True 18h Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
Version-Release number of selected component (if applicable):
Hub: 4.13.11
Deployed SNOs: 4.14.0-rc.0
ACM: 2.9 - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52
How reproducible:
15 out of 20 failures (75% of the failures)
15 out of 3618 total attempted SNOs to be installed (~0.4% of all installs)
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
It appears that some show in the logs of the cluster-autoscaler-operator an error, Example: I0912 19:54:39.962897 1 main.go:15] Go Version: go1.20.5 X:strictfipsruntime I0912 19:54:39.962977 1 main.go:16] Go OS/Arch: linux/amd64 I0912 19:54:39.962982 1 main.go:17] Version: cluster-autoscaler-operator v4.14.0-202308301903.p0.gb57f5a9.assembly.stream-dirty I0912 19:54:39.963137 1 leaderelection.go:122] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}. I0912 19:54:39.975478 1 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"="127.0.0.1:9191" I0912 19:54:39.976939 1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-clusterautoscalers" I0912 19:54:39.976984 1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-machineautoscalers" I0912 19:54:39.977082 1 main.go:41] Starting cluster-autoscaler-operator I0912 19:54:39.977216 1 server.go:216] controller-runtime/webhook/webhooks "msg"="Starting webhook server" I0912 19:54:39.977693 1 certwatcher.go:161] controller-runtime/certwatcher "msg"="Updated current TLS certificate" I0912 19:54:39.977813 1 server.go:273] controller-runtime/webhook "msg"="Serving webhook server" "host"="" "port"=8443 I0912 19:54:39.977938 1 certwatcher.go:115] controller-runtime/certwatcher "msg"="Starting certificate watcher" I0912 19:54:39.978008 1 server.go:50] "msg"="starting server" "addr"={"IP":"127.0.0.1","Port":9191,"Zone":""} "kind"="metrics" "path"="/metrics" I0912 19:54:39.978052 1 leaderelection.go:245] attempting to acquire leader lease openshift-machine-api/cluster-autoscaler-operator-leader... 
I0912 19:54:39.982052 1 leaderelection.go:255] successfully acquired lease openshift-machine-api/cluster-autoscaler-operator-leader I0912 19:54:39.983412 1 controller.go:177] "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.ClusterAutoscaler" I0912 19:54:39.983462 1 controller.go:177] "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.Deployment" I0912 19:54:39.983483 1 controller.go:177] "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.Service" I0912 19:54:39.983501 1 controller.go:177] "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.ServiceMonitor" I0912 19:54:39.983520 1 controller.go:177] "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.PrometheusRule" I0912 19:54:39.983532 1 controller.go:185] "msg"="Starting Controller" "controller"="cluster_autoscaler_controller" I0912 19:54:39.986041 1 controller.go:177] "msg"="Starting EventSource" "controller"="machine_autoscaler_controller" "source"="kind source: *v1beta1.MachineAutoscaler" I0912 19:54:39.986065 1 controller.go:177] "msg"="Starting EventSource" "controller"="machine_autoscaler_controller" "source"="kind source: *unstructured.Unstructured" I0912 19:54:39.986072 1 controller.go:185] "msg"="Starting Controller" "controller"="machine_autoscaler_controller" I0912 19:54:40.095808 1 webhookconfig.go:72] Webhook configuration status: created I0912 19:54:40.101613 1 controller.go:219] "msg"="Starting workers" "controller"="cluster_autoscaler_controller" "worker count"=1 I0912 19:54:40.102857 1 controller.go:219] "msg"="Starting workers" "controller"="machine_autoscaler_controller" "worker count"=1 E0912 19:58:48.113290 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": net/http: TLS handshake timeout - error from a previous attempt: unexpected EOF E0912 20:02:48.135610 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp [fd02::1]:443: connect: connection refused E0913 13:49:02.118757 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp [fd02::1]:443: connect: connection refused
Description of problem:
Terraform will not create VMs for master and worker nodes for UPI vSphere when var.control_plane_ip_addresses and var.compute_ip_addresses are unset. When users use IPAM (as before) to reserve IPs instead of setting static IPs directly in var.control_plane_ip_addresses and var.compute_ip_addresses, then, based on upstream code #1 and #2, the count of masters and workers is always 0 and Terraform will not create any VMs for master and worker nodes. If we change the code as below, the IPAM case works as before:
control_plane_fqdns = [for idx in range(length(var.control_plane_ip_addresses)) : "control-plane-${idx}.${var.cluster_domain}"]
compute_fqdns = [for idx in range(length(var.compute_ip_addresses)) : "compute-${idx}.${var.cluster_domain}"]
==>>
control_plane_fqdns = [for idx in range(var.control_plane_count) : "control-plane-${idx}.${var.cluster_domain}"]
compute_fqdns = [for idx in range(var.compute_count) : "compute-${idx}.${var.cluster_domain}"]
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-11-033820
How reproducible:
always
Steps to Reproduce:
1. Trigger a job to install a cluster on vSphere with UPI.
2. If the IPs for master and worker VMs are applied from an IPAM server instead of setting static IPs directly in var.control_plane_ip_addresses and var.compute_ip_addresses, the VM creation will fail.
Actual results:
the VM creation will fail
Expected results:
VM creation succeeds.
Additional info:
#1 link: https://github.com/openshift/installer/blob/master/upi/vsphere/main.tf#L15-L16
#2 link: https://github.com/openshift/installer/blob/master/upi/vsphere/main.tf#L211
This bug only affects UPI vSphere installations when the user uses an IPAM server to reserve static IPs instead of setting static IPs directly in var.control_plane_ip_addresses and var.compute_ip_addresses. It does not currently affect QE tests, because we still install with the previous code.
Description of problem:
The "Create BuildConfig" button on the Dev console Builds page opens the form view, but in the default namespace.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Go to the Dev perspective
2. Click on Builds
3. Click on "Create BuildConfig"
Actual results:
"default" namespace is selected in the namespace selector
Expected results:
It should open the form in the active namespace
Additional info:
Description of problem:
In the HyperShift context: operands managed by operators running in the hosted control plane namespace in the management cluster do not honour affinity opinions (https://hypershift-docs.netlify.app/how-to/distribute-hosted-cluster-workloads/, https://github.com/openshift/hypershift/blob/main/support/config/deployment.go#L263-L265). These operands running management side should honour the same affinity, tolerations, node selector and priority rules as the operator. This could be done by looking at the operator deployment itself or at the HCP resource. Affected operands:
aws-ebs-csi-driver-controller
aws-ebs-csi-driver-operator
csi-snapshot-controller
csi-snapshot-webhook
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create a hypershift cluster.
2. Check affinity rules and node selector of the operands above.
Actual results:
Operands are missing affinity rules and node selector
Expected results:
Operands have the same affinity rules and node selector as the operator
Additional info:
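For illustration, a sketch of the scheduling opinions such an operand Deployment would be expected to inherit; the label and toleration keys below are illustrative placeholders, not the exact HyperShift keys:
spec:
  template:
    spec:
      nodeSelector:
        example.openshift.io/control-plane: "true"       # illustrative key/value
      tolerations:
      - key: example.openshift.io/control-plane           # illustrative key
        operator: Equal
        value: "true"
        effect: NoSchedule
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
              - key: example.openshift.io/hosted-cluster   # illustrative key
                operator: In
                values:
                - my-hosted-cluster                        # illustrative value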
The aggregated https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-gcp-ovn-rt-upgrade-4.14-minor-release-openshift-release-analysis-aggregator/1633554110798106624 job failed. Digging into one of them:
Deployments:
* ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4f28fbcd049025bab9719379492420f9eaab0426cdbbba43b395eb8421f10a17
  Digest: sha256:4f28fbcd049025bab9719379492420f9eaab0426cdbbba43b395eb8421f10a17
  Version: 413.86.202302230536-0 (2023-03-08T20:10:47Z)
  RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-372.43.1.el8_6
  LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules kernel-rt-modules-extra
...
E0308 22:11:21.925030 74176 writer.go:200] Marking Degraded due to: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cd299b2bf3cc98fb70907f152b4281633064fe33527b5d6a42ddc418ff00eec1 : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cd299b2bf3cc98fb70907f152b4281633064fe33527b5d6a42ddc418ff00eec1: error: Importing: remote error: fetching blob: received unexpected HTTP status: 500 Internal Server Error
...
I0308 22:11:36.959143 74176 update.go:2010] Running: rpm-ostree override reset kernel kernel-core kernel-modules kernel-modules-extra --uninstall kernel-rt-core --uninstall kernel-rt-kvm --uninstall kernel-rt-modules --uninstall kernel-rt-modules-extra
...
E0308 22:12:35.525156 74176 writer.go:200] Marking Degraded due to: error running rpm-ostree override reset kernel kernel-core kernel-modules kernel-modules-extra --uninstall kernel-rt-core --uninstall kernel-rt-kvm --uninstall kernel-rt-modules --uninstall kernel-rt-modules-extra: error: Package/capability 'kernel-rt-core' is not currently requested : exit status 1
Something is going wrong here in our retry loop. I think it might be that we don't clear the pending deployment on failure. In other words, we need to run
rpm-ostree cleanup -p
before we retry.
This is fallout from https://github.com/openshift/machine-config-operator/pull/3580 - Although I suspect it may have been an issue before too.
"pipelines-as-code-pipelinerun-go" configMap is not been used for the Go repository while creating Pipeline Repository. "pipelines-as-code-pipelinerun-generic" configMap has been used.
Install Red Hat Pipeline operator
The `pipelines-as-code-pipelinerun-generic` PipelineRun template is shown on the overview page
The `pipelines-as-code-pipelinerun-go` PipelineRun template should be shown on the overview page
4.13
Description of problem:
We need to export the hook function from the module that's required in the dynamic core api, otherwise an exception will be thrown if the hook is imported/used by plugins.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Plugins using this hook throw an exception.
Expected results:
The hook should be imported and function properly.
Additional info:
Description of problem:
Enabling IPSec doesn't result in IPsec tunnels being created
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Deploy & Enable IPSec
Steps to Reproduce:
1. 2. 3.
Actual results:
000 Total IPsec connections: loaded 0, active 0
000
000 State Information: DDoS cookies not required, Accepting new IKE connections
000 IKE SAs: total(0), half-open(0), open(0), authenticated(0), anonymous(0)
000 IPsec SAs: total(0), authenticated(0), anonymous(0)
Expected results:
Active connections > 0
Additional info:
$ oc -n openshift-ovn-kubernetes -c nbdb rsh ovnkube-master-qw4zv ovn-nbctl --no-leader-only get nb_global . ipsec
true
Description of problem:
While installing OCP on AWS, the user can set metadataService auth to Required in order to use IMDSv2; in that case the user requires all the VMs to use it. Currently the bootstrap node always runs with Optional, which can be blocked in the user's AWS account and will fail the installation process.
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
Install aws cluster and set metadataService to Required
Steps to Reproduce:
1. 2. 3.
Actual results:
Bootstrap has IMDSv2 set to optional
Expected results:
All VMs have IMDSv2 set to Required
Additional info:
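For reference, a sketch of the install-config.yaml fragment that requests IMDSv2 via the AWS metadataService setting (other fields omitted); the expectation is that the bootstrap instance honours it as well:
controlPlane:
  name: master
  platform:
    aws:
      metadataService:
        authentication: Required    # require IMDSv2 on control-plane instances
compute:
- name: worker
  platform:
    aws:
      metadataService:
        authentication: Required    # require IMDSv2 on compute instances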
Description of problem:
The newly introduced `--idms-file` flag in oc image extract is incorrectly mapped to the ICSPFile object instead of IDMSFile
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
SNO installation performed with the assisted-installer failed
Version-Release number of selected component (if applicable):
4.10.32
# oc get co authentication -o yaml
  - lastTransitionTime: '2023-01-30T00:51:11Z'
    message: 'IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server
      OAuthServerConfigObservationDegraded: secret "v4-0-config-system-router-certs" not found
      OAuthServerDeploymentDegraded: 1 of 1 requested instances are unavailable for oauth-openshift.openshift-authentication (container is waiting in pending oauth-openshift-58b978d7f8-s6x4b pod)
      OAuthServerRouteEndpointAccessibleControllerDegraded: secret "v4-0-config-system-router-certs"
# oc logs ingress-operator-xxx-yyy -c ingress-operator
2023-01-30T08:14:13.701799050Z 2023-01-30T08:14:13.701Z ERROR operator.certificate_publisher_controller certificate-publisher/controller.go:80 failed to list ingresscontrollers for secret {"related": "", "error": "Index with name field:defaultCertificateName does not exist"}
Restarting the ingress-operator pod helped fix the issue, but a permanent fix is required. The bug (https://bugzilla.redhat.com/show_bug.cgi?id=2005351) was filed earlier but closed due to inactivity.
Description of problem:
Add storage admission plugin "storage.openshift.io/CSIInlineVolumeSecurity"
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create an OCP cluster v4.13
2. Check the config map kas-config
Actual results:
The CM does not include "storage.openshift.io/CSIInlineVolumeSecurity" storage plugin
Expected results:
The plugin should be included
Additional info:
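A minimal sketch of what the check expects to find in the kube-apiserver config (the surrounding structure is abbreviated and illustrative; only the relevant admission plugin entry matters):
apiServerArguments:
  enable-admission-plugins:
  # ...other enabled plugins...
  - storage.openshift.io/CSIInlineVolumeSecurity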
Please review the following PR: https://github.com/openshift/cloud-provider-openstack/pull/195
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Fix cnf compute tests to check scheduler settings under /sys/kernel/debug/sched/
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/prom-label-proxy/pull/355
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
NetworkPolicyLegacy tests time out on bump PRs; the latest is https://github.com/openshift/origin/pull/27912. Job example: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/27912/pull-ci-openshift-origin-master-e2e-gcp-ovn/1655997089001246720. The problem seems to be the 15-minute timeout; the test fails with "Interrupted by User". I think this is the change that affected it: https://github.com/kubernetes/kubernetes/pull/112923. From what I saw in the logs, "testCannotConnect" seems to reach the 5-minute timeout instead of completing in ~45 seconds based on the client pod command. But this is NetworkPolicyLegacy, so I'm not sure how much time we want to spend debugging it.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Slack thread https://redhat-internal.slack.com/archives/C04UQLWQAP3/p1683640905643069
This is a clone of issue OCPBUGS-17682. The following is the description of the original issue:
—
Description of problem:
Since the in-cluster prometheus-operator and UWM prometheus-operator pods are scheduled to master nodes, we enabled UWM and added topologySpreadConstraints for the in-cluster prometheus-operator and the UWM prometheus-operator (setting topologyKey to node-role.kubernetes.io/master). The topologySpreadConstraints take effect for the in-cluster prometheus-operator, but not for the UWM prometheus-operator.
apiVersion: v1
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusOperator:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
For the in-cluster prometheus-operator, the topologySpreadConstraints settings are loaded into the prometheus-operator pod and deployment; see:
$ oc -n openshift-monitoring get deploy prometheus-operator -oyaml | grep topologySpreadConstraints -A7
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
        maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
      volumes:
$ oc -n openshift-monitoring get pod -l app.kubernetes.io/name=prometheus-operator -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
prometheus-operator-65496d5b78-fb9nq 2/2 Running 0 105s 10.128.0.71 juzhao-0813-szb9h-master-0.c.openshift-qe.internal <none> <none>
$ oc -n openshift-monitoring get pod prometheus-operator-65496d5b78-fb9nq -oyaml | grep topologySpreadConstraints -A7
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/name: prometheus-operator
    maxSkew: 1
    topologyKey: node-role.kubernetes.io/master
    whenUnsatisfiable: DoNotSchedule
  volumes:
but the topologySpreadConstraints settings are not loaded to UWM prometheus-operator pod and deployment
$ oc -n openshift-user-workload-monitoring get cm user-workload-monitoring-config -oyaml
apiVersion: v1
data:
  config.yaml: |
    prometheusOperator:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
kind: ConfigMap
metadata:
  creationTimestamp: "2023-08-14T08:10:49Z"
  labels:
    app.kubernetes.io/managed-by: cluster-monitoring-operator
    app.kubernetes.io/part-of: openshift-monitoring
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
  resourceVersion: "212490"
  uid: 048f91cb-4da6-4b1b-9e1f-c769096ab88c
$ oc -n openshift-user-workload-monitoring get deploy prometheus-operator -oyaml | grep topologySpreadConstraints -A7
no result
$ oc -n openshift-user-workload-monitoring get pod -l app.kubernetes.io/name=prometheus-operator
NAME READY STATUS RESTARTS AGE
prometheus-operator-77bcdcbd9c-m5x8z 2/2 Running 0 15m
$ oc -n openshift-user-workload-monitoring get pod prometheus-operator-77bcdcbd9c-m5x8z -oyaml | grep topologySpreadConstraints
no result
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-11-055332
How reproducible:
always
Steps to Reproduce:
1. see the description 2. 3.
Actual results:
topologySpreadConstraints settings are not loaded to UWM prometheus-operator pod and deployment
Expected results:
topologySpreadConstraints settings loaded to UWM prometheus-operator pod and deployment
This is a clone of issue OCPBUGS-17391. The following is the description of the original issue:
—
the pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-local-to-shared-gateway-mode-migration job started failing recently when the
ovnkube-master daemonset would not finish rolling out after 360s.
taking the must gather to debug which happens a few minutes after the test
failure you can see that the daemonset is still not ready, so I believe that
increasing the timeout is not the answer.
some debug info:
➜ static-kas git:(master) oc --kubeconfig=/tmp/kk get daemonsets -A NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE openshift-cluster-csi-drivers aws-ebs-csi-driver-node 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-cluster-node-tuning-operator tuned 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-dns dns-default 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-dns node-resolver 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-image-registry node-ca 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-ingress-canary ingress-canary 3 3 3 3 3 kubernetes.io/os=linux 8h openshift-machine-api machine-api-termination-handler 0 0 0 0 0 kubernetes.io/os=linux,machine.openshift.io/interruptible-instance= 8h openshift-machine-config-operator machine-config-daemon 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-machine-config-operator machine-config-server 3 3 3 3 3 node-role.kubernetes.io/master= 8h openshift-monitoring node-exporter 6 6 6 6 6 kubernetes.io/os=linux 8h openshift-multus multus 6 6 6 6 6 kubernetes.io/os=linux 9h openshift-multus multus-additional-cni-plugins 6 6 6 6 6 kubernetes.io/os=linux 9h openshift-multus network-metrics-daemon 6 6 6 6 6 kubernetes.io/os=linux 9h openshift-network-diagnostics network-check-target 6 6 6 6 6 beta.kubernetes.io/os=linux 9h openshift-ovn-kubernetes ovnkube-master 3 3 2 2 2 beta.kubernetes.io/os=linux,node-role.kubernetes.io/master= 9h openshift-ovn-kubernetes ovnkube-node 6 6 6 6 6 beta.kubernetes.io/os=linux 9h Name: ovnkube-master Selector: app=ovnkube-master Node-Selector: beta.kubernetes.io/os=linux,node-role.kubernetes.io/master= Labels: networkoperator.openshift.io/generates-operator-status=stand-alone Annotations: deprecated.daemonset.template.generation: 3 kubernetes.io/description: This daemonset launches the ovn-kubernetes controller (master) networking components. networkoperator.openshift.io/cluster-network-cidr: 10.128.0.0/14 networkoperator.openshift.io/hybrid-overlay-status: disabled networkoperator.openshift.io/ip-family-mode: single-stack release.openshift.io/version: 4.14.0-0.ci.test-2023-08-04-123014-ci-op-c6fp05f4-latest Desired Number of Nodes Scheduled: 3 Current Number of Nodes Scheduled: 3 Number of Nodes Scheduled with Up-to-date Pods: 2 Number of Nodes Scheduled with Available Pods: 2 Number of Nodes Misscheduled: 0 Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed Pod Template: Labels: app=ovnkube-master component=network kubernetes.io/os=linux openshift.io/component=network ovn-db-pod=true type=infra Annotations: networkoperator.openshift.io/cluster-network-cidr: 10.128.0.0/14 networkoperator.openshift.io/hybrid-overlay-status: disabled networkoperator.openshift.io/ip-family-mode: single-stack target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"} Service Account: ovn-kubernetes-controller
it seems there is one pod that is not coming up all the way and that pod has
two containers not ready (sbdb and nbdb). logs from those containers below:
➜ static-kas git:(master) oc --kubeconfig=/tmp/kk describe pod ovnkube-master-7qlm5 -n openshift-ovn-kubernetes | rg '^ [a-z].*:|Ready' northd: Ready: True nbdb: Ready: False kube-rbac-proxy: Ready: True sbdb: Ready: False ovnkube-master: Ready: True ovn-dbchecker: Ready: True ➜ static-kas git:(master) oc --kubeconfig=/tmp/kk logs ovnkube-master-7qlm5 -n openshift-ovn-kubernetes -c sbdb 2023-08-04T13:08:49.127480354Z + [[ -f /env/_master ]] 2023-08-04T13:08:49.127562165Z + trap quit TERM INT 2023-08-04T13:08:49.127609496Z + ovn_kubernetes_namespace=openshift-ovn-kubernetes 2023-08-04T13:08:49.127637926Z + ovndb_ctl_ssl_opts='-p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt' 2023-08-04T13:08:49.127637926Z + transport=ssl 2023-08-04T13:08:49.127645167Z + ovn_raft_conn_ip_url_suffix= 2023-08-04T13:08:49.127682687Z + [[ 10.0.42.108 == \: ]] 2023-08-04T13:08:49.127690638Z + db=sb 2023-08-04T13:08:49.127690638Z + db_port=9642 2023-08-04T13:08:49.127712038Z + ovn_db_file=/etc/ovn/ovnsb_db.db 2023-08-04T13:08:49.127854181Z + [[ ! ssl:10.0.102.2:9642,ssl:10.0.42.108:9642,ssl:10.0.74.128:9642 =~ .:10\.0\.42\.108:. ]] 2023-08-04T13:08:49.128199437Z ++ bracketify 10.0.42.108 2023-08-04T13:08:49.128237768Z ++ case "$1" in 2023-08-04T13:08:49.128265838Z ++ echo 10.0.42.108 2023-08-04T13:08:49.128493242Z + OVN_ARGS='--db-sb-cluster-local-port=9644 --db-sb-cluster-local-addr=10.0.42.108 --no-monitor --db-sb-cluster-local-proto=ssl --ovn-sb-db-ssl-key=/ovn-cert/tls.key --ovn-sb-db-ssl-cert=/ovn-cert/tls.crt --ovn-sb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt' 2023-08-04T13:08:49.128535253Z + CLUSTER_INITIATOR_IP=10.0.102.2 2023-08-04T13:08:49.128819438Z ++ date -Iseconds 2023-08-04T13:08:49.130157063Z 2023-08-04T13:08:49+00:00 - starting sbdb CLUSTER_INITIATOR_IP=10.0.102.2 2023-08-04T13:08:49.130170893Z + echo '2023-08-04T13:08:49+00:00 - starting sbdb CLUSTER_INITIATOR_IP=10.0.102.2' 2023-08-04T13:08:49.130170893Z + initialize=false 2023-08-04T13:08:49.130179713Z + [[ ! -e /etc/ovn/ovnsb_db.db ]] 2023-08-04T13:08:49.130318475Z + [[ false == \t\r\u\e ]] 2023-08-04T13:08:49.130406657Z + wait 9 2023-08-04T13:08:49.130493659Z + exec /usr/share/ovn/scripts/ovn-ctl -db-sb-cluster-local-port=9644 --db-sb-cluster-local-addr=10.0.42.108 --no-monitor --db-sb-cluster-local-proto=ssl --ovn-sb-db-ssl-key=/ovn-cert/tls.key --ovn-sb-db-ssl-cert=/ovn-cert/tls.crt --ovn-sb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '-ovn-sb-log=-vconsole:info -vfile:off -vPATTERN:console:%D {%Y-%m-%dT%H:%M:%S.###Z} |%05N|%c%T|%p|%m' run_sb_ovsdb 2023-08-04T13:08:49.208399304Z 2023-08-04T13:08:49.208Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-sb.log 2023-08-04T13:08:49.213507987Z ovn-sbctl: unix:/var/run/ovn/ovnsb_db.sock: database connection failed (No such file or directory) 2023-08-04T13:08:49.224890005Z 2023-08-04T13:08:49Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connecting... 2023-08-04T13:08:49.224912156Z 2023-08-04T13:08:49Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connection attempt failed (No such file or directory) 2023-08-04T13:08:49.255474964Z 2023-08-04T13:08:49.255Z|00002|raft|INFO|local server ID is 7f92 2023-08-04T13:08:49.333342909Z 2023-08-04T13:08:49.333Z|00003|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 3.1.2 2023-08-04T13:08:49.348948944Z 2023-08-04T13:08:49.348Z|00004|reconnect|INFO|ssl:10.0.102.2:9644: connecting... 2023-08-04T13:08:49.349002565Z 2023-08-04T13:08:49.348Z|00005|reconnect|INFO|ssl:10.0.74.128:9644: connecting... 
2023-08-04T13:08:49.352510569Z 2023-08-04T13:08:49.352Z|00006|reconnect|INFO|ssl:10.0.102.2:9644: connected 2023-08-04T13:08:49.353870484Z 2023-08-04T13:08:49.353Z|00007|reconnect|INFO|ssl:10.0.74.128:9644: connected 2023-08-04T13:08:49.889326777Z 2023-08-04T13:08:49.889Z|00008|raft|INFO|server 2501 is leader for term 5 2023-08-04T13:08:49.890316765Z 2023-08-04T13:08:49.890Z|00009|raft|INFO|rejecting append_request because previous entry 5,1538 not in local log (mismatch past end of log) 2023-08-04T13:08:49.891199951Z 2023-08-04T13:08:49.891Z|00010|raft|INFO|rejecting append_request because previous entry 5,1539 not in local log (mismatch past end of log) 2023-08-04T13:08:50.225632838Z 2023-08-04T13:08:50Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connecting... 2023-08-04T13:08:50.225677739Z 2023-08-04T13:08:50Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connected 2023-08-04T13:08:50.227772827Z Waiting for OVN_Southbound to come up. 2023-08-04T13:08:55.716284614Z 2023-08-04T13:08:55.716Z|00011|raft|INFO|ssl:10.0.74.128:43498: learned server ID 3dff 2023-08-04T13:08:55.716323395Z 2023-08-04T13:08:55.716Z|00012|raft|INFO|ssl:10.0.74.128:43498: learned remote address ssl:10.0.74.128:9644 2023-08-04T13:08:55.724570375Z 2023-08-04T13:08:55.724Z|00013|raft|INFO|ssl:10.0.102.2:47804: learned server ID 2501 2023-08-04T13:08:55.724599466Z 2023-08-04T13:08:55.724Z|00014|raft|INFO|ssl:10.0.102.2:47804: learned remote address ssl:10.0.102.2:9644 2023-08-04T13:08:59.348572779Z 2023-08-04T13:08:59.348Z|00015|memory|INFO|32296 kB peak resident set size after 10.1 seconds 2023-08-04T13:08:59.348648190Z 2023-08-04T13:08:59.348Z|00016|memory|INFO|atoms:35959 cells:31476 monitors:0 n-weak-refs:749 raft-connections:4 raft-log:1543 txn-history:100 txn-history-atoms:7100 ➜ static-kas git:(master) oc --kubeconfig=/tmp/kk logs ovnkube-master-7qlm5 -n openshift-ovn-kubernetes -c nbdb 2023-08-04T13:08:48.779743434Z + [[ -f /env/_master ]] 2023-08-04T13:08:48.779743434Z + trap quit TERM INT 2023-08-04T13:08:48.779825516Z + ovn_kubernetes_namespace=openshift-ovn-kubernetes 2023-08-04T13:08:48.779825516Z + ovndb_ctl_ssl_opts='-p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt' 2023-08-04T13:08:48.779825516Z + transport=ssl 2023-08-04T13:08:48.779825516Z + ovn_raft_conn_ip_url_suffix= 2023-08-04T13:08:48.779825516Z + [[ 10.0.42.108 == \: ]] 2023-08-04T13:08:48.779825516Z + db=nb 2023-08-04T13:08:48.779825516Z + db_port=9641 2023-08-04T13:08:48.779825516Z + ovn_db_file=/etc/ovn/ovnnb_db.db 2023-08-04T13:08:48.779887606Z + [[ ! ssl:10.0.102.2:9641,ssl:10.0.42.108:9641,ssl:10.0.74.128:9641 =~ .:10\.0\.42\.108:. 
]] 2023-08-04T13:08:48.780159182Z ++ bracketify 10.0.42.108 2023-08-04T13:08:48.780167142Z ++ case "$1" in 2023-08-04T13:08:48.780172102Z ++ echo 10.0.42.108 2023-08-04T13:08:48.780314224Z + OVN_ARGS='--db-nb-cluster-local-port=9643 --db-nb-cluster-local-addr=10.0.42.108 --no-monitor --db-nb-cluster-local-proto=ssl --ovn-nb-db-ssl-key=/ovn-cert/tls.key --ovn-nb-db-ssl-cert=/ovn-cert/tls.crt --ovn-nb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt' 2023-08-04T13:08:48.780314224Z + CLUSTER_INITIATOR_IP=10.0.102.2 2023-08-04T13:08:48.780518588Z ++ date -Iseconds 2023-08-04T13:08:48.781738820Z 2023-08-04T13:08:48+00:00 - starting nbdb CLUSTER_INITIATOR_IP=10.0.102.2, K8S_NODE_IP=10.0.42.108 2023-08-04T13:08:48.781753021Z + echo '2023-08-04T13:08:48+00:00 - starting nbdb CLUSTER_INITIATOR_IP=10.0.102.2, K8S_NODE_IP=10.0.42.108' 2023-08-04T13:08:48.781753021Z + initialize=false 2023-08-04T13:08:48.781753021Z + [[ ! -e /etc/ovn/ovnnb_db.db ]] 2023-08-04T13:08:48.781816342Z + [[ false == \t\r\u\e ]] 2023-08-04T13:08:48.781936684Z + wait 9 2023-08-04T13:08:48.781974715Z + exec /usr/share/ovn/scripts/ovn-ctl -db-nb-cluster-local-port=9643 --db-nb-cluster-local-addr=10.0.42.108 --no-monitor --db-nb-cluster-local-proto=ssl --ovn-nb-db-ssl-key=/ovn-cert/tls.key --ovn-nb-db-ssl-cert=/ovn-cert/tls.crt --ovn-nb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '-ovn-nb-log=-vconsole:info -vfile:off -vPATTERN:console:%D {%Y-%m-%dT%H:%M:%S.###Z} |%05N|%c%T|%p|%m' run_nb_ovsdb 2023-08-04T13:08:48.851644059Z 2023-08-04T13:08:48.851Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log 2023-08-04T13:08:48.852091247Z ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory) 2023-08-04T13:08:48.861365357Z 2023-08-04T13:08:48Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting... 2023-08-04T13:08:48.861365357Z 2023-08-04T13:08:48Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory) 2023-08-04T13:08:48.875126148Z 2023-08-04T13:08:48.875Z|00002|raft|INFO|local server ID is c503 2023-08-04T13:08:48.911846610Z 2023-08-04T13:08:48.911Z|00003|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 3.1.2 2023-08-04T13:08:48.918864408Z 2023-08-04T13:08:48.918Z|00004|reconnect|INFO|ssl:10.0.102.2:9643: connecting... 2023-08-04T13:08:48.918934490Z 2023-08-04T13:08:48.918Z|00005|reconnect|INFO|ssl:10.0.74.128:9643: connecting... 2023-08-04T13:08:48.923439162Z 2023-08-04T13:08:48.923Z|00006|reconnect|INFO|ssl:10.0.102.2:9643: connected 2023-08-04T13:08:48.925166154Z 2023-08-04T13:08:48.925Z|00007|reconnect|INFO|ssl:10.0.74.128:9643: connected 2023-08-04T13:08:49.861650961Z 2023-08-04T13:08:49Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting... 2023-08-04T13:08:49.861747153Z 2023-08-04T13:08:49Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connected 2023-08-04T13:08:49.875272530Z 2023-08-04T13:08:49.875Z|00008|raft|INFO|server fccb is leader for term 6 2023-08-04T13:08:49.875302480Z 2023-08-04T13:08:49.875Z|00009|raft|INFO|rejecting append_request because previous entry 6,1732 not in local log (mismatch past end of log) 2023-08-04T13:08:49.876027164Z Waiting for OVN_Northbound to come up. 
2023-08-04T13:08:55.694760761Z 2023-08-04T13:08:55.694Z|00010|raft|INFO|ssl:10.0.74.128:57122: learned server ID d382 2023-08-04T13:08:55.694800872Z 2023-08-04T13:08:55.694Z|00011|raft|INFO|ssl:10.0.74.128:57122: learned remote address ssl:10.0.74.128:9643 2023-08-04T13:08:55.706904913Z 2023-08-04T13:08:55.706Z|00012|raft|INFO|ssl:10.0.102.2:43230: learned server ID fccb 2023-08-04T13:08:55.706931733Z 2023-08-04T13:08:55.706Z|00013|raft|INFO|ssl:10.0.102.2:43230: learned remote address ssl:10.0.102.2:9643 2023-08-04T13:08:58.919567770Z 2023-08-04T13:08:58.919Z|00014|memory|INFO|21944 kB peak resident set size after 10.1 seconds 2023-08-04T13:08:58.919643762Z 2023-08-04T13:08:58.919Z|00015|memory|INFO|atoms:8471 cells:7481 monitors:0 n-weak-refs:200 raft-connections:4 raft-log:1737 txn-history:72 txn-history-atoms:8165 ➜ static-kas git:(master)
This seems to happen very frequently now, but was not happening before around July 21st.
Description of problem:
When attempting to add nodes to a long-lived 4.12.3 cluster, net new nodes are not able to join the cluster. They are provisioned in the cloud provider (AWS), but never actually join as a node.
Version-Release number of selected component (if applicable):
4.12.3
How reproducible:
Consistent
Steps to Reproduce:
1. On a long lived cluster, add a new machineset
Actual results:
Machines reach "Provisioned" but don't join the cluster
Expected results:
Machines join cluster as nodes
Additional info:
Currently, the installer has a dependency on the main assisted-service Go module. This means that we pull in all of its dependencies, which include libnmstate (the Rust one). In practice, this means that we can't update assisted-service at least until AGENT-139 is implemented. And since the main assisted-service module and the API module should be in lockstep, this means we can't update to pick up recent changes to the ZTP API either.
Please review the following PR: https://github.com/openshift/cluster-autoscaler-operator/pull/271
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The following test case is failing: Error: Timeout - Async callback was not invoked within timeout specified by jasmine.DEFAULT_TIMEOUT_INTERVAL. exception Error: Timeout - Async callback was not invoked within timeout specified by jasmine.DEFAULT_TIMEOUT_INTERVAL. The test scenario is failing with an 85% failure rate:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_console/12892/pull-ci-openshift-console-master-e2e-gcp-console/1668916100596764672
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_console/12892/pull-ci-openshift-console-master-e2e-gcp-console/1668916100596764672/artifacts/e2e-gcp-console/test/artifacts/gui_test_screenshots/c8b0a6b0614b41eee9ea123ffe9a3bea.png
Description of problem:
We have OCP 4.10 installed along with Tigera 3.13 with no issues. We could also update OCP to 4.11 and 4.12 along with a Tigera upgrade to 3.15 and 3.16; the upgrade works with no issue. The problem appears when we install Tigera 3.16 along with OCP 4.12 (fresh install). Tigera support says the OCP install parameters need to be updated to accommodate new Tigera updates. It's either the Terraform plug-in or a file called main.tf that needs updating. Please engage someone from the Red Hat OCP engineering team.
Ref doc: https://access.redhat.com/solutions/6980264
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
install Tigera 3.16 along with OCP 4.12. (fresh install)
Actual results:
Installation fails with the error: "rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5330750 vs. 4194304)"
Expected results:
Just like 4.10, 4.12 installation should work with Tigera calico
Additional info:
Description of problem:
According to https://docs.aws.amazon.com/vpc/latest/userguide/amazon-vpc-limits.html, the default number of security groups per network interface is 5 and can be at most 16, so we should have a pre-check on the number of provided custom security groups. When it is more than 15 (the maximum is 16, but the installer also creates its own ${var.cluster_id}-master-sg/${var.cluster_id}-worker-sg group), the installer should quit and warn the user about this.
Version-Release number of selected component (if applicable):
registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-07-11-092038
How reproducible:
Always
Steps to Reproduce:
1. Set 16 security group IDs in compute.platform.aws.additionalSecurityGroupIDs:
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      additionalSecurityGroupIDs:
      - sg-06e63a6ad731c10cc
      - sg-054614d4f4eb5751d
      - sg-05c4fe202c8e2c28c
      - sg-0c948fa8b85bf4af1
      - sg-0cfb0c91c0b48f0de
      - sg-0eff6077ca727c921
      - sg-0d2d1f41f1ac9801c
      - sg-047c67d5decb64563
      - sg-0ee63f164c0ab8b04
      - sg-033ff80fa12e43c7f
      - sg-0ccad43754d9652cd
      - sg-04e4cbca2b5d50c3a
      - sg-0d133411fdcb0a4e0
      - sg-0b2b0e0d515b2f561
      - sg-045fde620b3e702da
      - sg-07e0493a65749973c
  replicas: 3
2. The installation failed because the workers couldn't be provisioned.
Actual results:
[root@preserve-gpei-worker k_files]# oc get machines -A NAMESPACE NAME PHASE TYPE REGION ZONE AGE openshift-machine-api gpei-0613g-wp7zw-master-0 Running m6i.xlarge us-west-2 us-west-2a 66m openshift-machine-api gpei-0613g-wp7zw-master-1 Running m6i.xlarge us-west-2 us-west-2b 66m openshift-machine-api gpei-0613g-wp7zw-master-2 Running m6i.xlarge us-west-2 us-west-2a 66m openshift-machine-api gpei-0613g-wp7zw-worker-us-west-2a-7rszc Failed 62m openshift-machine-api gpei-0613g-wp7zw-worker-us-west-2a-pwnvp Failed 62m openshift-machine-api gpei-0613g-wp7zw-worker-us-west-2b-n2cs9 Failed 62m [root@preserve-gpei-worker k_files]# oc describe machine gpei-0613g-wp7zw-worker-us-west-2b-n2cs9 -n openshift-machine-api Name: gpei-0613g-wp7zw-worker-us-west-2b-n2cs9 .. Spec: Lifecycle Hooks: Metadata: Provider Spec: Value: Ami: Id: ami-01bfc200595c748a1 API Version: machine.openshift.io/v1beta1 Block Devices: Ebs: Metadata Service Options: Placement: Availability Zone: us-west-2b Region: us-west-2 Security Groups: Filters: Name: tag:Name Values: gpei-0613g-wp7zw-worker-sg Id: sg-033ff80fa12e43c7f Id: sg-045fde620b3e702da Id: sg-047c67d5decb64563 Id: sg-04e4cbca2b5d50c3a Id: sg-054614d4f4eb5751d Id: sg-05c4fe202c8e2c28c Id: sg-06e63a6ad731c10cc Id: sg-07e0493a65749973c Id: sg-0b2b0e0d515b2f561 Id: sg-0c948fa8b85bf4af1 Id: sg-0ccad43754d9652cd Id: sg-0cfb0c91c0b48f0de Id: sg-0d133411fdcb0a4e0 Id: sg-0d2d1f41f1ac9801c Id: sg-0ee63f164c0ab8b04 Id: sg-0eff6077ca727c921 Subnet: Id: subnet-0641814f00311bd9c Tags: Name: kubernetes.io/cluster/gpei-0613g-wp7zw Value: owned User Data Secret: Name: worker-user-data Status: Conditions: Last Transition Time: 2023-07-13T09:58:02Z Status: True Type: Drainable Last Transition Time: 2023-07-13T09:58:02Z Message: Instance has not been created Reason: InstanceNotCreated Severity: Warning Status: False Type: InstanceExists Last Transition Time: 2023-07-13T09:58:02Z Status: True Type: Terminable Error Message: error launching instance: You have exceeded the maximum number of security groups allowed per network interface.
Expected results:
The installer should abort and warn that the number of provided custom security groups exceeds the maximum allowed.
Additional info:
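As a rough illustration of the requested pre-check, here is a minimal sketch; the function name and error wording are assumptions, not the installer's actual validation code.

```
package main

import "fmt"

// maxCustomSecurityGroups reflects the AWS limit of 16 security groups per
// network interface, minus the one group the installer creates itself
// (${var.cluster_id}-master-sg / ${var.cluster_id}-worker-sg).
const maxCustomSecurityGroups = 15

// validateSecurityGroupCount is a hypothetical pre-check: fail fast when the
// install-config supplies more custom security group IDs than AWS allows.
func validateSecurityGroupCount(pool string, ids []string) error {
	if len(ids) > maxCustomSecurityGroups {
		return fmt.Errorf("%s pool: %d additional security groups provided, but at most %d are allowed (AWS caps a network interface at 16 and the installer attaches its own group)", pool, len(ids), maxCustomSecurityGroups)
	}
	return nil
}

func main() {
	ids := make([]string, 16) // e.g. the 16 IDs from the install-config above
	fmt.Println(validateSecurityGroupCount("worker", ids))
}
```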
Related to TRT-849, we want to write a test to see how often this is happening before we undertake a major effort to get to the bottom of it.
The test will need to process disruption across all backends, look for DNS lookup disruptions, and then see if we have overlap with non-DNS lookup disruptions within those timeframes.
We have some precedent for similar code in KubePodNotReady alerts that we handle differently if in proximity to other intervals.
The test should flake rather than fail; we can then see how often it's happening in Sippy and on which platforms. With SQL we could likely pinpoint certain build clusters as well.
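A minimal sketch of the overlap check such a test could use; the interval type and function names are assumptions, not the actual origin/TRT monitor code.

```
package main

import (
	"fmt"
	"time"
)

// interval is a simplified stand-in for a disruption interval; the real
// monitor types carry much more metadata.
type interval struct {
	Backend   string
	DNSLookup bool
	From, To  time.Time
}

// overlaps reports whether two intervals share any span of time.
func overlaps(a, b interval) bool {
	return a.From.Before(b.To) && b.From.Before(a.To)
}

// dnsCorrelated returns the non-DNS disruptions that overlap at least one
// DNS-lookup disruption, which is the signal the proposed test would flake on.
func dnsCorrelated(intervals []interval) []interval {
	var out []interval
	for _, candidate := range intervals {
		if candidate.DNSLookup {
			continue
		}
		for _, dns := range intervals {
			if dns.DNSLookup && overlaps(candidate, dns) {
				out = append(out, candidate)
				break
			}
		}
	}
	return out
}

func main() {
	t0 := time.Now()
	hits := dnsCorrelated([]interval{
		{Backend: "kube-api", DNSLookup: false, From: t0, To: t0.Add(5 * time.Second)},
		{Backend: "kube-api", DNSLookup: true, From: t0.Add(2 * time.Second), To: t0.Add(8 * time.Second)},
	})
	fmt.Println(len(hits)) // 1: the non-DNS disruption overlaps a DNS lookup disruption
}
```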
Description of problem:
According to the Red Hat documentation https://docs.openshift.com/container-platform/4.12/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.html, the maximum number of IP aliases per node is 10 - "Per node, the maximum number of IP aliases, both IPv4 and IPv6, is 10.". Looking at the code base, the number of allowed IPs is calculated as Capacity = defaultGCPPrivateIPCapacity (which is set to 10) + cloudPrivateIPsCount (that is number of available IPs from the range) - currentIPv4Usage (number of assigned v4 IPs) - currentIPv6Usage (number of assigned v6 IPs) https://github.com/openshift/cloud-network-config-controller/blob/master/pkg/cloudprovider/gcp.go#L18-L22 Speaking to GCP, they support up to 100 alias IP ranges (not IPs) per vNIC. Can Red Hat confirm 1) If there is a limitation of 10 from OCP and why? 2) If there isn't a limit, what is the maximum number of egress IPs that could be supported per node?
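For reference, a minimal sketch restating the capacity calculation quoted above; the constant name mirrors the one in the report, everything else is illustrative.

```
package main

import "fmt"

// defaultGCPPrivateIPCapacity mirrors the constant referenced in the report
// (set to 10 in cloud-network-config-controller's GCP provider).
const defaultGCPPrivateIPCapacity = 10

// nodeEgressIPCapacity restates the calculation described above: the base
// capacity plus the unassigned IPs from the range, minus what is already
// assigned for IPv4 and IPv6.
func nodeEgressIPCapacity(cloudPrivateIPsCount, currentIPv4Usage, currentIPv6Usage int) int {
	return defaultGCPPrivateIPCapacity + cloudPrivateIPsCount - currentIPv4Usage - currentIPv6Usage
}

func main() {
	// Example: 4 IPs left in the range, 3 IPv4 and 0 IPv6 already assigned.
	fmt.Println(nodeEgressIPCapacity(4, 3, 0)) // 11
}
```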
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Case: 03487893
It is one of the most highlighted bugs from our customer.
This is a clone of issue OCPBUGS-13044. The following is the description of the original issue:
—
Description of problem:
During cluster installations/upgrades with an imageContentSourcePolicy in place but with access to quay.io, the ICSP is not honored to pull the machine-os-content image from a private registry.
Version-Release number of selected component (if applicable):
$ oc logs -n openshift-machine-config-operator ds/machine-config-daemon -c machine-config-daemon|head -1 Found 6 pods, using pod/machine-config-daemon-znknf I0503 10:53:00.925942 2377 start.go:112] Version: v4.12.0-202304070941.p0.g87fedee.assembly.stream-dirty (87fedee690ae487f8ae044ac416000172c9576a5)
How reproducible:
100% in clusters with ICSP configured BUT with access to quay.io
Steps to Reproduce:
1. Create mirror repo: $ cat <<EOF > /tmp/isc.yaml kind: ImageSetConfiguration apiVersion: mirror.openshift.io/v1alpha2 archiveSize: 4 storageConfig: registry: imageURL: quay.example.com/mirror/oc-mirror-metadata skipTLS: true mirror: platform: channels: - name: stable-4.12 type: ocp minVersion: 4.12.13 graph: true EOF $ oc mirror --dest-skip-tls --config=/tmp/isc.yaml docker://quay.example.com/mirror/oc-mirror-metadata <...> info: Mirroring completed in 2m27.91s (138.6MB/s) Writing image mapping to oc-mirror-workspace/results-1683104229/mapping.txt Writing UpdateService manifests to oc-mirror-workspace/results-1683104229 Writing ICSP manifests to oc-mirror-workspace/results-1683104229 2. Confirm machine-os-content digest: $ oc adm release info 4.12.13 -o jsonpath='{.references.spec.tags[?(@.name=="machine-os-content")].from}'|jq { "kind": "DockerImage", "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a1660c8086ff85e569e10b3bc9db344e1e1f7530581d742ad98b670a81477b1b" } $ oc adm release info 4.12.14 -o jsonpath='{.references.spec.tags[?(@.name=="machine-os-content")].from}'|jq { "kind": "DockerImage", "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ed68d04d720a83366626a11297a4f3c5761c0b44d02ef66fe4cbcc70a6854563" } 3. Create 4.12.13 cluster with ICSP at install time: $ grep imageContentSources -A6 ./install-config.yaml imageContentSources: - mirrors: - quay.example.com/mirror/oc-mirror-metadata/openshift/release source: quay.io/openshift-release-dev/ocp-v4.0-art-dev - mirrors: - quay.example.com/mirror/oc-mirror-metadata/openshift/release-images source: quay.io/openshift-release-dev/ocp-release
Actual results:
1. After the installation is completed, no pulls for a166 (4.12.13-x86_64-machine-os-content) are logged in the Quay usage logs whereas e.g. digest 22d2 (4.12.13-x86_64-machine-os-images) are reported to be pulled from the mirror. 2. After upgrading to 4.12.14 no pulls for ed68 (4.12.14-x86_64-machine-os-content) are logged in the mirror-registry while the image was pulled as part of `oc image extract` in the machine-config-daemon: [core@master-1 ~]$ sudo less /var/log/pods/openshift-machine-config-operator_machine-config-daemon-7fnjz_e2a3de54-1355-44f9-a516-2f89d6c6ab8f/machine-config-daemon/0.log 2023-05-03T10:51:43.308996195+00:00 stderr F I0503 10:51:43.308932 11290 run.go:19] Running: nice -- ionice -c 3 oc image extract -v 10 --path /:/run/mco-extensions/os-extensions-content-4035545447 --registry- config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad48fe01f3e82584197797ce2151eecdfdcce67ae1096f06412e5ace416f66ce 2023-05-03T10:51:43.418211869+00:00 stderr F I0503 10:51:43.418008 184455 client_mirrored.go:174] Attempting to connect to quay.io/openshift-release-dev/ocp-v4.0-art-dev 2023-05-03T10:51:43.418211869+00:00 stderr F I0503 10:51:43.418174 184455 round_trippers.go:466] curl -v -XGET -H "User-Agent: oc/4.12.0 (linux/amd64) kubernetes/31aa3e8" 'https://quay.io/v2/' 2023-05-03T10:51:43.419618513+00:00 stderr F I0503 10:51:43.419517 184455 round_trippers.go:495] HTTP Trace: DNS Lookup for quay.io resolved to [{34.206.15.82 } {54.209.210.231 } {52.5.187.29 } {52.3.168.193 } {52.21.36.23 } {50.17.122.58 } {44.194.68.221 } {34.194.241.136 } {2600:1f18:483:cf01:ebba:a861:1150:e245 } {2600:1f18:483:cf02:40f9:477f:ea6b:8a2b } {2600:1f18:483:cf02:8601:2257:9919:cd9e } {2600:1f18:483:cf01 :8212:fcdc:2a2a:50a7 } {2600:1f18:483:cf00:915d:9d2f:fc1f:40a7 } {2600:1f18:483:cf02:7a8b:1901:f1cf:3ab3 } {2600:1f18:483:cf00:27e2:dfeb:a6c7:c4db } {2600:1f18:483:cf01:ca3f:d96e:196c:7867 }] 2023-05-03T10:51:43.429298245+00:00 stderr F I0503 10:51:43.429151 184455 round_trippers.go:510] HTTP Trace: Dial to tcp:34.206.15.82:443 succeed
Expected results:
All images are pulled from the location as configured in the ICSP.
Additional info:
Description of problem:
When CNO is managed by HyperShift, multus-admission-controller does not have the correct RollingUpdate parameters to meet the HyperShift requirements outlined here: https://github.com/openshift/hypershift/blob/646bcef53e4ecb9ec01a05408bb2da8ffd832a14/support/config/deployment.go#L81
```
There are two standard cases currently with hypershift: HA mode where there are 3 replicas spread across zones and then non ha with one replica. When only 3 zones are available you need to be able to set maxUnavailable in order to progress the rollout. However, you do not want to set that in the single replica case because it will result in downtime.
```
So when multus-admission-controller has more than one replica, the RollingUpdate parameters should be:
```
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 1
```
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create an OCP cluster using HyperShift
2. Check the rolling update parameters of multus-admission-controller
Actual results:
the operator has default parameters: {"rollingUpdate":{"maxSurge":"25%","maxUnavailable":"25%"},"type":"RollingUpdate"}
Expected results:
{"rollingUpdate":{"maxSurge":0,"maxUnavailable":1},"type":"RollingUpdate"}
Additional info:
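A minimal sketch of the conditional described above, using the Kubernetes apps/v1 types; the function name and replica handling are assumptions, not the actual CNO code.

```
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// multusAdmissionControllerStrategy sketches the behaviour described above:
// with a single replica keep the default strategy (no forced unavailability),
// with multiple replicas allow one pod down and no surge so the rollout can
// progress when exactly three zones are available.
func multusAdmissionControllerStrategy(replicas int32) appsv1.DeploymentStrategy {
	if replicas <= 1 {
		return appsv1.DeploymentStrategy{Type: appsv1.RollingUpdateDeploymentStrategyType}
	}
	maxSurge := intstr.FromInt(0)
	maxUnavailable := intstr.FromInt(1)
	return appsv1.DeploymentStrategy{
		Type: appsv1.RollingUpdateDeploymentStrategyType,
		RollingUpdate: &appsv1.RollingUpdateDeployment{
			MaxSurge:       &maxSurge,
			MaxUnavailable: &maxUnavailable,
		},
	}
}

func main() {
	fmt.Printf("%+v\n", multusAdmissionControllerStrategy(3))
}
```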
As a user I want to see what differs between the Machine's (current) ProviderSpec and the Control Plane Machine Set (desired) ProviderSpec so that I can understand why the CPMSO is replacing my control plane machine.
Work spawned out of discussions in https://redhat-internal.slack.com/archives/CCX9DB894/p1678820665803259 and https://redhat-internal.slack.com/archives/C04UB95G802
We believe we are already logging this; it would be good to emit either an event or the diff into the status. Whoever takes this card should investigate the best way of surfacing this.
Outcome:
Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/726
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-monitoring-operator/pull/1952
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The MCO's "Certificate Observability" CRD fields (introduced in MCO-607) are non-RFC3339 formatted strings and are unparseable as the API standard metav1.Time For context, the MCO is currently migrating its API to openshift/api where it needs to comply with API standards, and if these strings are still present in the API when 4.14 ships, we will be unable to upgrade from the shipping version to the one where the API has migrated, so we need to adjust this now before it ships.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Create a cluster
2. Observe ControllerConfig status.controllerCertificates
3. Observe MachineConfigPool status.certExpirys
Actual results:
Types are wrong, and strings are formatted thusly: 2033-08-12 01:47:54 +0000 UTC
Expected results:
ControllerConfig and MachineConfigPool do not contain certificate observability fields formatted as "2033-08-12 01:47:54 +0000 UTC". They should either contain certificate observability fields formatted as "2006-01-02T15:04:05Z07:00" (RFC3339) or not contain them at all.
Additional info:
If we ship 4.14 with these strings as they are, we will be stuck like that and unable to easily upgrade out of it (because the new MCO, which treats the fields as metav1.Time, will be unable to parse the old strings), e.g.:
2023-08-15T05:03:40.989575279Z W0815 05:03:40.989527 1 reflector.go:533] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:101: failed to list *v1.MachineConfigPool: parsing time "2033-08-12 01:47:54 +0000 UTC" as "2006-01-02T15:04:05Z07:00": cannot parse " 01:47:54 +0000 UTC" as "T"
2023-08-15T05:03:40.989575279Z E0815 05:03:40.989555 1 reflector.go:148] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfigPool: failed to list *v1.MachineConfigPool: parsing time "2033-08-12 01:47:54 +0000 UTC" as "2006-01-02T15:04:05Z07:00": cannot parse " 01:47:54 +0000 UTC" as "T"
2023-08-15T05:04:05.304139210Z W0815 05:04:05.304088 1 reflector.go:533] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:101: failed to list *v1.ControllerConfig: parsing time "2033-08-12 01:47:54 +0000 UTC" as "2006-01-02T15:04:05Z07:00": cannot parse " 01:47:54 +0000 UTC" as "T"
2023-08-15T05:04:05.304139210Z E0815 05:04:05.304121 1 reflector.go:148] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:101: Failed to watch *v1.ControllerConfig: failed to list *v1.ControllerConfig: parsing time "2033-08-12 01:47:54 +0000 UTC" as "2006-01-02T15:04:05Z07:00": cannot parse " 01:47:54 +0000 UTC" as "T"
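For illustration, a minimal standalone sketch (plain Go, no MCO code) showing why the current strings fail to parse and what an RFC3339 value looks like:

```
package main

import (
	"fmt"
	"time"
)

func main() {
	notAfter := time.Date(2033, 8, 12, 1, 47, 54, 0, time.UTC)

	// What the fields currently contain: Go's default time.Time formatting.
	// metav1.Time only (un)marshals RFC3339, so this string fails to parse.
	fmt.Println(notAfter.String()) // 2033-08-12 01:47:54 +0000 UTC

	// What an API-conformant value looks like.
	fmt.Println(notAfter.Format(time.RFC3339)) // 2033-08-12T01:47:54Z

	// Round-trip check: RFC3339 parses, the default format does not.
	if _, err := time.Parse(time.RFC3339, notAfter.String()); err != nil {
		fmt.Println("default format rejected:", err)
	}
}
```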
Allow creating a single NAT gateway for a multi-zone hosted cluster. The route table in other zones should point to the one NAT gateway.
This allows running a cluster in multiple zones with a single NAT gateway, since NAT gateways can be expensive to run in AWS.
Please review the following PR: https://github.com/openshift/cloud-provider-alibaba-cloud/pull/30
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Some repositories require the bugzilla/valid-bug label to be present. Complement to https://issues.redhat.com/browse/WRKLDS-700.
Description of problem:
ControllerConfig renders properly until the Infrastructure object changes, then:
- 'Kind' and 'APIVersion' are no longer present on the object resulting from a "get" for that object via the lister, and
- as a result, the embedded dns and infrastructure objects in ControllerConfig fail to validate
- this results in ControllerConfig failing to sync
Version-Release number of selected component (if applicable):
4.14 machine-config-operator
How reproducible:
I can reproduce it every time
Steps to Reproduce:
1. Build a 4.14 cluster
2. Update Infrastructure non-destructively, e.g.: oc annotate infrastructure cluster break.the.mco=yep
3. Watch the machine-config-operator pod logs (or oc get co, the error will propagate) to see the validation errors for the new controllerconfig
Actual results:
2023-05-17T20:45:04.627320107Z I0517 20:45:04.627281 1 event.go:285] Event(v1.ObjectReference{Kind:"", Namespace:"", Name:"machine-config", UID:"d52d09f4-f7bb-497a-a5c3-92861aa6796f", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'OperatorDegraded: MachineConfigControllerFailed' Failed to resync 4.14.0-0.ci.test-2023-05-17-193937-ci-op-dcrr8kjq-latest because: ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: [spec.infra.apiVersion: Required value: must not be empty, spec.infra.kind: Required value: must not be empty, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]
Expected results:
machine-config-operator quietly syncs controllerconfig :)
Additional info:
The MCO itself is not doing this. It's not part of resourcemerge or anything like that. It's happening "below" us. The short version here is that when using a typed client, the group,version,kind (GVK) gets stripped during decoding because it's redundant (you already know the type). For "top level" objects, it gets put back during an update request automatically, but it doesn't recurse into embedded objects (which Infrastructure and DNS are). So we end up with embedded objects that are missing explicit GVKs and won't validate. Why does it only happen after the objects change? We're using a lister, and the lister's "strip-on-decode" behavior seems a little inconsistent. Sometimes the GVK is populated. If you use a direct client "get", the GVK will never be populated. There is a lot of history on this behavior, it won't be changed any time soon, here are some entry points: - https://github.com/kubernetes/kubernetes/pull/63972 - https://github.com/kubernetes/kubernetes/issues/80609
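A minimal sketch of the kind of workaround this implies, assuming access to a runtime.Scheme that knows the config.openshift.io types; the function and variable names are illustrative, not the MCO's actual code.

```
package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// restoreGVK looks the object's kind up in the scheme and sets it explicitly,
// since objects coming back from a typed lister or client may have an empty
// TypeMeta, which then fails validation once the object is embedded.
func restoreGVK(scheme *runtime.Scheme, obj runtime.Object) error {
	gvks, _, err := scheme.ObjectKinds(obj)
	if err != nil || len(gvks) == 0 {
		return fmt.Errorf("could not determine GroupVersionKind: %v", err)
	}
	obj.GetObjectKind().SetGroupVersionKind(gvks[0])
	return nil
}

func main() {
	scheme := runtime.NewScheme()
	if err := configv1.Install(scheme); err != nil {
		panic(err)
	}
	infra := &configv1.Infrastructure{} // as if freshly returned by a lister, TypeMeta empty
	if err := restoreGVK(scheme, infra); err != nil {
		panic(err)
	}
	fmt.Println(infra.GetObjectKind().GroupVersionKind()) // config.openshift.io/v1, Kind=Infrastructure
}
```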
Description of problem:
test "operator conditions control-plane-machine-set" fails https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1574/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade/1634410710559625216 control-plane-machine-set operator is Unavailable, because it doesn't reconcile node events. If a node becomes ready later than the referencing Machine, Node update event will not trigger reconciliation.
Version-Release number of selected component (if applicable):
How reproducible:
depends on the sequence of Node vs Machine events
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
operator logs https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1574/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade/1634410710559625216/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/pods/openshift-machine-api_control-plane-machine-set-operator-5d5848c465-g4q2p_control-plane-machine-set-operator.log machines https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1574/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade/1634410710559625216/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/machines.json nodes https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1574/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade/1634410710559625216/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/nodes.json
Please review the following PR: https://github.com/openshift/cluster-dns-operator/pull/357
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-18304. The following is the description of the original issue:
—
Description of problem:
https://github.com/openshift/installer/pull/6770 reverted part of https://github.com/openshift/installer/pull/5788, which had set guestinfo.domain for the bootstrap machine. This breaks some OKD installations, which require that setting.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
NodePool conditions AllMachinesReady and AllNodesHealthy are used by Cluster Service to detect problems on customer nodes. Every time a NodePool is updated, it triggers an update in a ManifestWork that is processed by CS to build a user message about why a specific machine pool/node pool is not healthy. Because the message is not sorted when more than one machine is involved, the NodePool is updated multiple times even though the state is the same. For example, CS may capture message sequences like the following, where the same machine states appear in a different order across updates:
Machine rosa-vws58-workshop-69b55d58b-mq44p: UnhealthyNode Machine rosa-vws58-workshop-69b55d58b-97n47: UnhealthyNode
Machine rosa-vws58-workshop-69b55d58b-mq44p: NodeConditionsFailed Machine rosa-vws58-workshop-69b55d58b-97n47: Deleting
Machine rosa-vws58-workshop-69b55d58b-97n47: UnhealthyNode Machine rosa-vws58-workshop-69b55d58b-mq44p: UnhealthyNode
Machine rosa-vws58-workshop-69b55d58b-97n47: Deleting Machine rosa-vws58-workshop-69b55d58b-mq44p: NodeConditionsFailed
Machine rosa-vws58-workshop-69b55d58b-mq44p: UnhealthyNode Machine rosa-vws58-workshop-69b55d58b-97n47: UnhealthyNode
Machine rosa-vws58-workshop-69b55d58b-mq44p: NodeConditionsFailed Machine rosa-vws58-workshop-69b55d58b-97n47: Deleting
Expected results:
The HyperShift Operator should sort the messages where multiple machines/nodes are involved: https://github.com/openshift/hypershift/blob/86af31a5a5cdee3da0d7f65f3bd550f4ec9cac55/hypershift-operator/controllers/nodepool/nodepool_controller.go#L2509
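A minimal sketch of the suggested fix, with hypothetical function and input names; the real controller builds the message from Machine conditions rather than a plain map.

```
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildNodePoolMessage sketches the deterministic ordering suggested above:
// collect one line per machine, sort the lines, then join, so the same set of
// machine states always produces the same condition message and does not
// retrigger NodePool/ManifestWork updates.
func buildNodePoolMessage(machineReasons map[string]string) string {
	lines := make([]string, 0, len(machineReasons))
	for name, reason := range machineReasons {
		lines = append(lines, fmt.Sprintf("Machine %s: %s", name, reason))
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n")
}

func main() {
	msg := buildNodePoolMessage(map[string]string{
		"rosa-vws58-workshop-69b55d58b-mq44p": "UnhealthyNode",
		"rosa-vws58-workshop-69b55d58b-97n47": "Deleting",
	})
	fmt.Println(msg) // stable output regardless of map iteration order
}
```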
Description of problem:
we can see TypeErrors on operand creation page
Version-Release number of selected component (if applicable):
cluster-bot cluster launch 4.14-ci,openshift/console#12525
How reproducible:
Always
Steps to Reproduce:
1. Create mock CRD and CSV files in project 'test':
$ oc project test
$ oc apply -f mock-crd-and-csv.yaml
customresourcedefinition.apiextensions.k8s.io/mock-k8s-dropdown-resources.test.tectonic.com created
clusterserviceversion.operators.coreos.com/mock-k8s-resource-dropdown-operator created
2. Go to the CR creation page: Operators -> Installed Operators -> Mock K8sResourcePrefixOperator -> Mock Resource tab -> click the 'Create MockK8sDropdownResource' button
Actual results:
2. we can see errors Description: e is undefined Component trace: g@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/create-operand-chunk-b03c5cb69a738de3ba86.min.js:1:17026 v@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/create-operand-chunk-b03c5cb69a738de3ba86.min.js:1:54359 div N@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:173048 R@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:173543 _@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/create-operand-chunk-b03c5cb69a738de3ba86.min.js:1:20749 10807/t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/create-operand-chunk-b03c5cb69a738de3ba86.min.js:1:145 4156/t.default@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/create-operand-chunk-b03c5cb69a738de3ba86.min.js:1:22586 s@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:223444 t@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:21:69403 T t@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:21:71448 Suspense i@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:435931 section m@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:170312 div div t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1501506 div div c@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:699298 d@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:219161 div d@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:89596 l@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1151500 H<@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:442786 S@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:87:86675 main div v@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:466912 div div c@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:311348 div div 
c@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:699298 d@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:219161 div d@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:89596 Jn@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:36:185686 t.default@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:854425 5404/t.default@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/quick-start-chunk-0b68859d1eaa39849249.min.js:1:1264 s@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:223444 t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1581508 ee@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1599747 St@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:36:142700 ee@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1599747 ee@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1599747 i@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:809765 t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1575685 t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1575874 t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1573290 te@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1599889 ne<@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1603021 r@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:36:122338 t@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:21:69403 t@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:21:71448 t@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:21:66008 
re@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1603332 t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:783751 t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1084331 s@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:635039 t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:135:257437 Suspense
Expected results:
2. operand creation form/yaml page should be loaded successfully
Additional info:
mock-crd-and-csv.yaml and screenshot are at https://drive.google.com/drive/folders/1Z432vVMArHLgCgzu5IMGi9_oq3iRtezx
There is a workloads change introducing the DeploymentConfigs and Builds APIs as capabilities, which gives the cluster admin the option to enable/disable each of these APIs.
In case the DeploymentConfigs capability is disabled we should remove the `Deployment Config` subsection from `Workloads` nav section.
In case the Builds capability is disabled we should remove the `Builds` and `Build Configs` subsection from `Workloads` nav section.
This is a clone of issue OCPBUGS-7893. The following is the description of the original issue:
—
Description of problem:
The TaskRun duration diagram on the "Metrics" tab of a Pipeline is set to show only 4 TaskRuns in the legend, regardless of the number of TaskRuns on the diagram.
Expected results:
All TaskRuns should be displayed in the legend.
Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/61
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Hello, one of our customers had several cni-sysctl-allowlist-ds pods created (around 10,000 pods) in the openshift-multus namespace. That caused several issues in the cluster, as nodes were full of pods and ran out of IPs. After deleting them, the situation has improved, but we want to know the root cause of this issue. Searching in the network-operator pod logs, it seems that the customer faced some networking issues. After this issue, we can see that the cni-sysctl-allowlist pods started to be created. Could we know why the cni-sysctl-allowlist-ds pods were created?
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Unable to successfully create a HyperShift KubeVirt HostedCluster on BM; the control plane's pod/importer-prime-xxx can't become ready
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. HyperShift install operator 2. HyperShift create cluster KubeVirt xxx
Actual results:
➜ oc get pod -n clusters-3d9ec3c7e495f1c58da1 | grep "importer-prime" importer-prime-90175dc9-21bf-4f13-a021-6c42a2e19652 1/2 Error 16 (5m13s ago) 57m importer-prime-9f153661-1c2c-4b61-84fd-0a2d83f30699 1/2 Error 16 (5m4s ago) 57m importer-prime-cb817383-58bd-4480-a7e1-49ae42368cae 1/2 CrashLoopBackOff 15 (4m51s ago) 57m ➜ oc logs importer-prime-90175dc9-21bf-4f13-a021-6c42a2e19652 -c importer -n clusters-3d9ec3c7e495f1c58da1 I0728 18:41:20.106447 1 importer.go:103] Starting importer E0728 18:41:20.107346 1 importer.go:133] exit status 1, blockdev: cannot open /dev/cdi-block-volume: Permission denied kubevirt.io/containerized-data-importer/pkg/util.GetAvailableSpaceBlock /remote-source/app/pkg/util/util.go:136 kubevirt.io/containerized-data-importer/pkg/util.GetAvailableSpaceByVolumeMode /remote-source/app/pkg/util/util.go:106 main.main /remote-source/app/cmd/cdi-importer/importer.go:131 runtime.main /usr/lib/golang/src/runtime/proc.go:250 runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1598 ➜ oc get hostedcluster -n clusters 3d9ec3c7e495f1c58da1 -ojsonpath='{.status.version.desired}' | jq { "image": "registry.build01.ci.openshift.org/ci-op-ywf2rxrx/release@sha256:940a0463d1203888fb4e5fa4a09b69dc4eb3cc5d70dee22e1155c677aafca197", "version": "4.14.0-0.ci-2023-07-28-090906" } ➜ oc get hostedcluster -n clusters 3d9ec3c7e495f1c58da1 NAME VERSION KUBECONFIG PROGRESS AVAILABLE PROGRESSING MESSAGE 3d9ec3c7e495f1c58da1 3d9ec3c7e495f1c58da1-admin-kubeconfig Partial True False The hosted control plane is available ➜ oc get clusterversion version -ojsonpath='{.status.desired.image}' registry.build01.ci.openshift.org/ci-op-ywf2rxrx/release@sha256:940a0463d1203888fb4e5fa4a09b69dc4eb3cc5d70dee22e1155c677aafca197 ➜ oc get vmi -A No resources found
Expected results:
All pods on the control plane should be ready
Additional info:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/41772/rehearse-41772-periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-kubevirt-baremetalds-conformance/1684954151244533760
Description of problem:
container_network* metrics stop reporting after a container restarts. Other container_* metrics continue to report for the same pod.
How reproducible:
Issue can be reproduced by triggering a container restart
Steps to Reproduce:
1. Restart container
2. Check metrics and see container_network* not reporting
Additional info:
Ticket with more detailed debugging process OHSS-16739
First showed on https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-08-16-042125
Did not appear to happen on https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-08-15-200133
Changelog is getting huge but I diffed these two PRs:
❯ diff 1.txt 2.txt 2a3 > Use go 1.18 when setting up environment (#5422) #5422 15a17 > CFE-688: Update install-config CRD to support gcp labels and tags #7126 23a26,27 > OCPBUGS-17711: Revert “pkg/cli/admin/release/extract: Add –included and –install-config” #1527 > Update openshift/api #1525 28a33 > pkg/aws/actuator: Drop comment which suggested passthrough permission verification #590 49a55,59 > cluster-control-plane-machine-set-operator > > OCPCLOUD-2130: Add subnet to Azure FD, fix for optional fields in FD #229 > Full changelog > 64a75 > IR-373: remove node-ca daemon #867 126a138,147 > cluster-storage-operator > > STOR-1274: use granular permissons for Azure credential requests #388 > Full changelog > > cluster-version-operator > > CNF-9385: add ImageRegistry capability #950 > Full changelog > 132a154,158 > container-networking-plugins > > OCPBUGS-17681: Default CNI binaries to RHEL 8 #116 > Full changelog > 143a170,174 > haproxy-router > > OCPBUGS-17653: haproxy/template: mitigate CVE-2023-40225 #505 > Full changelog > 193a225,229 > monitoring-plugin > > OCPBUGS-17650: Fix /monitoring/ redirects #68 > Full changelog > 204a241,245 > openstack-machine-api-provider > > Bump CAPO to match branch release-0.7 #80 > Full changelog > 206a248,249 > OCPBUGS-17157: scripts: add a Go-based bumper, sync upstream #534 > Add ncdc to DOWNSTREAM_OWNERS #539 223a267 > update watch-endpoint-slices to usable shape #28184
Description of problem:
A runtime error is encountered when running the console backend in off-cluster mode against only one cluster (non-multicluster configuration)
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Follow readme instructions for running bridge locally 2. 3.
Actual results:
Bridge crashes with a runtime error
Expected results:
Bridge should run normally
Additional info:
Description of problem:
Alert Rules do not have summary/description
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
This bug is being raised by the OpenShift Monitoring team as part of an effort to detect invalid Alert Rules in OCP. Check the details of the following Alert Rules:
1. KubeletHealthState
2. MCCDrainError
3. MCDPivotError
4. MCDRebootError
5. SystemMemoryExceedsReservation
Actual results:
These Alert Rules do not have Summary/Description annotation, but have a 'message' annotation. OpenShift alerts must use 'description' -- consider renaming the annotation
Expected results:
Alerts should have Summary/Description annotation.
Additional info:
Alerts must have a summary/description annotation; please refer to the style guide at https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide. To resolve the bug:
- Rename the 'message' annotation to a summary/description annotation
- Remove the exception in the origin test, added in PR https://github.com/openshift/origin/pull/27944
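A minimal sketch of the renaming step listed above, operating on a plain annotations map; the sample annotation text is illustrative, and the real fix belongs in the MCO's PrometheusRule manifests rather than in code like this.

```
package main

import "fmt"

// ensureDescription sketches the first resolution step above: if an alert only
// carries the legacy 'message' annotation, move its text into 'description' so
// the alert complies with the OpenShift alerting style guide.
func ensureDescription(annotations map[string]string) {
	if msg, ok := annotations["message"]; ok {
		if _, hasDesc := annotations["description"]; !hasDesc {
			annotations["description"] = msg
		}
		delete(annotations, "message")
	}
}

func main() {
	a := map[string]string{"message": "Drain failed on {{ $labels.exported_node }}"} // illustrative text
	ensureDescription(a)
	fmt.Println(a) // map[description:Drain failed on {{ $labels.exported_node }}]
}
```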
```
alert TargetDown fired for 13 seconds with labels:
```
Checking kubelet logs for all the nodes:
```
Aug 07 10:11:49.788245 libvirt-ppc64le-1-1-9-kfv8v-master-0 crio[1244]: time="2021-08-07 10:11:49.788169211Z" level=info msg="Started container dd7e2473c51870c1894531af9a3935b907340a31216f85c32e391bddf22d7fd0: openshift-machine-config-operator/machine-config-daemon-7r2bb/machine-config-daemon" id=15456b41-39c9-41ce-8f10-71398df6dd26 name=/runtime.v1alpha2.RuntimeService/StartContainer
Aug 07 10:11:49.265439 libvirt-ppc64le-1-1-9-kfv8v-master-1 crio[1242]: time="2021-08-07 10:11:49.264443242Z" level=info msg="Created container 0651d7904d63a3f2c1fa9177d2ccf890c8fc769e96c836074aa8cc28a8bd7e04: openshift-machine-config-operator/machine-config-daemon-pk29l/machine-config-daemon" id=a622e284-7d45-4b72-b271-c39081c2c77a name=/runtime.v1alpha2.RuntimeService/CreateContainer
Aug 07 10:11:49.602420 libvirt-ppc64le-1-1-9-kfv8v-master-2 crio[1243]: time="2021-08-07 10:11:49.602359290Z" level=info msg="Started container 5a24f464210595cd394aacd4e98903a196d67762a53d764bd6f4a6010cc17acf: openshift-machine-config-operator/machine-config-daemon-69fw6/machine-config-daemon" id=89b0650c-741e-4c61-ab49-f68aa82cb302 name=/runtime.v1alpha2.RuntimeService/StartContainer
Aug 07 10:15:54.666525 libvirt-ppc64le-1-1-9-kfv8v-worker-0-gddxw crio[1252]: time="2021-08-07 10:15:54.666233168Z" level=info msg="Started container 8ba32989af629e00c35578c51e9b5612ca8ddcf97b32f2b500d777a6eb2ff2e1: openshift-machine-config-operator/machine-config-daemon-5tb88/machine-config-daemon" id=4fa0e2ba-54aa-41a8-ab7b-7a3b6f6a9998 name=/runtime.v1alpha2.RuntimeService/StartContainer
Aug 07 10:16:14.170188 libvirt-ppc64le-1-1-9-kfv8v-worker-0-p76x7 crio[1235]: time="2021-08-07 10:16:14.170137303Z" level=info msg="Started container 78d933af1e7100050332b1df62e67d1fc71ca735c7a7d3c060411f61f32a0c74: openshift-machine-config-operator/machine-config-daemon-k6l8w/machine-config-daemon" id=c344fd94-abeb-4393-87f3-5bcaba21d45f name=/runtime.v1alpha2.RuntimeService/StartContainer
```
All containers started before the test started (before 2021-08-07T10:28:00Z, see https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-jenkins-e2e-remote-libvirt-ppc64le/1423947091704549376/build-log.txt). Checking https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-jenkins-e2e-remote-libvirt-ppc64le/1423947091704549376/artifacts/ocp-jenkins-e2e-remote-libvirt-ppc64le/gather-libvirt/artifacts/pods.json:
```
machine-config-daemon-5tb88_machine-config-daemon.log: assigned to libvirt-ppc64le-1-1-9-kfv8v-worker-0-gddxw, 0 restarts, ready since 2021-08-07T10:16:07Z
machine-config-daemon-k6l8w_machine-config-daemon.log: assigned to libvirt-ppc64le-1-1-9-kfv8v-worker-0-p76x7, 0 restarts, ready since 2021-08-07T10:16:14Z
machine-config-daemon-69fw6_machine-config-daemon.log: assigned to libvirt-ppc64le-1-1-9-kfv8v-master-2, 0 restarts, ready since 2021-08-07T10:11:49Z
machine-config-daemon-pk29l_machine-config-daemon.log: assigned to libvirt-ppc64le-1-1-9-kfv8v-master-1, 0 restarts, ready since 2021-08-07T10:11:49Z
machine-config-daemon-7r2bb_machine-config-daemon.log: assigned to libvirt-ppc64le-1-1-9-kfv8v-master-0, 0 restarts, ready since 2021-08-07T10:11:49Z
```
All containers were running since they got created and never restarted.
The incident (alert TargetDown fired for 13 seconds) occurred at August 7, 2021 10:33:18 AM. The test suite finished 2021-08-07T10:33:40Z.
Based on the TargetDown definition (see https://github.com/openshift/cluster-monitoring-operator/blob/001eccd81ff51af0ed7a9d463dd35bfa9b75d102/assets/cluster-monitoring-operator/prometheus-rule.yaml#L16-L28):
The machine-config-daemon target was down for 15m and 13s. Given the alert fired at 10:33:18 and the test suite only started around 10:28:00 (roughly 5m18s earlier), the target must have been down before the test suite started to run.
This pattern repeats in other jobs as well:
Please review the following PR: https://github.com/openshift/cluster-api-provider-aws/pull/459
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
VPC endpoint service cannot be cleaned up by HyperShift operator when the OIDC provider of the customer cluster has been deleted.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Sometimes
Steps to Reproduce:
1.Create a HyperShift hosted cluster 2.Delete the HyperShift cluster's OIDC provider in AWS 3.Delete the HyperShift hosted cluster
Actual results:
Cluster is stuck deleting
Expected results:
Cluster deletes
Additional info:
The hypershift operator is stuck trying to delete the AWS endpoint service but it can't be deleted because it gets an error that there are active connections.
Description of problem:
Bump Kubernetes to 0.27.1 and bump dependencies
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
On a freshly installed cluster, the control-plane-machineset-operator begins rolling a new master node, but the machine remains in a Provisioned state and never joins as a node. Its status is:
Drain operation currently blocked by: [{Name:EtcdQuorumOperator Owner:clusteroperator/etcd}]
The cluster is left in this state until an admin manually removes the stuck master node, at which point a new master machine is provisioned and successfully joins the cluster.
Version-Release number of selected component (if applicable):
4.12.4
How reproducible:
Observed at least 4 times over the last week, but unsure on how to reproduce.
Actual results:
A master node remains in a stuck Provisioned state and requires manual deletion to unstick the control plane machine set process.
Expected results:
No manual interaction should be necessary.
Additional info:
Description of problem:
The certificates synced by MCO in 4.13 onwards are more comprehensive and correct, and out of sync issues will surface much faster. See https://issues.redhat.com/browse/MCO-499 for details
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1.Install 4.13, pause MCPs 2. 3.
Actual results:
Within ~24 hours the cluster will fire critical clusterdown alerts
Expected results:
No alerts fire
Additional info:
This PR will allow the installation of non-latest Operator channels and associated versions. https://github.com/openshift/console/pull/12743
When a version is installed that is not the `currentCSV` default version for a channel, the data returns `installed: false` and `installState: "Not Installed"`.
So the UI doesn't place an "Installed" label on the operator card in OperatorHub and the user doesn't see that it's already installed when viewing the operator details.
Version-Release number of selected component (if applicable):
4.14 cluster
Steps to Reproduce:
Animated screen gif of installed Data Grid version 8.4.3, the default latest version is 8.4.4
https://drive.google.com/file/d/1KVMCdflBYsI3yiLf2oQv69MoStgA5kof/view?usp=sharing
Actual results:
obj data returns `installState: "Not Installed"` and `installed: false`
Expected results:
obj data returns `installState: "Installed"` and `installed: true`
Additional info:
Requires 4.14 cluster to support installing previous versions and channels
Description of problem:
On 4.14, 'MachineAPI' is marked as an optional capability; disabling it disables two operators, machine-api and cluster-autoscaler. Epic link: https://issues.redhat.com/browse/CNF-6318 The machine-api operator is required for a common IPI cluster (not SNO and not compact), so if "MachineAPI" is disabled in install-config.yaml, a common IPI installation will fail. Suggest adding a pre-check on the installer side for common IPI (not SNO and not compact) clusters when running "openshift-install create cluster": if MachineAPI is disabled, the installer should exit with a corresponding message.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-30-131338
How reproducible:
Always
Steps to Reproduce:
1. Prepare install-config.yaml and set baselineCapabilitySet as None, make sure that compute node number is greater than 0. 2. Run command "openshift-install create cluster" to install common IPI 3.
Actual results:
Installation failed since missing machine-api operator
Expected results:
The installer should have a pre-check for this scenario and exit with an error message if MachineAPI is disabled
Additional info:
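A minimal sketch of the requested pre-check, using a trimmed-down stand-in for the install config; the field and function names are assumptions, not the installer's actual validation code.

```
package main

import (
	"errors"
	"fmt"
)

// installConfig is a hypothetical stand-in for the fields the real installer
// would inspect when validating capabilities.
type installConfig struct {
	EnabledCapabilities  map[string]bool
	ControlPlaneReplicas int
	ComputeReplicas      int
}

// validateMachineAPICapability sketches the pre-check: a common IPI topology
// (not SNO, not compact) needs the machine-api operator, so the install should
// fail fast when the MachineAPI capability is disabled.
func validateMachineAPICapability(ic installConfig) error {
	commonIPI := ic.ControlPlaneReplicas > 1 && ic.ComputeReplicas > 0
	if commonIPI && !ic.EnabledCapabilities["MachineAPI"] {
		return errors.New("the MachineAPI capability is required when compute replicas > 0; enable it or set compute replicas to 0")
	}
	return nil
}

func main() {
	err := validateMachineAPICapability(installConfig{
		EnabledCapabilities:  map[string]bool{}, // e.g. baselineCapabilitySet: None
		ControlPlaneReplicas: 3,
		ComputeReplicas:      3,
	})
	fmt.Println(err)
}
```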
Description of the problem:
We get the disk serial from ghw, which gets it from looking at 2 udev properties. There are a couple more recent udev properties that should be tried first, as lsblk does:
I have a PR open on ghw that should solve the issue. We'll need to update our version of ghw once it's merged.
See more info in the ABI ticket: https://issues.redhat.com/browse/OCPBUGS-18174
Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/59
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
This test tends to be flaky, depending on how the cert changes are propagated. We rotate 2 of the 7 certs in the bundle; if the changes don't get batched together, the assertion that verifies after the cert changes happens too soon, causing the test to fail.
Version-Release number of selected component (if applicable):
4.14.0
Description of problem:
The statefulset thanos-ruler-user-workload has no serviceName. As the documentation describes, serviceName is a must for a StatefulSet. I'm not sure if we need a service here, but one question: if we don't need a service, why not use a regular Deployment? Thanks!
MacBook-Pro:k8sgpt jianzhang$ oc explain statefulset.spec.serviceName KIND: StatefulSet VERSION: apps/v1FIELD: serviceName <string>DESCRIPTION: serviceName is the name of the service that governs this StatefulSet. This service must exist before the StatefulSet, and is responsible for the network identity of the set. Pods get DNS/hostnames that follow the pattern: pod-specific-string.serviceName.default.svc.cluster.local where "pod-specific-string" is managed by the StatefulSet controller. MacBook-Pro:k8sgpt jianzhang$ oc get statefulset -n openshift-user-workload-monitoring -o=jsonpath={.spec.serviceName} MacBook-Pro:k8sgpt jianzhang$ MacBook-Pro:k8sgpt jianzhang$ oc get statefulset -n openshift-user-workload-monitoring NAME READY AGE prometheus-user-workload 2/2 4h44m thanos-ruler-user-workload 2/2 4h44m MacBook-Pro:k8sgpt jianzhang$ oc get svc -n openshift-user-workload-monitoring NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE prometheus-operated ClusterIP None <none> 9090/TCP,10901/TCP 4h44m prometheus-operator ClusterIP None <none> 8443/TCP 4h44m prometheus-user-workload ClusterIP 172.30.46.204 <none> 9091/TCP,9092/TCP,10902/TCP 4h44m prometheus-user-workload-thanos-sidecar ClusterIP None <none> 10902/TCP 4h44m thanos-ruler ClusterIP 172.30.110.49 <none> 9091/TCP,9092/TCP,10901/TCP 4h44m thanos-ruler-operated ClusterIP None <none> 10902/TCP,10901/TCP 4h44m
Version-Release number of selected component (if applicable):
MacBook-Pro:k8sgpt jianzhang$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.0-0.nightly-2023-05-31-080250 True False 7h30m Cluster version is 4.14.0-0.nightly-2023-05-31-080250
How reproducible:
always
Steps to Reproduce:
1. Install OCP 4.14 cluster. 2. Check cluster's statefulset instances or run `k8sgpt analyze -d` 3.
Actual results:
MacBook-Pro:k8sgpt jianzhang$ k8sgpt analyze -d Service nfs-provisioner/example.com-nfs does not exist AI Provider: openai 0 openshift-user-workload-monitoring/thanos-ruler-user-workload(thanos-ruler-user-workload) - Error: StatefulSet uses the service openshift-user-workload-monitoring/ which does not exist. Kubernetes Doc: serviceName is the name of the service that governs this StatefulSet. This service must exist before the StatefulSet, and is responsible for the network identity of the set. Pods get DNS/hostnames that follow the pattern: pod-specific-string.serviceName.default.svc.cluster.local where "pod-specific-string" is managed by the StatefulSet controller.
Expected results:
The statefulset has a serviceName set.
Additional info:
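For illustration, a minimal sketch of a conformant spec using the Kubernetes apps/v1 types; picking thanos-ruler-operated as the governing service is an assumption based on the service list above, not a confirmed fix.

```
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// The StatefulSet names an existing headless service as the governing
	// service, which is what spec.serviceName is for; pods then get stable
	// DNS names of the form <pod>.<serviceName>.<namespace>.svc.
	sts := appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "thanos-ruler-user-workload",
			Namespace: "openshift-user-workload-monitoring",
		},
		Spec: appsv1.StatefulSetSpec{
			ServiceName: "thanos-ruler-operated", // assumed governing service
		},
	}
	fmt.Println(sts.Spec.ServiceName)
}
```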
Description of problem:
The script for checking the certs for Openshift install on openstack fails. https://docs.openshift.com/container-platform/4.12/installing/installing_openstack/preparing-to-install-on-openstack.html#security-osp-validating-certificates_preparing-to-install-on-openstack I see that the command "openstack catalog list --format json --column Name --column Endpoints" returns output as, ----------- [ { "Name": "heat-cfn", "Endpoints": "RegionOne\n admin: http://10.254.x.x:8000/v1\nRegionOne\n public: https://<domain_name>:8000/v1\nRegionOne\n internal: http://10.254.x.x:8000/v1\n" }, { "Name": "cinderv2", "Endpoints": "RegionOne\n admin: http://10.254.x.x:8776/v2/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n internal: http://10.254.x.x:8776/v2/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n public: https://<domain_name>:8776/v2/f36f2db6bb434484b71a45aa84b9d790\n" }, { "Name": "glance", "Endpoints": "RegionOne\n public: https://<domain_name>:9292\nRegionOne\n admin: http://10.254.x.x:9292\nRegionOne\n internal: http://10.254.x.x:9292\n" }, { "Name": "keystone", "Endpoints": "RegionOne\n internal: http://10.254.x.x:5000\nRegionOne\n admin: http://10.254.x.x:35357\nRegionOne\n public: https://<domain_name>:5000\n" }, { "Name": "swift", "Endpoints": "RegionOne\n admin: https://ch-dc-s3-gsn-33.eecloud.nsn-net.net:10032/swift/v1\nRegionOne\n public: https://ch-dc-s3-gsn-33.eecloud.nsn-net.net:10032/swift/v1\nRegionOne\n internal: https://ch-dc-s3-gsn-33.eecloud.nsn-net.net:10032/swift/v1\n" }, { "Name": "nova", "Endpoints": "RegionOne\n public: https://<domain_name>:8774/v2.1\nRegionOne\n internal: http://10.254.x.x:8774/v2.1\nRegionOne\n admin: http://10.254.x.x:8774/v2.1\n" }, { "Name": "heat", "Endpoints": "RegionOne\n internal: http://10.254.x.x:8004/v1/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n public: https://<domain_name>:8004/v1/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n admin: http://10.254.x.x:8004/v1/f36f2db6bb434484b71a45aa84b9d790\n" }, { "Name": "cinder", "Endpoints": "" }, { "Name": "cinderv3", "Endpoints": "RegionOne\n public: https://<domain_name>:8776/v3/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n admin: http://10.254.x.x:8776/v3/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n internal: http://10.254.x.x:8776/v3/f36f2db6bb434484b71a45aa84b9d790\n" }, { "Name": "neutron", "Endpoints": "RegionOne\n internal: http://10.254.x.x:9696\nRegionOne\n public: https://<domain_name>:9696\nRegionOne\n admin: http://10.254.x.x:9696\n" }, { "Name": "placement", "Endpoints": "RegionOne\n internal: http://10.254.x.x:8778\nRegionOne\n admin: http://10.254.x.x:8778\nRegionOne\n public: https://<domain_name>:8778\n" } ] ----------- Which then expected to be filtered with jq as " | jq -r '.[] | .Name as $name | .Endpoints[] | [$name, .interface, .url] | join(" ")'| sort " But it fails with error as, ---------------- ./certs.sh jq: error (at <stdin>:46): Cannot iterate over string ("RegionOne\...) Further check the script following commands execution is failing openstack catalog list --format json --column Name --column Endpoints \ > | jq -r '.[] | .Name as $name | .Endpoints[] | [$name, .interface, .url] | join(" ")' jq: error (at <stdin>:46): Cannot iterate over string ("RegionOne\...) ---------------- Where certs.sh is the script we copied from documentation. I did some debugs to get the things .interface,.url to internal,public,admin fields from endpoint but I'm not sure if that's way it is on openstack so marking this as BZ to have reviewed.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.12 on the 3.18.1 release of OpenStack
How reproducible:
- Always
Steps to Reproduce:
1. Copy the script from the documentation and run it on the OpenStack release above.
Actual results:
The script fails while parsing the catalog output.
Expected results:
The script shouldn't fail.
Additional info:
Invoking 'create cluster-manifests' fails when imageContentSources is missing in install-config yaml:
$ openshift-install agent create cluster-manifests
INFO Consuming Install Config from target directory
FATAL failed to write asset (Mirror Registries Config) to disk: failed to write file: open .: is a directory
install-config.yaml:
apiVersion: v1alpha1
metadata:
  name: appliance
rendezvousIP: 192.168.122.116
hosts:
  - hostname: sno
    installerArgs: '["--save-partlabel", "agent*", "--save-partlabel", "rhcos-*"]'
    interfaces:
      - name: enp1s0
        macAddress: 52:54:00:e7:05:72
    networkConfig:
      interfaces:
        - name: enp1s0
          type: ethernet
          state: up
          mac-address: 52:54:00:e7:05:72
          ipv4:
            enabled: true
            dhcp: true
Description of problem:
The following changes are required for openshift/route-controller-manager#22 refactoring.
add POD_NAME to route-controller-manager deployment
introduce route-controller-defaultconfig and customize the lease name openshift-route-controllers to override the default supplied by library-go
add RBAC for infrastructures, which library-go uses to configure leader election
Description of problem:
We are seeing flakes in HyperShift CI jobs: https://search.ci.openshift.org/?search=Alerting+rule+%22CsvAbnormalFailedOver2Min%22&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job Sample failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1692/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-hypershift/1664244482360479744 { fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:148]: Incompliant rules detected: Alerting rule "CsvAbnormalFailedOver2Min" (group: olm.csv_abnormal.rules) has no 'description' annotation, but has a 'message' annotation. OpenShift alerts must use 'description' -- consider renaming the annotation Alerting rule "CsvAbnormalFailedOver2Min" (group: olm.csv_abnormal.rules) has no 'summary' annotation Alerting rule "CsvAbnormalOver30Min" (group: olm.csv_abnormal.rules) has no 'description' annotation, but has a 'message' annotation. OpenShift alerts must use 'description' -- consider renaming the annotation Alerting rule "CsvAbnormalOver30Min" (group: olm.csv_abnormal.rules) has no 'summary' annotation Ginkgo exit error 1: exit with code 1}
Version-Release number of selected component (if applicable):
4.14 CI
How reproducible:
sometimes
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Serverless -> Eventing -> Channels: values under the Conditions column are in English. Translator comments: "x OK/y" should be translated as "x个 OK(共y个)"
Version-Release number of selected component (if applicable):
4.13.0-ec.1
How reproducible:
always
Steps to Reproduce:
1. Navigate to Serverless -> Eventing -> Channels.
2. Observe that values under the Conditions column are in English.
Actual results:
Content is in English.
Expected results:
Content should be in the target language. "x OK/y" should be translated as "x个 OK(共y个)".
Additional info:
screenshot provided
Description of the problem:
The OCI platform is available only from OCP 4.14; it should not be possible to create an OCI cluster with OCP < 4.14.
How reproducible:
You can reproduce with aicli
Steps to reproduce:
$ aicli --integration create cluster agentil-test-oci-19 -P platform='{"type": "oci"}' -P pull_secret=<your pull secret> -P user_managed_networking=true -P minimal=true -P openshift_version=4.13
Actual results:
[agentil@fedora Downloads]$ aicli --integration create cluster agentil-test-oci-19 -P platform='{"type": "oci"}' -P pull_secret=~/Downloads/pull-secret.txt -P user_managed_networking=true -P minimal=true -P openshift_version=4.13 Creating cluster agentil-test-oci-19 Using karmalabs.corp as DNS domain as no one was provided Forcing network_type to OVNKubernetes Using version 4.13.2 Creating infraenv agentil-test-oci-19_infra-env Using karmalabs.corp as DNS domain as no one was provided [agentil@fedora Downloads]$ aicli --integration info cluster agentil-test-oci-19 ams_subscription_id: 2QvJWtlvlUIvFtCmOIPiwkHRirC api_vips: [] base_dns_domain: karmalabs.corp cluster_networks: [{'cluster_id': '65f2a1fa-efd2-419a-9bf0-802e595a0a63', 'cidr': '10.128.0.0/14', 'host_prefix': 23}] connectivity_majority_groups: {"IPv4":[],"IPv6":[]} controller_logs_collected_at: 0001-01-01 00:00:00+00:00 controller_logs_started_at: 0001-01-01 00:00:00+00:00 cpu_architecture: x86_64 created_at: 2023-06-08 12:42:36.327854+00:00 disk_encryption: {'enable_on': 'none', 'mode': 'tpmv2', 'tang_servers': None} email_domain: redhat.com feature_usage: {"Cluster Tags":{"id":"CLUSTER_TAGS","name":"Cluster Tags"},"Hyperthreading":{"data":{"hyperthreading_enabled":"all"},"id":"HYPERTHREADING","name":"Hyperthreading"},"OVN network type":{"id":"OVN_NETWORK_TYPE","name":"OVN network type"},"Platform selection":{"data":{"platform_type":"oci"},"id":"PLATFORM_SELECTION","name":"Platform selection"},"User Managed Networking With Multi Node":{"id":"USER_MANAGED_NETWORKING_WITH_MULTI_NODE","name":"User Managed Networking With Multi Node"}} high_availability_mode: Full hyperthreading: all id: 65f2a1fa-efd2-419a-9bf0-802e595a0a63 ignition_endpoint: {'url': None, 'ca_certificate': None} imported: False ingress_vips: [] install_completed_at: 0001-01-01 00:00:00+00:00 install_started_at: 0001-01-01 00:00:00+00:00 ip_collisions: {} machine_networks: [] monitored_operators: [{'cluster_id': '65f2a1fa-efd2-419a-9bf0-802e595a0a63', 'name': 'console', 'version': None, 'namespace': None, 'subscription_name': None, 'operator_type': 'builtin', 'properties': None, 'timeout_seconds': 3600, 'status': None, 'status_info': None, 'status_updated_at': datetime.datetime(1, 1, 1, 0, 0, tzinfo=tzutc())}, {'cluster_id': '65f2a1fa-efd2-419a-9bf0-802e595a0a63', 'name': 'cvo', 'version': None, 'namespace': None, 'subscription_name': None, 'operator_type': 'builtin', 'properties': None, 'timeout_seconds': 3600, 'status': None, 'status_info': None, 'status_updated_at': datetime.datetime(1, 1, 1, 0, 0, tzinfo=tzutc())}] name: agentil-test-oci-19 network_type: OVNKubernetes ocp_release_image: quay.io/openshift-release-dev/ocp-release:4.13.2-x86_64 openshift_version: 4.13.2 org_id: 11009103 platform: {'type': 'oci'} progress: {'total_percentage': None, 'preparing_for_installation_stage_percentage': None, 'installing_stage_percentage': None, 'finalizing_stage_percentage': None} schedulable_masters: False schedulable_masters_forced_true: True service_networks: [{'cluster_id': '65f2a1fa-efd2-419a-9bf0-802e595a0a63', 'cidr': '172.30.0.0/16'}] status: insufficient status_info: Cluster is not ready for install status_updated_at: 2023-06-08 12:42:36.324000+00:00 tags: aicli updated_at: 2023-06-08 12:42:43.362119+00:00 user_managed_networking: True user_name: agentil@redhat.com
Expected results:
The cluster creation should fail because the version of OCP is incompatible with OCI platform.
Description of problem:
When authenticating openshift-install with the gcloud CLI, rather than using a service account key file, the installer throws an error because https://github.com/openshift/installer/blob/master/pkg/asset/machines/gcp/machines.go#L170-L178 ALWAYS expects to extract a service account to pass through to nodes in XPN installs. An alternative approach would be to handle the lack of a service account without error, and allow the required service accounts to be passed in through another mechanism.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create install config for gcp xpn install
2. Authenticate installer without service account key file (either gcloud cli auth or through a VM).
Actual results:
Expected results:
Additional info:
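To illustrate the alternative approach suggested in the description, here is a hypothetical Go sketch: it tolerates credentials that carry no embedded service-account key (gcloud CLI or VM auth) and lets an explicitly provided value win. The type, function names, and the override parameter are assumptions for illustration, not the installer's actual code.

package sketch

import (
	"encoding/json"
	"fmt"
)

// gcpCredentials only carries the field we care about; key-file credentials
// embed client_email, gcloud CLI / VM credentials do not.
type gcpCredentials struct {
	ClientEmail string `json:"client_email"`
}

// serviceAccountEmail returns the service account to pass through to nodes.
// An empty result is not an error here: the caller would be expected to supply
// the service accounts through another mechanism (assumed, e.g. install-config).
func serviceAccountEmail(jsonKey []byte, override string) (string, error) {
	if override != "" {
		return override, nil // explicitly provided value wins
	}
	if len(jsonKey) == 0 {
		return "", nil // no key file present; do not fail hard
	}
	var c gcpCredentials
	if err := json.Unmarshal(jsonKey, &c); err != nil {
		return "", fmt.Errorf("failed to parse credentials JSON: %w", err)
	}
	return c.ClientEmail, nil
}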
As discussed in https://issues.redhat.com/browse/MON-1634, adding ownerref will be put on hold for now until CMO has a CR.
In the meantime we'll add (hopefully temporary) labels to emphasize ownership. This will help guide users for now and help us highlight relations and how we can/want to express them using ownerref in the future. (See option 1 and option 2 in the doc above.)
Description of problem:
"oc adm upgrade --to-multi-arch" command have no guard in cases where there's cluster conditions that may interfere with the transition, such as: Invalid=True, Failing=True, and Progressing=True
Steps to Reproduce:
Either apply the command while an upgrade is in progress, or while there are cluster conditions such as Invalid=True or Failing=True.
Actual results:
accepts the command
Expected results:
The command should warn about the interfering condition and allow proceeding only if --allow-upgrade-with-warnings is applied.
Description of problem:
The e2e-nutanix test run failed at the bootstrap stage when testing the PR https://github.com/openshift/cloud-provider-nutanix/pull/7. The bootstrap failure could be reproduced with manual testing by creating a Nutanix OCP cluster with the latest nutanix-ccm image.
time="2023-03-06T12:25:56-05:00" level=error msg="Bootstrap failed to complete: timed out waiting for the condition"
time="2023-03-06T12:25:56-05:00" level=error msg="Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane."
time="2023-03-06T12:25:56-05:00" level=warning msg="The bootstrap machine is unable to resolve API and/or API-Int Server URLs"
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
From the PR https://github.com/openshift/cloud-provider-nutanix/pull/7, trigger the e2e-nutanix test. The test will fail at bootstrap stage with the described errors.
Actual results:
The e2e-nutanix test run failed at bootstrapping with the errors: level=error msg=Bootstrap failed to complete: timed out waiting for the condition level=error msg=Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.
Expected results:
The e2e-nutanix test will pass
Additional info:
Investigation showed the root cause was the Nutanix cloud-controller-manager pod did not have permission to get/list ConfigMap resource. The error logs from the Nutanix cloud-controller-manager pod: E0307 16:08:31.753165 1 reflector.go:140] pkg/provider/client.go:124: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-controller-manager:cloud-controller-manager" cannot list resource "configmaps" in API group "" at the cluster scope I0307 16:09:30.050507 1 reflector.go:257] Listing and watching *v1.ConfigMap from pkg/provider/client.go:124 W0307 16:09:30.052278 1 reflector.go:424] pkg/provider/client.go:124: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-controller-manager:cloud-controller-manager" cannot list resource "configmaps" in API group "" at the cluster scope E0307 16:09:30.052308 1 reflector.go:140] pkg/provider/client.go:124: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-controller-manager:cloud-controller-manager" cannot list resource "configmaps" in API group "" at the cluster scope
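For reference, the missing access corresponds to a rule along these lines for the cloud-controller-manager service account; this is an illustrative sketch expressed with the rbac/v1 Go types, not the actual manifest or code from the cloud-provider-nutanix repository.

package sketch

import rbacv1 "k8s.io/api/rbac/v1"

// configMapReadRule grants the get/list/watch access on ConfigMaps that the
// error logs above show is missing for the CCM service account.
var configMapReadRule = rbacv1.PolicyRule{
	APIGroups: []string{""}, // core API group
	Resources: []string{"configmaps"},
	Verbs:     []string{"get", "list", "watch"},
}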
This is a clone of issue OCPBUGS-18772. The following is the description of the original issue:
—
MCO installs resolve-prepender NetworkManager script on the nodes. In order to find out node details it needs to pull baremetalRuntimeCfgImage. However, this image needs to be pulled just the first time, in the followup attempts this script just verifies that this image is available.
This is not desirable in situations where the mirror / quay are unavailable or having a temporary problem - these kinds of issues should not prevent the node from starting kubelet. During certificate rotation testing I noticed that a node with a significant time skew won't start kubelet, as it tries to pull baremetalRuntimeCfgImage before kubelet starts - but the image is already on the node and doesn't need refreshing.
Manifests are copied from the object store (either S3 or pod) into the node that is performing the role of bootstrap during installation (or to the single node in an SNO setup)
They are copied into one of two directories according to the directory into which they were uploaded to the object store.
<cluster-id>/manifests/manifests/* will end up being copied to /run/ephemeral/var/opt/openshift/manifests/
<cluster-id>/manifests/openshift/* will end up being copied to /run/ephemeral/var/opt/openshift/openshift/manifest
After this step, any files that have been written to /run/ephemeral/var/opt/openshift/openshift/ are also copied to /run/ephemeral/var/opt/openshift/manifests/, any identically named files are overwritten as part of this operation.
This behaviour is entirely expected and correct; however, it does lead to an issue if a user chooses to upload files with identical names to both directories, for example:
File 1: <cluster-id>/manifests/manifests/manifest1.yaml
File 2: <cluster-id>/manifests/openshift/manifest1.yaml
In that case only File 2 would end up being applied; File 1 would be overwritten during the bootkube phase.
We should prevent this from happening by treating any attempt to introduce the same file in two places as illegal, meaning that if File 2 is present, we should prevent the upload of File 1 and vice versa during the creation/update of a manifest.
Description of problem:
Now that the bug to include libnmstate.2.2.x has been resolved (https://issues.redhat.com/browse/OCPBUGS-11659), we are seeing a boot issue in which agent-tui can't start. It looks like it is failing to find the libnmstate.so.2 symlink; when it is run directly we see:
$ /usr/local/bin/agent-tui
/usr/local/bin/agent-tui: error while loading shared libraries: libnmstate.so.2: cannot open shared object file: No such file or directory
As a result, neither the console nor SSH is available during bootstrap, which makes debugging difficult. However, it does not affect the installation; we still get a successful install. The bootstrap screenshots are attached.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
If the user specifies a DNS name in an EgressNetworkPolicy for which the upstream server returns a truncated DNS response, openshift-sdn does not fall back to TCP as expected but simply treats this as a failure.
Version-Release number of selected component (if applicable):
4.11 (originally reproduced on 4.9)
How reproducible:
Always
Steps to Reproduce:
1. Setup an EgressNetworkPolicy that points to a domain where a truncated response is returned while querying via UDP. 2. 3.
Actual results:
Error, DNS resolution not completed.
Expected results:
Request retried via TCP and succeeded.
Additional info:
In comments.
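As context for the expected behavior in the EgressNetworkPolicy report above, here is a minimal Go sketch of the UDP-to-TCP retry pattern using the github.com/miekg/dns library; it is an illustration of the technique, not the openshift-sdn implementation.

package sketch

import (
	"github.com/miekg/dns"
)

// resolveWithTCPFallback queries a name over UDP first and repeats the query
// over TCP when the response comes back truncated (TC bit set) or UDP fails.
func resolveWithTCPFallback(name, server string) (*dns.Msg, error) {
	m := new(dns.Msg)
	m.SetQuestion(dns.Fqdn(name), dns.TypeA)

	udp := &dns.Client{Net: "udp"}
	in, _, err := udp.Exchange(m, server) // server is "host:53"
	if err == nil && in != nil && !in.Truncated {
		return in, nil
	}

	// Truncated (or failed) UDP answer: retry the same query over TCP.
	tcp := &dns.Client{Net: "tcp"}
	in, _, err = tcp.Exchange(m, server)
	return in, err
}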
Description of problem:
When the user edits a Deployment and changes only the rollout "Strategy type", the form cannot be saved because the Save button stays disabled.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Edit a Deployment in the console form view.
2. Change only the rollout "Strategy type" and try to save.
Actual results:
Save button stays disabled
Expected results:
Save button should enable when changing a value (that doesn't make the form state invalid)
Additional info:
Description of problem:
egressip cannot be assigned on hypershift hosted cluster node
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-09-162945
How reproducible:
100%
Steps to Reproduce:
1. Set up a hypershift env.
2. Label egress IP nodes on the hosted cluster:
% oc get node
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-129-175.us-east-2.compute.internal   Ready    worker   3h20m   v1.26.2+bc894ae
ip-10-0-129-244.us-east-2.compute.internal   Ready    worker   3h20m   v1.26.2+bc894ae
ip-10-0-141-41.us-east-2.compute.internal    Ready    worker   3h20m   v1.26.2+bc894ae
ip-10-0-142-54.us-east-2.compute.internal    Ready    worker   3h20m   v1.26.2+bc894ae
% oc label node/ip-10-0-129-175.us-east-2.compute.internal k8s.ovn.org/egress-assignable=""
node/ip-10-0-129-175.us-east-2.compute.internal labeled
% oc label node/ip-10-0-129-244.us-east-2.compute.internal k8s.ovn.org/egress-assignable=""
node/ip-10-0-129-244.us-east-2.compute.internal labeled
% oc label node/ip-10-0-141-41.us-east-2.compute.internal k8s.ovn.org/egress-assignable=""
node/ip-10-0-141-41.us-east-2.compute.internal labeled
% oc label node/ip-10-0-142-54.us-east-2.compute.internal k8s.ovn.org/egress-assignable=""
node/ip-10-0-142-54.us-east-2.compute.internal labeled
3. Create an EgressIP:
% cat egressip.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-1
spec:
  egressIPs: [ "10.0.129.180" ]
  namespaceSelector:
    matchLabels:
      env: ovn-tests
% oc apply -f egressip.yaml
egressip.k8s.ovn.org/egressip-1 created
4. Check the EgressIP assignment.
Actual results:
The egressip is not assigned to any node:
% oc get egressip
NAME         EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-1   10.0.129.180
Expected results:
The egressip can be assigned to one of the hosted cluster nodes.
Additional info:
Description of problem:
Starting with 4.12.0-0.nightly-2023-03-13-172313, the machine API operator began receiving an invalid version tag, either due to a missing or invalid VERSION_OVERRIDE (https://github.com/openshift/machine-api-operator/blob/release-4.12/hack/go-build.sh#L17-L20) value being passed to the build. This is resulting in all jobs invoked by the 4.12 nightlies failing to install.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2023-03-13-172313 and later
How reproducible:
consistently in 4.12 nightlies only (CI builds do not seem to be impacted).
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Example of failure https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-csi/1635331349046890496/artifacts/e2e-aws-csi/gather-extra/artifacts/pods/openshift-machine-api_machine-api-operator-866d7647bd-6lhl4_machine-api-operator.log
Description of problem:
The command `$ oc explain route.spec.tls.insecureEdgeTerminationPolicy` shows different values than the ones the API actually accepts.
Version-Release number of selected component (if applicable):
4.10.z
How reproducible:
100%
Steps to Reproduce:
1. $ oc explain route.spec.tls.insecureEdgeTerminationPolicy
KIND:     Route
VERSION:  route.openshift.io/v1
FIELD:    insecureEdgeTerminationPolicy <string>
DESCRIPTION:
  insecureEdgeTerminationPolicy indicates the desired behavior for insecure connections to a route. While each router may make its own decisions on which ports to expose, this is normally port 80.
  * Allow - traffic is sent to the server on the insecure port (default)
  * Disable - no traffic is allowed on the insecure port.
  * Redirect - clients are redirected to the secure port.
2. Set the option to 'Disable' in any secure route:
$ oc edit route <route-name>
spec:
  host: hello.example.com
  port:
    targetPort: https
  tls:
    insecureEdgeTerminationPolicy: Disable
3. After editing the route and setting `insecureEdgeTerminationPolicy: Disable`, it gives the error:
Danger alert: An error occurred
Error "Invalid value: "Disable": invalid value for InsecureEdgeTerminationPolicy option, acceptable values are None, Allow, Redirect, or empty" for field "spec.tls.insecureEdgeTerminationPolicy".
Actual results:
Based on the API Usage information, the Disable value for insecureEdgeTerminationPolicy field is not acceptable.
Expected results:
The `oc explain route.spec.tls.insecureEdgeTerminationPolicy` must show the correct values.
Additional info:
Description of problem:
We are not error checking the response when we request console plugins in getConsolePlugins. If this request fails, we still try to access the "Items" property of the response, which is nil, causing an exception to be thrown. We need to make sure the request succeeded before referencing any properties of the response.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Run bridge locally without setting the requisite env vars
Actual results:
A runtime exception is thrown from the getConsolePlugins function and bridge terminates
Expected results:
An error should be logged and bridge should continue to run
Additional info:
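As additional context, a minimal Go sketch of the intended pattern: check the list error before touching Items, log it, and continue without dynamic plugins so bridge keeps running. The list function and types here are stand-ins, not the console's actual client code.

package sketch

import (
	"k8s.io/klog/v2"
)

// pluginList stands in for the real API response type; only the shape matters here.
type pluginList struct {
	Items []struct{ Name string }
}

// consolePluginNames checks the list error before touching Items and falls
// back to an empty set so bridge keeps running without dynamic plugins.
func consolePluginNames(list func() (*pluginList, error)) []string {
	plugins, err := list()
	if err != nil || plugins == nil {
		klog.Errorf("Failed to list console plugins, continuing without dynamic plugins: %v", err)
		return nil
	}
	names := make([]string, 0, len(plugins.Items))
	for _, p := range plugins.Items {
		names = append(names, p.Name)
	}
	return names
}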
As an ODC helm backend developer I would like to be able to bump version of helm to 3.12 to stay synched up with the version we will ship with OCP 4.14
Normal activity we do every time a new OCP version is release to stay current
NA
NA
Bump the helm version to 3.12; run, build, and unit test, and make sure everything is working as expected. Last time we had a conflict with the DevFile backend.
Might have dependencies on the DevFile team to move some dependencies forward
NA
Console Helm dependency is moved to 3.12
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Unknown
Verified
Unsatisfied
Description of problem:
NAT gateway is not yet a supported feature and the current implementation is a partial non-zonal solution.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
1. Set OutboundType = NatGateway 2. Deploy cluster 3.
Actual results:
Install successful
Expected results:
Install requires TechPreviewNoUpgrade before proceeding
Additional info:
Description of problem:
Per the discussion in https://github.com/openshift/openshift-docs/pull/59549#discussion_r1184195239, the text in the dev console when creating a function says a func.yaml file must be present OR it must use the s2i build strategy, when in fact both things are required.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Go to +Add -> Create Serverless function and use a repo URL that doesn't fit the requirements in order to see the error
Actual results:
Expected results:
Additional info:
Version:
$ openshift-install version
./openshift-install 4.9.11
built from commit 4ee186bb88bf6aeef8ccffd0b5d4e98e9ddd895f
release image quay.io/openshift-release-dev/ocp-release@sha256:0f72e150329db15279a1aeda1286c9495258a4892bc5bf1bf5bb89942cd432de
release architecture amd64
Platform: Openstack
install type: IPI
What happened?
Image streams use the Swift container to store images. After running many image streams, the Swift container holds a huge number of objects, and destroying the cluster then takes a very long time, proportional to the size of the Swift container.
What did you expect to happen?
The destroy command should clean up the resources in a reasonable time.
How to reproduce it (as minimally and precisely as possible)?
Deploy OCP, run a workload that creates a lot of image streams, then destroy the cluster; the destroy command takes a long time to complete.
Anything else we need to know?
Here is the output of the swift stat command and the time it took to complete the destroy job:
$ swift stat vlan609-26jxm-image-registry-nseyclolgfgxoaiysrlejlhvoklcawbxt
Account: AUTH_2b4d979a2a9e4cf88b2509e9c5e0e232
Container: vlan609-26jxm-image-registry-nseyclolgfgxoaiysrlejlhvoklcawbxt
Objects: 723756
Bytes: 652448740473
Read ACL:
Write ACL:
Sync To:
Sync Key:
Meta Name: vlan609-26jxm-image-registry-nseyclolgfgxoaiysrlejlhvoklcawbxt
Meta Openshiftclusterid: vlan609-26jxm
Content-Type: application/json; charset=utf-8
X-Timestamp: 1640248399.77606
Last-Modified: Thu, 23 Dec 2021 08:34:48 GMT
Accept-Ranges: bytes
X-Storage-Policy: Policy-0
X-Trans-Id: txb0717d5198e344a5a095d-0061c93b70
X-Openstack-Request-Id: txb0717d5198e344a5a095d-0061c93b70
Time took to complete the destroy: 6455.42s
If the user provides a partial, empty, or invalid CA certificate in the ignition endpoint override, the ignitionDownloadable/API_VIP validation will fail, but the user will not know why.
In the agent log we will see this error:
Failed to download worker.ign: unable to parse cert
One option to let the user know about the problem is to return the error as part of the APIVipConnectivityResponse when the download fails, and to use that value in the failing validation message. This is a bit tricky: the current error messages are not user facing, so we would need to adjust them, and it also requires API changes.
Another option is to validate the parameters the user provides up front.
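A minimal Go sketch of that up-front validation, assuming the CA bundle is handed in as a PEM-encoded string; this is an illustration, not the assisted-service code.

package sketch

import (
	"crypto/x509"
	"errors"
	"strings"
)

// validateCABundle checks that a user-supplied CA bundle contains at least one
// parseable PEM certificate, so the failure can be reported at submission time
// instead of surfacing later as "unable to parse cert".
func validateCABundle(pemData string) error {
	if strings.TrimSpace(pemData) == "" {
		return errors.New("CA certificate is empty")
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM([]byte(pemData)) {
		return errors.New("CA certificate contains no valid PEM-encoded certificates")
	}
	return nil
}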
Description of the problem:
While scale testing ACM 2.8, sometimes none of the SNOs are discovered. Upon review, the agent on the SNOs is attempting to return the inspection data to the API VIP address instead of the IP address of the node hosting the metal3 pod. Presumably, in the runs where the agents were discovered, the API VIP address happened to be on the same node as the metal3 pod.
How reproducible:
Roughly 66% of the time with a 3-node cluster (whenever the API VIP is not on the node hosting the metal3 pod).
Steps to reproduce:
1.
2.
3.
Actual results:
Ironic agents attempt to access "fc00:1004::3", which is the API VIP address:
2023-03-12 17:52:51.441 1 CRITICAL ironic-python-agent [-] Unhandled error: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='fc00:1004::3', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f94354114c0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED')) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent Traceback (most recent call last): 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 169, in _new_conn 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent conn = connection.create_connection( 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 96, in create_connection 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent raise err 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 86, in create_connection 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent sock.connect(sa) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/eventlet/greenio/base.py", line 253, in connect 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent socket_checkerr(fd) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/eventlet/greenio/base.py", line 51, in socket_checkerr 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent raise socket.error(err, errno.errorcode[err]) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent ConnectionRefusedError: [Errno 111] ECONNREFUSED 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred: 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent Traceback (most recent call last): 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent httplib_response = self._make_request( 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 382, in _make_request 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent self._validate_conn(conn) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent conn.connect() 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 353, in connect 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent conn = self._new_conn() 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 181, in _new_conn 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent raise NewConnectionError( 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f94354114c0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred: 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent 2023-03-12 17:52:51.441 1 ERROR 
ironic-python-agent Traceback (most recent call last): 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/requests/adapters.py", line 439, in send 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent resp = conn.urlopen( 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent retries = retries.increment( 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/urllib3/util/retry.py", line 574, in increment 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent raise MaxRetryError(_pool, url, error or ResponseError(cause)) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='fc00:1004::3', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f94354114c0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED')) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred: 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent Traceback (most recent call last): 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/bin/ironic-python-agent", line 10, in <module> 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent sys.exit(run()) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/ironic_python_agent/cmd/agent.py", line 50, in run 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent agent.IronicPythonAgent(CONF.api_url, 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/ironic_python_agent/agent.py", line 471, in run 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent uuid = inspector.inspect() 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/ironic_python_agent/inspector.py", line 106, in inspect 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent resp = call_inspector(data, failures) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/ironic_python_agent/inspector.py", line 145, in call_inspector 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent resp = _post_to_inspector() 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/tenacity/__init__.py", line 329, in wrapped_f 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent return self.call(f, *args, **kw) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/tenacity/__init__.py", line 409, in call 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent do = self.iter(retry_state=retry_state) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/tenacity/__init__.py", line 368, in iter 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent raise retry_exc.reraise() 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/tenacity/__init__.py", line 186, in reraise 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent raise self.last_attempt.result() 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 439, in result 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent return 
self.__get_result() 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 391, in __get_result 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent raise self._exception 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/tenacity/__init__.py", line 412, in call 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent result = fn(*args, **kwargs) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/ironic_python_agent/inspector.py", line 142, in _post_to_inspector 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent return requests.post(CONF.inspection_callback_url, data=data, 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/requests/api.py", line 119, in post 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent return request('post', url, data=data, json=json, **kwargs) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/requests/api.py", line 61, in request 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent return session.request(method=method, url=url, **kwargs) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/requests/sessions.py", line 542, in request 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent resp = self.send(prep, **send_kwargs) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/requests/sessions.py", line 655, in send 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent r = adapter.send(request, **kwargs) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent File "/usr/lib/python3.9/site-packages/requests/adapters.py", line 516, in send 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent raise ConnectionError(e, request=request) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent requests.exceptions.ConnectionError: HTTPSConnectionPool(host='fc00:1004::3', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f94354114c0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED')) 2023-03-12 17:52:51.441 1 ERROR ironic-python-agent
You can see the metal3 pod node and ip address:
# oc get po -n openshift-machine-api metal3-5cc95d74d8-lqd9x -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES metal3-5cc95d74d8-lqd9x 5/5 Running 0 2d16h fc00:1004::7 e27-h05-000-r650 <none> <none>
The addresses on the e27-h05-000-r650 node:
[root@e27-h05-000-r650 ~]# ip a | grep "fc00" inet6 fc00:1004::4/128 scope global nodad deprecated inet6 fc00:1004::7/64 scope global noprefixroute
You can see the API VIP (fc00:1004::3) is actually on a different node, e27-h03-000-r650:
[root@e27-h03-000-r650 ~]# ip a | grep "fc00" inet6 fc00:1004::3/128 scope global nodad deprecated inet6 fc00:1004::6/64 scope global noprefixroute
Expected results:
Versions:
Hub and SNO OCP 4.12.2
ACM - 2.8.0-DOWNSTREAM-2023-02-28-23-06-27
Description of problem:
nodeip-configuration.service is failed on cluster nodes:
systemctl status nodeip-configuration.service × nodeip-configuration.service - Writes IP address configuration so that kubelet and crio services select a valid node IP Loaded: loaded (/etc/systemd/system/nodeip-configuration.service; enabled; preset: disabled) Active: failed (Result: exit-code) since Tue 2023-08-15 16:28:09 UTC; 18h ago Main PID: 3709 (code=exited, status=0/SUCCESS) CPU: 237ms Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3761]: ++ [[ -z bond0.354 ]] Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3761]: ++ echo bond0.354 Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3760]: + iface=bond0.354 Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3760]: + echo 'Node IP interface determined as: bond0.354. Enabling IP forwarding...' Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3760]: Node IP interface determined as: bond0.354. Enabling IP forwarding... Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3760]: + sysctl -w net.ipv4.conf.bond0.354.forwarding=1 Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3767]: sysctl: cannot stat /proc/sys/net/ipv4/conf/bond0/354/forwarding: No such file or directory Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com systemd[1]: nodeip-configuration.service: Control process exited, code=exited, status=1/FAILURE Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com systemd[1]: nodeip-configuration.service: Failed with result 'exit-code'. Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com systemd[1]: Failed to start Writes IP address configuration so that kubelet and crio services select a valid node IP.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-08-005757
How reproducible:
so far once
Steps to Reproduce:
1. Deploy a multinode spoke cluster with GitOps-ZTP.
2. Configure the baremetal network to be on top of a VLAN interface:
- name: bond0.354
  description: baremetal network
  type: vlan
  state: up
  vlan:
    base-iface: bond0
    id: 354
  ipv4:
    enabled: true
    dhcp: false
    address:
      - ip: 10.x.x.20
        prefix-length: 26
  ipv6:
    enabled: false
    dhcp: false
    autoconf: false
Actual results:
Cluster is deployed but nodeip-configuration.service is Failed
Expected results:
nodeip-configuration.service is Active
Please review the following PR: https://github.com/openshift/thanos/pull/104
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/router/pull/473
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Pipeline metrics page crash
Pipeline metrics page should work
Always
4.14.0-0.nightly-2023-05-29-174116
It is a regression introduced after this commit was merged: https://github.com/openshift/console/pull/12821/commits/c2d24932cd41b1b4c89d7b9fa5ca46d18b0d2d29#diff-782cbf3ae7050932e76be67d990d9cdaa02e322ea6c2b53083a677ed311ff612R40
Description of the problem:
In Staging, deleting a host in the UI results in the host re-registering after ~15 minutes.
How reproducible:
100%
Steps to reproduce:
1. Before cluster installation, delete random host using UI
2. Wait 15 mins
3. Host re-registers without rebooting
Actual results:
The agent automatically re-registers itself after 15 minutes.
Expected results:
The agent should register again only after a reboot.
Description of problem:
The test TestPrometheusRemoteWrite/assert_remote_write_cluster_id_relabel_config_works is flaky and keeps blocking PR merges. After investigation it seems like the timeout to wait for the expected value is simply too short.
Description of problem:
The hypershift CLI tool allows any string as the cluster name. But later, when the cluster is to be imported, the name needs to conform to RFC 1123. So the user needs to read the error, destroy the cluster, and then try again with a proper name. This experience can be improved.
Version-Release number of selected component (if applicable):
4.13.4
How reproducible:
Always
Steps to Reproduce:
1. hypershift create cluster kubevirt --name virt-4.12 ... 2. try to import it
Actual results:
cluster fails to import due to its name
Expected results:
validate the cluster name in the hypershift cli, fail early
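A minimal Go sketch of that early check, using the RFC 1123 validators from k8s.io/apimachinery. Whether the label or the subdomain form is the right rule is an assumption here; the reported name "virt-4.12" fails the label check, which matches the import failure described above. This is an illustration, not the hypershift CLI code.

package sketch

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/util/validation"
)

// validateClusterName rejects names that are not RFC 1123 labels before any
// cloud resources are created, so the user fails fast with a clear message.
func validateClusterName(name string) error {
	if errs := validation.IsDNS1123Label(name); len(errs) > 0 {
		return fmt.Errorf("invalid cluster name %q: %s", name, strings.Join(errs, "; "))
	}
	return nil
}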
Additional info:
Reported by IBM.
Apparently, they run in such a way that status.Version.Desired.Version is not guaranteed to be a parseable semantic version. Thus isUpgradeble returns an error and blocks upgrade, even if the force upgrade annotation is present.
We should check for the annotation first and if the upgrade is being forced, we don't need to do the z-stream upgrade check.
https://redhat-internal.slack.com/archives/C01C8502FMM/p1689279310050439
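A minimal Go sketch of the proposed ordering for the check above: honor the force flag first, and only then try to parse the desired version as semver. Function names are illustrative, and the use of github.com/blang/semver is an assumption, not necessarily the library the operator vendors.

package sketch

import (
	"fmt"

	"github.com/blang/semver"
)

// checkZStreamUpgrade skips the z-stream check entirely when the upgrade is
// forced, so an unparseable desired version no longer blocks a forced upgrade.
func checkZStreamUpgrade(desiredVersion string, forced bool) error {
	if forced {
		return nil // forced via annotation: no semver parsing needed
	}
	if _, err := semver.Parse(desiredVersion); err != nil {
		return fmt.Errorf("cannot parse desired version %q: %w", desiredVersion, err)
	}
	// ... the usual z-stream comparison would follow here ...
	return nil
}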
Description of problem:
ccoctl does not prevent the user from using the same resource group name for the OIDC and installation resource groups, which can result in resources existing in the resource group used for cluster installation. The OpenShift installer requires that the installation resource group be empty, so the OIDC and installation resource groups must be distinct. ccoctl currently allows providing either --oidc-resource-group-name or --installation-resource-group-name but does not indicate a problem when those resource group names are the same. When the same resource group name is provided using a combination of the --name, --oidc-resource-group-name and --installation-resource-group-name parameters, ccoctl should exit with an error indicating that the resource group names must be different.
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
100%
Steps to Reproduce:
1. Run ccoctl azure create-all with a combination of --name, --oidc-resource-group-name or --installation-resource-group-name resulting in OIDC and installation resource group names being the same. ./ccoctl azure create-all --name "abutchertest" --region centralus --subscription-id "${SUBSCRIPTION_ID}"--credentials-requests-dir "${MYDIR}/credreqs" --oidc-resource-group-name test "abutchertest" --dnszone-resource-group-name "${DNS_RESOURCE_GROUP}" ccoctl will default the installation resource group to match the provided --name parameter "abutchertest" which results in OIDC and installation resource groups being "abutchertest" since --oidc-resource-group uses the same name. This means that OIDC resources will be created in the resource group that will be configured for the OpenShift installer within the install-config.yaml. 2. Run the OpenShift installer having set .platform.azure.resourceGroupName in the install-config.yaml to be "abutchertest" and receive error that the installation resource group is not empty when running the installer. The resource identified will contain user-assigned managed identities meant to be created in the OIDC resource group which must be separate from the installation resource group. FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": platform.azure.resourceGroupName: Invalid value: "abutchertest": resource group must be empty but it has 8 resources like...
Actual results:
ccoctl allows OIDC and installation resource group names to be the same.
Expected results:
ccoctl does not allow OIDC and installation resource groups to be the same.
Additional info:
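A hedged Go sketch of the requested guard. The defaulting shown (installation resource group defaults to --name; the "-oidc" suffix for the OIDC default is invented for illustration) is an assumption, not ccoctl's actual behavior or code.

package sketch

import "fmt"

// resolveResourceGroups applies the assumed defaulting and rejects the case
// where the OIDC and installation resource groups end up identical.
func resolveResourceGroups(name, oidcRG, installRG string) (string, string, error) {
	if installRG == "" {
		installRG = name // installation resource group defaults to --name
	}
	if oidcRG == "" {
		oidcRG = name + "-oidc" // assumed default, for illustration only
	}
	if oidcRG == installRG {
		return "", "", fmt.Errorf("--oidc-resource-group-name %q must differ from the installation resource group %q", oidcRG, installRG)
	}
	return oidcRG, installRG, nil
}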
Please review the following PR: https://github.com/openshift/aws-ebs-csi-driver/pull/220
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The OpenShift console fails to render the monitoring dashboard when a proxy is expected to be used. Additionally, WebSocket connections fail because they do not use the proxy.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Connect to a cluster using backplane and use one of IT's proxies.
2. Execute "ocm backplane console -b".
3. Attempt to view the monitoring dashboard.
Actual results:
The monitoring dashboard fails to load with an EOF error, and the terminal is spammed with EOF errors.
Expected results:
The monitoring dashboard should be rendered correctly, and the terminal should not be spammed with error logs.
Additional info:
With changes like those in https://github.com/openshift/console/pull/12877 applied, the monitoring dashboard works through the proxy.
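The general pattern behind that PR is that both the HTTP transport and the WebSocket dialer must consult the proxy environment (HTTP_PROXY/HTTPS_PROXY/NO_PROXY). The Go sketch below illustrates that pattern; it is not the console's actual code.

package sketch

import (
	"crypto/tls"
	"net/http"

	"github.com/gorilla/websocket"
)

// newProxyAwareClients builds an HTTP client and a websocket dialer that both
// honor the standard proxy environment variables.
func newProxyAwareClients(tlsConfig *tls.Config) (*http.Client, *websocket.Dialer) {
	transport := &http.Transport{
		Proxy:           http.ProxyFromEnvironment,
		TLSClientConfig: tlsConfig,
	}
	dialer := &websocket.Dialer{
		Proxy:           http.ProxyFromEnvironment,
		TLSClientConfig: tlsConfig,
	}
	return &http.Client{Transport: transport}, dialer
}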
Description of problem:
When the OIDC provider is deleted on the customer side, AWS resource deletion is not skipped in cases where the ValidAWSIdentityProvider condition is 'Unknown'. This results in clusters being stuck during deletion.
Version-Release number of selected component (if applicable):
4.12.z, 4.13.z, 4.14.z
How reproducible:
Irregular
Steps to Reproduce:
1. 2. 3.
Actual results:
Cluster stuck in uninstallation
Expected results:
Clusters not stuck in uninstallation, AWS customer resources being skipped for removal
Additional info:
Added must-gather for all HyperShift-related namespaces. The bug seems to be at https://github.com/openshift/hypershift/pull/2281/files#diff-f90ab1b32c9e1b349f04c32121d59f5e9081ccaf2be490f6782165d2960bc6c7R295: 'Unknown' needs to be added to the check of whether the OIDC provider is valid.
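A minimal Go sketch of the adjusted check, treating Unknown the same as False, written with the k8s.io/apimachinery condition helpers; this is an illustration of the requested behavior, not the actual HyperShift code.

package sketch

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// identityProviderUnavailable returns true when the ValidAWSIdentityProvider
// condition is missing, False, or Unknown, so AWS resource cleanup that
// depends on the identity provider can be skipped in all of those cases.
func identityProviderUnavailable(conditions []metav1.Condition) bool {
	cond := meta.FindStatusCondition(conditions, "ValidAWSIdentityProvider")
	return cond == nil || cond.Status != metav1.ConditionTrue
}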
Description of problem:
A customer has reported that the Thanos querier pods would be OOM-killed when loading the API performance dashboard with large time ranges (e.g. >= 1 week)
Version-Release number of selected component (if applicable):
4.10
How reproducible:
Always for the customer
Steps to Reproduce:
1. Open the "API performance" dashboard in the admin console. 2. Select a time range of 2 weeks. 3.
Actual results:
The dashboard fails to refresh and the thanos-query pods are killed.
Expected results:
The dashboard loads without error.
Additional info:
The issue arises for the customer because they have very large clusters (hundreds of nodes) which generate lots of metrics. In practice the queries executed by the dashboard are costly because they access lots of series (probably > tens of thousands). To make it more efficient, the "upstream" dashboard from kubernetes-monitoring/kubernetes-mixin uses recording rules [1] instead of raw queries. While it decreases a bit the accuracy (one can only distinguish between read & write API requests), it's the only solution to avoid overloading the Thanos query endpoint. [1] https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/05a58f765eda05902d4f7dd22098a2b870f7ca1e/dashboards/apiserver.libsonnet#L50-L75
Description of problem:
In the metric `cluster:capacity_cpu_cores:sum` there is an attribute label `label_node_role_kubernetes_io` that has `infra` or `master`. There is no label for `worker`. If the infra nodes are missing this label, they get added into the "unlabeled" worker nodes.
For example, this cluster has all three types: `cluster:capacity_cpu_cores:sum{_id="0702a3b1-c2d8-427f-865d-3ce7dc3a2be7"}`
But this cluster has the infra and worker merged: `cluster:capacity_cpu_cores:sum{_id="0e60ac76-d61a-4e6d-a4f3-269110b6b1f9"}`
If I count clusters that have sockets with infra but capacity_cpu without infra, I get 7,617 clusters for 2023-03-15. If I count clusters that have sockets with infra but capacity_cpu with infra, I get 2,015 clusters for 2023-03-15. That means there are 5,602 clusters missing the infra label.
This metric is used to identify the vCPU/CPU count used in TeleSense, which is presented to the Sales teams and upper management. If there is another metric we should use, please let me know. Otherwise, this needs to be fixed.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
refer to Slack thread: https://redhat-internal.slack.com/archives/C0VMT03S5/p1678967355450719
Description of problem:
ROSA is being branded via custom branding; as a result, the favicon disappears since we do not want any Red Hat/Openshift-specific branding to appear when custom branding is in use. Since ROSA is a Red Hat product, it should get a branding option added to the console so all the correct branding including favicon appears.
Version-Release number of selected component (if applicable):
4.14.0, 4.13.z, 4.12.z, 4.11.z
How reproducible:
Always
Steps to Reproduce:
1. View a ROSA cluster 2. Note the absence of the OpenShift logo favicon
Description of problem:
Daemonset cni-sysctl-allowlist-ds is missing annotation for workload partitioning.
Version-Release number of selected component (if applicable):
How reproducible:
Executing the daemonset shows the pod missing the workload annotation
Steps to Reproduce:
1. Run Daemonset 2. 3.
Actual results:
No workload annotation present.
Expected results:
annotations:
  target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
Additional info:
Description of problem:
vSphere dual-stack added support for both IPv4 and IPv6 in the kubelet --node-ip; however, the masters are booting without the IPv6 address in --node-ip:
"Ignoring filtered route {Ifindex: 2 Dst: <nil> Src: 192.168.130.19 Gw: 192.168.130.1 Flags: [] Table: 254}" "Ignoring filtered route {Ifindex: 2 Dst: 192.168.130.0/24 Src: 192.168.130.19 Gw: <nil> Flags: [] Table: 254}" "Ignoring filtered route {Ifindex: 2 Dst: fd65:a1a8:60ad:271c::22/128 Src: <nil> Gw: <nil> Flags: [] Table: 254}" "Ignoring filtered route {Ifindex: 2 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}" "Ignoring filtered route {Ifindex: 2 Dst: <nil> Src: <nil> Gw: fe80::9eb4:f9fa:2b8d:8372 Flags: [] Table: 254}" "Writing Kubelet service override with content [Service]\nEnvironment=\"KUBELET_NODE_IP=192.168.130.19\" \"KUBELET_NODE_IPS=192.168.130.19\"\n"
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-28-154013
How reproducible:
Intermittent (DHCPv6 related)
Steps to Reproduce:
1. Install vSphere dual-stack IPI with DHCPv6 networking:
clusterNetwork:
- cidr: 10.128.0.0/14
  hostPrefix: 23
- cidr: fd65:10:128::/56
  hostPrefix: 64
machineNetwork:
- cidr: 192.168.0.0/16
- cidr: fd65:a1a8:60ad:271c::/64
networkType: OVNKubernetes
Actual results:
Masters missing IPv6 address in KUBELET_NODE_IPS
Install fails with
time="2023-08-30T19:54:19Z" level=error msg="failed to initialize the cluster: Cluster operators authentication, console, ingress, monitoring are not available"
Expected results:
Both IPv4 and IPv6 address in KUBELET_NODE_IPS
Install succeeds
Additional info:
Do we set ipv6.may-fail with NetworkManager?
Description of problem:
After upgrading a plugin image the browser continues to request old plugin files
How reproducible:
100%
Steps to Reproduce:
1. Build and deploy a plugin generated from console-plugin-template repo
2. open one of the plugin pages in the browser
3. Make a change in the code of that page, rebuild and deploy a new image
4. Try to view this page in Firefox - you'll get a 404 error. In Chrome you'll get the old page
The root cause:
The plugin JS file names are auto-generated, so the new image has different JS file names. But the plugin-entry.js filename remains the same; that file is cached by default and therefore continues to request the old files.
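One way to think about a fix is to make the stable entry file non-cacheable so the browser always re-fetches it and picks up the new hashed chunk names. Console plugins are normally served by an nginx container, so this Go handler is only an illustration of the header change, not the actual serving stack.

package sketch

import (
	"net/http"
	"strings"
)

// noCacheEntryPoint wraps a static file handler and disables caching for the
// stable entry file while leaving the hashed chunk files cacheable.
func noCacheEntryPoint(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasSuffix(r.URL.Path, "plugin-entry.js") {
			w.Header().Set("Cache-Control", "no-cache")
		}
		next.ServeHTTP(w, r)
	})
}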
Description of problem: The openshift-manila-csi-driver namespace should have the "workload.openshift.io/allowed=management" annotation.
This is currently not the case:
❯ oc describe ns openshift-manila-csi-driver Name: openshift-manila-csi-driver Labels: kubernetes.io/metadata.name=openshift-manila-csi-driver pod-security.kubernetes.io/audit=privileged pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=privileged Annotations: include.release.openshift.io/self-managed-high-availability: true openshift.io/node-selector: openshift.io/sa.scc.mcs: s0:c24,c4 openshift.io/sa.scc.supplemental-groups: 1000560000/10000 openshift.io/sa.scc.uid-range: 1000560000/10000 Status: Active No resource quota. No LimitRange resource.
It is causing CI jobs to fail with:
{ fail [github.com/openshift/origin/test/extended/cpu_partitioning/platform.go:82]: projects [openshift-manila-csi-driver] do not contain the annotation map[workload.openshift.io/allowed:management] Expected <[]string | len:1, cap:1>: [ "openshift-manila-csi-driver", ] to be empty Ginkgo exit error 1: exit with code 1}
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
thanos-sidecar is panicking after the image was rebuilt in this payload https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-04-18-045408 Example job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-sdn-bm/1648276769645531136 Logs: - containerID: cri-o://c62dcc73b8203bfd968ffca95bba8607e24a06492948a0179cde6a57a897d431 image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a007b49153ee517ab4fe0600d217832bac0fd6152b5a709da291b60c82a5875d imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a007b49153ee517ab4fe0600d217832bac0fd6152b5a709da291b60c82a5875d lastState: terminated: containerID: cri-o://c62dcc73b8203bfd968ffca95bba8607e24a06492948a0179cde6a57a897d431 exitCode: 2 finishedAt: '2023-04-18T12:30:20Z' message: "panic: Something in this program imports go4.org/unsafe/assume-no-moving-gc\ \ to declare that it assumes a non-moving garbage collector, but your version\ \ of go4.org/unsafe/assume-no-moving-gc hasn't been updated to assert that\ \ it's safe against the go1.20 runtime. If you want to risk it, run with\ \ environment variable ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH=go1.20 set.\ \ Notably, if go1.20 adds a moving garbage collector, this program is unsafe\ \ to use.\n\ngoroutine 1 [running]:\ngo4.org/unsafe/assume-no-moving-gc.init.0()\n\ \t/go/src/github.com/improbable-eng/thanos/vendor/go4.org/unsafe/assume-no-moving-gc/untested.go:25\ \ +0x1ba\n" reason: Error startedAt: '2023-04-18T12:30:20Z' name: thanos-sidecar ready: false restartCount: 14 started: false state: waiting: message: back-off 5m0s restarting failed container=thanos-sidecar pod=prometheus-k8s-0_openshift-monitoring(bafeb85b-3980-4153-90bc-a302b93c3465) reason: CrashLoopBackOff
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-04-18-045408
How reproducible:
Always
Steps to Reproduce:
1. Install 4.14.0-0.nightly-2023-04-18-045408
Actual results:
thanos-sidecar panics and cluster doesn't install
Expected results:
Additional info:
Description of problem:
Deployed an OCP cluster using the hypershift agent with the 4.14.0-ec.4 release on Power. We are observing that loading the OperatorHub page in the GUI throws a 404 error.
Version-Release number of selected component (if applicable):
OCP 4.14.0-ec.4
How reproducible:
Every time
Steps to Reproduce:
1. Deploy Hypershift cluster
2. Go to GUI and check OperatorHub
Actual results:
OperatorHub page in GUI is throwing 404 error
Expected results:
OperatorHub page should show Operators
Additional information:
Failure status in olm operator pod from management cluster:
# oc get pod olm-operator-754779f559-846tw -n clusters-hypershift-015 -oyaml message: | 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" monitor=clusteroperator time="2023-08-17T10:58:37Z" level=error msg="initialization error - failed to ensure name=\"\" - ClusterOperator.config.openshift.io \"\\\"\\\"\" is invalid: metadata.name: Invalid value: \"\\\"\\\"\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" monitor=clusteroperator time="2023-08-17T10:59:37Z" level=error msg="initialization error - failed to ensure name=\"\" - ClusterOperator.config.openshift.io \"\\\"\\\"\" is invalid: metadata.name: Invalid value: \"\\\"\\\"\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" monitor=clusteroperator time="2023-08-17T11:00:37Z" level=error msg="initialization error - failed to ensure name=\"\" - ClusterOperator.config.openshift.io \"\\\"\\\"\" is invalid: metadata.name: Invalid value: \"\\\"\\\"\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" monitor=clusteroperator I0817 11:01:33.000390 1 trace.go:205] Trace[2006040218]: "DeltaFIFO Pop Process" ID:system:controller:route-controller,Depth:152,Reason:slow event handlers blocking the queue (17-Aug-2023 11:01:28.947) (total time: 456ms): Trace[2006040218]: [456.950035ms] [456.950035ms] END 2023/08/17 11:01:41 http: TLS handshake error from 10.244.0.10:33355: read tcp 172.17.53.0:8443->10.244.0.10:33355: read: connection reset by peer reason: Error startedAt: "2023-08-14T11:03:46Z"
Screenshot: https://drive.google.com/file/d/1I_XkX15xEl9ZBtAIZ2yp70twD4z2ASlS/view?usp=sharing
Must gather logs:
https://drive.google.com/file/d/1AkmzC_TUi9z6p13funrSygBm2CgepbpU/view?usp=sharing
Description of problem:
maxUnavailable defaults to 50% for anything under 4: https://github.com/openshift/cluster-ingress-operator/blob/master/pkg/operator/controller/ingress/poddisruptionbudget.go#L71
Based on PDB rounding logic, it always rounds up to the next whole integer, so 1.5 becomes 2.
spec:
  maxUnavailable: 50%
  selector:
    matchLabels:
      ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
currentHealthy: 3
desiredHealthy: 1
disruptionsAllowed: 2
Whereas with 4 router pods, we only allow 1 of 4 to be disrupted at a time.
Version-Release number of selected component (if applicable):
4.x
How reproducible:
Always
Steps to Reproduce:
1. Set 3 replicas 2. Look at the disruptionsAllowed on the PDB
Actual results:
You can take down 2 of 3 routers at once, leaving no HA.
Expected results:
With 3+ routers, we should always ensure 2 are up with the PDB.
Additional info:
Reduce the maxUnavailable to 25% for >= 3 pods instead of 4
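A minimal Go sketch of the proposed policy follows, assuming a hypothetical helper (maxUnavailablePercent) rather than the operator's actual code; it only illustrates how 25% at 3+ replicas keeps disruptionsAllowed at 1 once percentage rounding is applied.

package main

import (
	"fmt"
	"math"
)

// maxUnavailablePercent is a sketch of the proposed policy: keep the current
// 50% behaviour for 1-2 replicas, but drop to 25% at 3 or more replicas so
// that rounding up never allows more than one disruption for 3-4 routers.
// The function name and thresholds are illustrative, not the operator's code.
func maxUnavailablePercent(replicas int) int {
	if replicas >= 3 {
		return 25
	}
	return 50
}

// disruptionsAllowed mirrors how a percentage maxUnavailable is resolved:
// the percentage of replicas is rounded up to the next whole pod.
func disruptionsAllowed(replicas, percent int) int {
	return int(math.Ceil(float64(replicas) * float64(percent) / 100.0))
}

func main() {
	for _, r := range []int{2, 3, 4} {
		p := maxUnavailablePercent(r)
		fmt.Printf("replicas=%d maxUnavailable=%d%% disruptionsAllowed=%d\n",
			r, p, disruptionsAllowed(r, p))
	}
}

With this sketch, 3 replicas at 25% allow only 1 disruption, so 2 routers stay up.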
Description of problem:
An empty page is returned when a normal user tries to view the Route Metrics page
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-13-223353
How reproducible:
Always
Steps to Reproduce:
1. Check any Route's Metrics page with a cluster-admin user, for example /k8s/ns/openshift-monitoring/routes/alertmanager-main/metrics; the route metrics page and charts are loaded successfully
2. Grant a normal user admin permission on the 'openshift-monitoring' project
   $ oc adm policy add-role-to-user admin testuser-1 -n openshift-monitoring
   clusterrole.rbac.authorization.k8s.io/admin added: "testuser-1"
3. Log in with the normal user 'testuser-1' and check the Networking -> Routes -> alertmanager-main -> Metrics page again
Actual results:
3. empty page is returned
Expected results:
3. If the normal user doesn't have permission to view Route Metrics, we should either hide the 'Metrics' tab or show an error message instead of a completely empty page
Additional info:
Description of problem:
The operator catalog images used in 4.13 hosted clusters are the ones from 4.12
Version-Release number of selected component (if applicable):
4.13.z
How reproducible:
Always
Steps to Reproduce:
1. Create a 4.13 HostedCluster 2. Inspect the image tags used for catalog imagestreams (oc get imagestreams -n CONTROL_PLANE_NAMESPACE)
Actual results:
image tags point to 4.12 catalog images
Expected results:
image tags point to 4.13 catalog images
Additional info:
These image tags need to be updated: https://github.com/openshift/hypershift/blob/release-4.13/control-plane-operator/controllers/hostedcontrolplane/olm/catalogs.go#L117-L120
In order to ship a high quality Azure CCM we want to downstream important bugfixes that were recently merged upstream.
Description of problem:
The MCO must have compatibility in place one OCP version in advance if we want to bump the Ignition spec version; otherwise downgrades will fail. This is NOT needed in 4.14, only in 4.13.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. None atm, this is preventative for the future 2. 3.
Actual results:
N/A
Expected results:
N/A
Additional info:
As part of a single run, we are fetching the same things over and over again and hence making API calls that should not even be needed.
For example:
1. The privileges check verifies permissions of the datastore, which is also verified by the storageclass check. What is more, each of those checks fetches the datacenter and datastore, resulting in several duplicate API calls.
Exit Criteria:
1. Remove duplicate checks
2. Avoid fetching the same API object repeatedly as part of the same system check (see the caching sketch below)
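A minimal Go sketch of the idea in exit criterion 2 follows, assuming a hypothetical per-run checkContext cache rather than the actual check framework; it shows how the datacenter/datastore lookups could be fetched once and reused across checks.

package main

import "fmt"

// checkContext is a hypothetical per-run cache shared by all system checks so
// that objects like the datacenter and datastore are fetched from the vCenter
// API at most once per run.
type checkContext struct {
	datastores map[string]string // name -> hypothetical datastore handle
	fetches    int
}

func (c *checkContext) datastore(name string) string {
	if ds, ok := c.datastores[name]; ok {
		return ds // reuse the cached object instead of another API call
	}
	c.fetches++ // stands in for the real vCenter round trip
	ds := "datastore:" + name
	c.datastores[name] = ds
	return ds
}

func main() {
	ctx := &checkContext{datastores: map[string]string{}}
	// The privileges check and the storageclass check both need the same
	// datastore; with the cache only the first lookup hits the API.
	_ = ctx.datastore("vsanDatastore")
	_ = ctx.datastore("vsanDatastore")
	fmt.Println("API fetches:", ctx.fetches) // prints: API fetches: 1
}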
Description of the problem:
In staging, with BE 2.18.0, trying to create a new cluster via the UI with P/Z CPU architecture and OCP 4.10 returns the following response:
Non x86_64 CPU architectures for version 4.10 are supported only with User Managed Networking
How reproducible:
100%
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
The message should make the issue clearer for the user, for example:
P/Z CPU architecture is only supported with OCP version >= 4.12
Description of problem:
2022-09-12T13:48:57.505323919Z {"level":"info","ts":1662990537.5052269,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"qe2/master-1-0"}
2022-09-12T13:48:57.566917845Z {"level":"info","ts":1662990537.5668473,"logger":"provisioner.ironic","msg":"no node found, already deleted","host":"qe2~master-1-0"}
2022-09-12T13:48:57.566945972Z {"level":"info","ts":1662990537.566904,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"qe2/master-1-0","provisioningState":"available","requeue":true,"after":600}
2022-09-12T13:49:13.556690278Z {"level":"info","ts":1662990553.556591,"logger":"controllers.HostFirmwareSettings","msg":"start","hostfirmwaresettings":"qe2/master-1-0"}
2022-09-12T13:49:13.614818643Z {"level":"info","ts":1662990553.6147015,"logger":"controllers.HostFirmwareSettings","msg":"retrieving firmware settings and saving to resource","hostfirmwaresettings":"qe2/master-1-0","node":"48d24898-1911-4f43-82b0-0b15f8484ae7"}
2022-09-12T13:49:13.629455616Z {"level":"info","ts":1662990553.6293764,"logger":"controllers.HostFirmwareSettings","msg":"provisioner returns error","hostfirmwaresettings":"qe2/master-1-0","RequeueAfter:":30}
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Detach a BMH 2. Check BMO logs for errors 3. Check Ironic logs for errors
Actual results:
BMO and Ironic logs have errors related to the already deleted node.
Expected results:
No noise in the logs.
Additional info:
Description of problem:
tested https://issues.redhat.com/browse/OCPBUGS-10387 with PR
launch 4.14-ci,openshift/cluster-monitoring-operator#1926 no-spot
3 masters, 3 workers, each node is with 4 cpus, no infra node
$ oc get node
NAME                                         STATUS   ROLES                  AGE   VERSION
ip-10-0-132-193.us-east-2.compute.internal   Ready    control-plane,master   23m   v1.26.2+d2e245f
ip-10-0-135-65.us-east-2.compute.internal    Ready    control-plane,master   23m   v1.26.2+d2e245f
ip-10-0-149-72.us-east-2.compute.internal    Ready    worker                 14m   v1.26.2+d2e245f
ip-10-0-158-0.us-east-2.compute.internal     Ready    worker                 14m   v1.26.2+d2e245f
ip-10-0-229-135.us-east-2.compute.internal   Ready    worker                 17m   v1.26.2+d2e245f
ip-10-0-234-36.us-east-2.compute.internal    Ready    control-plane,master   23m   v1.26.2+d2e245f
labels see below
control-plane: node-role.kubernetes.io/control-plane: ""
master: node-role.kubernetes.io/master: ""
worker: node-role.kubernetes.io/worker: ""
search with "cluster:capacity_cpu_cores:sum" on admin console "Observe -> Metrics", label_node_role_kubernetes_io=master and label_node_role_kubernetes_io="" are both calculated twice
Name | label_beta_kubernetes_io_instance_type | label_kubernetes_io_arch | label_node_openshift_io_os_id | label_node_role_kubernetes_io | prometheus | Value
cluster:capacity_cpu_cores:sum | m6a.xlarge | amd64 | rhcos | | openshift-monitoring/k8s | 12
cluster:capacity_cpu_cores:sum | m6a.xlarge | amd64 | rhcos | master | openshift-monitoring/k8s | 12
cluster:capacity_cpu_cores:sum | m6a.xlarge | amd64 | rhcos | | openshift-monitoring/k8s | 12
cluster:capacity_cpu_cores:sum | m6a.xlarge | amd64 | rhcos | master | openshift-monitoring/k8s | 12
checked from thanos-querier API, same result with that from console UI(console UI used thanos-querier API)
$ token=`oc create token prometheus-k8s -n openshift-monitoring` $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=cluster:capacity_cpu_cores:sum' | jq { "status": "success", "data": { "resultType": "vector", "result": [ { "metric": { "__name__": "cluster:capacity_cpu_cores:sum", "label_beta_kubernetes_io_instance_type": "m6a.xlarge", "label_kubernetes_io_arch": "amd64", "label_node_openshift_io_os_id": "rhcos", "prometheus": "openshift-monitoring/k8s" }, "value": [ 1682394655.248, "12" ] }, { "metric": { "__name__": "cluster:capacity_cpu_cores:sum", "label_beta_kubernetes_io_instance_type": "m6a.xlarge", "label_kubernetes_io_arch": "amd64", "label_node_openshift_io_os_id": "rhcos", "label_node_role_kubernetes_io": "master", "prometheus": "openshift-monitoring/k8s" }, "value": [ 1682394655.248, "12" ] }, { "metric": { "__name__": "cluster:capacity_cpu_cores:sum", "label_beta_kubernetes_io_instance_type": "m6a.xlarge", "label_kubernetes_io_arch": "amd64", "label_node_openshift_io_os_id": "rhcos", "prometheus": "openshift-monitoring/k8s" }, "value": [ 1682394655.248, "12" ] }, { "metric": { "__name__": "cluster:capacity_cpu_cores:sum", "label_beta_kubernetes_io_instance_type": "m6a.xlarge", "label_kubernetes_io_arch": "amd64", "label_node_openshift_io_os_id": "rhcos", "label_node_role_kubernetes_io": "master", "prometheus": "openshift-monitoring/k8s" }, "value": [ 1682394655.248, "12" ] } ] } }
no such issue if we query the expr for "cluster:capacity_cpu_cores:sum" directly
Name | label_beta_kubernetes_io_instance_type | label_kubernetes_io_arch | label_node_openshift_io_os_id | label_node_role_kubernetes_io | prometheus | Value
cluster:capacity_cpu_cores:sum | m6a.xlarge | amd64 | rhcos | | openshift-monitoring/k8s | 12
cluster:capacity_cpu_cores:sum | m6a.xlarge | amd64 | rhcos | master | openshift-monitoring/k8s | 12
Deduplication should be done for the thanos-querier API.
Version-Release number of selected component (if applicable):
tested https://issues.redhat.com/browse/OCPBUGS-10387 with PR
How reproducible:
always
Steps to Reproduce:
1. see the description 2. 3.
Actual results:
node role is calculated twice in thanos-querier API
Expected results:
node role should be calculated only once in thanos-querier API
Description of problem:
When updating an s390x cluster from 4.10.35 to 4.11.34, I got the following message in the UI:
Updating this cluster to 4.11.34 is supported, but not recommended as it might not be optimized for some components in this cluster. Exposure to KeepalivedMulticastSkew is unknown due to an evaluation failure: client-side throttling: only 9m20.476632575s has elapsed since the last match call completed for this cluster condition backend; this cached cluster condition request has been queued for later execution On OpenStack, oVirt, and vSphere infrastructure, updates to 4.11 can cause degraded cluster operators as a result of a multicast-to-unicast keepalived transition, until all nodes have updated to 4.11. https://access.redhat.com/solutions/7007826
As we discussed on Slack [1], the message could be more user friendly, something like this [2]: "Throttling risk evaluation, 2 risks to evaluate, next evaluation in 9m59s."
[1] https://redhat-internal.slack.com/archives/CEGKQ43CP/p1683621220358259
[2] https://redhat-internal.slack.com/archives/CEGKQ43CP/p1683643286581299?thread_ts=1683621220.358259&cid=CEGKQ43CP
Version-Release number of selected component (if applicable):
4.11.34
How reproducible:
Have a cluster on 4.10.35 (or presumably any 4.10.z) and update to 4.11.34
Steps to Reproduce:
1. Open the web console
2. On the Dashboard/Overview, click on "Update cluster"
3. Change the channel to stable-4.11
4. Select a new version and, from the drop-down menu, click on "Include supported but not recommended versions"
5. Select 4.11.34
6. The message from the problem description appears
Actual results:
Unclear message
Expected results:
Clear message
Description of problem:
etcd-backup fails with 'FIPS mode is enabled, but the required OpenSSL library is not available' on 4.13 FIPS enabled cluster
Version-Release number of selected component (if applicable):
OCP 4.13
How reproducible:
Steps to Reproduce:
1. run etcd-backup script on FIPS enabled OCP 4.13 2. 3.
Actual results:
backup script fails with:
+ etcdctl snapshot save /home/core/assets/backup/snapshot_2023-08-28_125218.db
FIPS mode is enabled, but the required OpenSSL library is not available
Expected results:
successful run of etcd-backup script
Additional info:
4.13 uses RHEL9-based RHCOS while the etcd image still uses RHEL8, and this could be the main issue. If so, the image should be rebuilt with RHEL9.
Description of problem:
STS cluster awareness was in tech preview for testing and quality assurance before release. The unit tests that were created, and their runs, have indicated no change in cluster operation. QE has reported several bugs and they've been fixed. A periodic e2e test, which verifies that a Secret is generated when an STS cluster is detected and proper AWS resource access tokens are present in the CredentialsRequest, has been passing and has also passed when run manually on several follow-on PRs.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The Azure CCM will panic when it loses its leader election lease. This is contrary to the behaviour of other components which exit intentionally. See https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure-modern/1632791244243472384
Version-Release number of selected component (if applicable):
How reproducible:
Force the CCM to lose leader election; this can happen during upgrades.
Steps to Reproduce:
1. 2. 3.
Actual results:
Code will panic, eg E0306 18:09:14.315039 1 runtime.go:77] Observed a panic: leaderelection lost goroutine 1 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1adc660?, 0x219b9c0}) /go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x81e22e?}) /go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75 panic({0x1adc660, 0x219b9c0}) /usr/lib/golang/src/runtime/panic.go:884 +0x212 sigs.k8s.io/cloud-provider-azure/cmd/cloud-controller-manager/app.NewCloudControllerManagerCommand.func1.1() /go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/app/controllermanager.go:138 +0x27 k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1() /go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:203 +0x1f k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc0002c0d80, {0x21bce08, 0xc0001ac008}) /go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x14d k8s.io/client-go/tools/leaderelection.RunOrDie({0x21bce08, 0xc0001ac008}, {{0x21c0e00, 0xc0002c0c60}, 0x1fe5d61a00, 0x18e9b26e00, 0x60db88400, {0xc000418080, 0x1fc4978, 0x0}, ...}) /go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:226 +0x94 sigs.k8s.io/cloud-provider-azure/cmd/cloud-controller-manager/app.NewCloudControllerManagerCommand.func1(0xc000170000?, {0x1ea43e2?, 0xd?, 0xd?}) /go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/app/controllermanager.go:130 +0x3a7 github.com/spf13/cobra.(*Command).execute(0xc000170000, {0xc00019e010, 0xd, 0xd}) /go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:876 +0x67b github.com/spf13/cobra.(*Command).ExecuteC(0xc000170000) /go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:990 +0x3bd github.com/spf13/cobra.(*Command).Execute(...) 
/go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:918 main.main() /go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/controller-manager.go:47 +0xc5 panic: leaderelection lost [recovered] panic: leaderelection lost goroutine 1 [running]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x81e22e?}) /go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7 panic({0x1adc660, 0x219b9c0}) /usr/lib/golang/src/runtime/panic.go:884 +0x212 sigs.k8s.io/cloud-provider-azure/cmd/cloud-controller-manager/app.NewCloudControllerManagerCommand.func1.1() /go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/app/controllermanager.go:138 +0x27 k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1() /go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:203 +0x1f k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc0002c0d80, {0x21bce08, 0xc0001ac008}) /go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x14d k8s.io/client-go/tools/leaderelection.RunOrDie({0x21bce08, 0xc0001ac008}, {{0x21c0e00, 0xc0002c0c60}, 0x1fe5d61a00, 0x18e9b26e00, 0x60db88400, {0xc000418080, 0x1fc4978, 0x0}, ...}) /go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:226 +0x94 sigs.k8s.io/cloud-provider-azure/cmd/cloud-controller-manager/app.NewCloudControllerManagerCommand.func1(0xc000170000?, {0x1ea43e2?, 0xd?, 0xd?}) /go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/app/controllermanager.go:130 +0x3a7 github.com/spf13/cobra.(*Command).execute(0xc000170000, {0xc00019e010, 0xd, 0xd}) /go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:876 +0x67b github.com/spf13/cobra.(*Command).ExecuteC(0xc000170000) /go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:990 +0x3bd github.com/spf13/cobra.(*Command).Execute(...) /go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:918 main.main() /go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/controller-manager.go:47 +0xc5
Expected results:
Code should exit without panicking
Additional info:
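A minimal Go sketch of the expected behaviour follows, assuming a hypothetical onStoppedLeading callback; in the real controller it would be wired into client-go's leaderelection.LeaderCallbacks, and the point is simply to log and exit instead of panicking.

package main

import (
	"log"
	"os"
)

// onStoppedLeading is a sketch of the behaviour expected from the Azure CCM's
// leader-election callback: when the lease is lost, log and exit so the pod is
// restarted cleanly, instead of the panic("leaderelection lost") seen in the
// trace above. Wiring it into client-go's
// leaderelection.LeaderCallbacks{OnStoppedLeading: onStoppedLeading} is left
// out to keep the sketch self-contained.
func onStoppedLeading() {
	log.Println("leader election lost, exiting")
	os.Exit(0)
}

func main() {
	// In the real controller this is invoked by the leader elector; calling
	// it directly here just demonstrates the clean-exit behaviour.
	onStoppedLeading()
}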
Description of problem:
The modal displayed when installing a Helm chart shows a Documentation link field. This field can never be populated with a value and is always N/A. An annotation for the documentation URL doesn't exist in https://github.com/redhat-certification/chart-verifier/blob/main/docs/helm-chart-annotations.md#provider-annotations
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Go to Helm chart catalog 2. View any chart 3. See documentation = "N/A"
Actual results:
N/A
Expected results:
A way to populate the value
Additional info:
The value is consumed here: https://github.com/openshift/console/blob/2e8624014065d09ba40164221dd612d882f20395/frontend/packages/console-shared/src/components/catalog/details/CatalogDetailsPanel.tsx But it is never extracted from a chart: https://github.com/openshift/console/blob/2e8624014065d09ba40164221dd612d882f20395/frontend/packages/helm-plugin/src/catalog/utils/catalog-utils.tsx#L138 It is probably because no such annotation exists in chart certification requirements/recommendations: https://github.com/redhat-certification/chart-verifier/blob/main/docs/helm-chart-annotations.md#provider-annotations
This is a clone of issue OCPBUGS-19674. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
When using a route to expose the API server endpoint in a HostedCluster, the .status.controlPlaneEndpoint.port is reported as 6443 (the internal port) instead of 443 which is the port that is externally exposed via the route.
How reproducible:
Always
Steps to Reproduce:
1. Create a HostedCluster with a custom DNS name using route as the strategy 2. Inspect .status.controlPlaneEndpoint
Actual results:
It has 6443 as the port
Expected results:
It has 443 as the port
Additional info:
Please review the following PR: https://github.com/openshift/cloud-provider-openstack/pull/188
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Based on bugs from the ART team (example: https://issues.redhat.com/browse/OCPBUGS-12347), 4.14 images should be built with Go 1.20, but the prometheus container image is built with go1.19.6
$ token=`oc create token prometheus-k8s -n openshift-monitoring` $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/goversion/values' | jq { "status": "success", "data": [ "go1.19.6", "go1.20.3" ] }
searched from thanos API
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query={__name__=~".*",goversion="go1.19.6"}' | jq { "status": "success", "data": { "resultType": "vector", "result": [ { "metric": { "__name__": "prometheus_build_info", "branch": "rhaos-4.14-rhel-8", "container": "kube-rbac-proxy", "endpoint": "metrics", "goarch": "amd64", "goos": "linux", "goversion": "go1.19.6", "instance": "10.128.2.19:9092", "job": "prometheus-k8s", "namespace": "openshift-monitoring", "pod": "prometheus-k8s-0", "prometheus": "openshift-monitoring/k8s", "revision": "fe01b9f83cb8190fc8f04c16f4e05e87217ab03e", "service": "prometheus-k8s", "tags": "unknown", "version": "2.43.0" }, "value": [ 1682576802.496, "1" ] }, ...
prometheus-k8s-0 container names: [prometheus config-reloader thanos-sidecar prometheus-proxy kube-rbac-proxy kube-rbac-proxy-thanos]; the prometheus image is built with go1.19.6
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- prometheus --version prometheus, version 2.43.0 (branch: rhaos-4.14-rhel-8, revision: fe01b9f83cb8190fc8f04c16f4e05e87217ab03e) build user: root@402ffbe02b57 build date: 20230422-00:43:08 go version: go1.19.6 platform: linux/amd64 tags: unknown $ oc -n openshift-monitoring exec -c config-reloader prometheus-k8s-0 -- prometheus-config-reloader --version prometheus-config-reloader, version 0.63.0 (branch: rhaos-4.14-rhel-8, revision: ce71a7d) build user: root build date: 20230424-15:53:51 go version: go1.20.3 platform: linux/amd64 $ oc -n openshift-monitoring exec -c thanos-sidecar prometheus-k8s-0 -- thanos --version thanos, version 0.31.0 (branch: rhaos-4.14-rhel-8, revision: d58df6d218925fd007e16965f50047c9a4194c42) build user: root@c070c5e6af32 build date: 20230422-00:44:21 go version: go1.20.3 platform: linux/amd64 # owned by oauth team, not responsible by Monitoring $ oc -n openshift-monitoring exec -c prometheus-proxy prometheus-k8s-0 -- oauth-proxy --version oauth2_proxy was built with go1.18.10 # below isssue is tracked by bug OCPBUGS-12821 $ oc -n openshift-monitoring exec -c kube-rbac-proxy prometheus-k8s-0 -- kube-rbac-proxy --version Kubernetes v0.0.0-master+$Format:%H$ $ oc -n openshift-monitoring exec -c kube-rbac-proxy-thanos prometheus-k8s-0 -- kube-rbac-proxy --version Kubernetes v0.0.0-master+$Format:%H$
The following files should be fixed:
https://github.com/openshift/prometheus/blob/master/.ci-operator.yaml#L4
https://github.com/openshift/prometheus/blob/master/Dockerfile.ocp#L1
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-04-26-154754
How reproducible:
always
Actual results:
4.14 prometheus is built with go1.19.6
Expected results:
4.14 prometheus image should be built with go1.20
Additional info:
no functional impact
Along with the external disruption tests via the API DNS name, we should also check that the apiserver is not disrupted via the api-int and service network endpoints.
Description of problem:
The CCMs at the moment are given RBAC permissions of "get, list, watch" on secrets across all namespaces. This was a security concern raised by the OpenShift Security team. The Nutanix CCM currently creates a secrets informer and a configmaps informer at the cluster scope; these are then passed into the NewProvider call for the prism environment. Within the prism environment, the configmap and secret informers are used once each, and only to list a single namespace. We should modify the informer creation to limit it to just the namespaces required (see the sketch under Additional info below). This would reduce the scope of RBAC required and meet the OpenShift security requirements.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
Actual results:
Expected results:
Additional info:
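A minimal Go sketch of the namespace-scoped informer idea follows, using client-go's informers.WithNamespace option with a fake client; the namespace name is an assumption for illustration, not the CCM's actual configuration.

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes/fake"
)

func main() {
	// A fake client stands in for the real cluster client the CCM builds.
	client := fake.NewSimpleClientset()

	// Hypothetical namespace; the real value would be the namespace the
	// prism environment actually reads its Secret and ConfigMap from.
	const ccmNamespace = "openshift-cloud-controller-manager"

	// Scoping the factory with WithNamespace means the secrets and
	// configmaps informers only list/watch that one namespace, so the
	// RBAC grant can be a namespaced Role instead of a cluster-wide one.
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		10*time.Minute,
		informers.WithNamespace(ccmNamespace),
	)

	secretInformer := factory.Core().V1().Secrets().Informer()
	configMapInformer := factory.Core().V1().ConfigMaps().Informer()

	fmt.Println("informers created:", secretInformer != nil, configMapInformer != nil)
}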
This is a clone of issue OCPBUGS-19356. The following is the description of the original issue:
—
Backport facilitator for linked issue.
Description of problem:
Fixed by @wking, opening this bug for Jira linking. The cluster-dns-operator sets the status condition's lastTransitionTime whenever the status (true, false, unknown), reason, or message changes on a condition. It should only set the lastTransitionTime if the condition status changes. Otherwise this can have an effect on status flapping between true and false. See https://github.com/openshift/api/blob/master/config/v1/types_cluster_operator.go#L129
Version-Release number of selected component (if applicable):
4.15 and earlier
How reproducible:
100%
Steps to Reproduce:
1. Put cluster-dns-operator in a Degraded condition by stopping a pod, notice the lastTransitionTime 2. Wait 1 second and stop another pod, which only updates the condition message
Actual results:
Notice the lastTransitionTime for the Degraded condition changes when the message changes, even though the status is still Degraded=true
Expected results:
The lastTransitionTime should not change unless the Degraded status itself changes; changes to the message or reason alone should not update it.
Additional info:
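A minimal Go sketch of the expected condition handling follows, using a simplified condition type rather than the ClusterOperator API; it only illustrates that lastTransitionTime is carried over when the status value does not change.

package main

import (
	"fmt"
	"time"
)

// condition is a simplified stand-in for a ClusterOperator status condition.
type condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// setCondition is a sketch of the expected behaviour: Reason and Message may
// be refreshed on every reconcile, but LastTransitionTime only moves when the
// Status value itself flips (e.g. Degraded False -> True).
func setCondition(existing *condition, desired condition, now time.Time) {
	if existing.Status != desired.Status {
		existing.LastTransitionTime = now
	}
	existing.Status = desired.Status
	existing.Reason = desired.Reason
	existing.Message = desired.Message
}

func main() {
	t0 := time.Now()
	degraded := condition{Type: "Degraded", Status: "True", LastTransitionTime: t0}

	// A second reconcile with the same status but a new message must not
	// move the transition time.
	setCondition(&degraded, condition{Type: "Degraded", Status: "True",
		Message: "another pod stopped"}, t0.Add(time.Second))

	fmt.Println(degraded.LastTransitionTime.Equal(t0)) // prints: true
}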
Description of problem:
A QE Prow CI job updates hostedcluster.spec.pullSecret for some QE catalog source configurations. 4.13 jobs failed with the error message: Error from server (HostedCluster.spec.pullSecret.name: Invalid value: "9509a26c339de31aa3c9-pull-secret-new": Attempted to change an immutable field): admission webhook "hostedclusters.hypershift.openshift.io" denied the request: HostedCluster.spec.pullSecret.name: Invalid value: "9509a26c339de31aa3c9-pull-secret-new": Attempted to change an immutable field
Version-Release number of selected component (if applicable):
4.13
How reproducible:
4.13 job: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/openshift_release/41339/rehearse-41339-periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-aws-ipi-ovn-hypershift-guest-p1-f7/1689831180221812736
Steps to Reproduce:
see the above job
Actual results:
job failed to config pull secret for hostedcluster
Expected results:
job could run successfully
Additional info:
1. The 4.14 hypershift QE CI jobs were successfully executed with the same code. 2. I can update the 4.13 hostedcluster spec.pullSecret in my local hypershift environment. It seems to be caused by some limitation that only exists in Prow.
slack thread: https://redhat-internal.slack.com/archives/C01C8502FMM/p1691736890938529
Description of problem:
TRT has unfortunately had to revert this breaking change to get CI and/or nightly payloads flowing again. The original PR was https://github.com/openshift/cluster-storage-operator/pull/381. The revert PR: https://github.com/openshift/cluster-storage-operator/pull/384 The following evidence helped us pushing for the revert: In the nightly payload runs, periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-sdn-bm has been consistently failing in the last three nightly payloads. But the run in the revert PR passed. To restore your change, create a new PR that reverts the revert and layers additional separate commit(s) on top that addresses the problem. Contact information for TRT is available at https://source.redhat.com/groups/public/atomicopenshift/atomicopenshift_wiki/how_to_contact_the_technical_release_team. Please reach out if you need assistance in relanding your change or have feedback about this process.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
When a machine is created with a compute availability zone (defined via mpool.zones) and a storage root volume (defined as mpool.rootVolume) and that rootVolume has no specified zones, CAPO will use the compute AZ for the volume AZ. This can be problematic if the AZ doesn't exist in Cinder. Source: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/9d183bd479fe9aed4f6e7ac3d5eee46681c518e7/pkg/cloud/services/compute/instance.go#L439-L442
Version-Release number of selected component (if applicable):
All versions supporting rootVolume AZ.
Steps to Reproduce:
1. In install-config.yaml, add "zones" with valid Nova AZs, and a rootVolume without "zones". Your OpenStack cloud must not have Cinder AZs (only Nova AZs).
2. The Day 1 deployment will go fine; Terraform will create the machines with no AZ.
3. Day 2 operations on machines will fail since CAPO tries to use the Nova AZ for the root volume if no volume AZ is provided, but since the AZs don't match between Cinder and Nova, the machine will never be created.
Actual results:
Machine not created
Expected results:
Machine created in the right AZ for both Nova & Cinder
Description of problem:
- Calico virtual NICs should be excluded from the node_exporter collector.
- All NICs beginning with cali* should be added to collector.netclass.ignored-devices to ensure that metrics are not collected.
- node_exporter is meant to collect metrics for physical interfaces only.
Version-Release number of selected component (if applicable):
OpenShift 4.12
How reproducible:
Always
Steps to Reproduce:
Run an OpenShift cluster using Calico SDN. Go to Observe -> Metrics and run the following PromQL query: "group by(device) (node_network_info)". Observe that Calico virtual NICs are present.
Actual results:
Calico virtual NICs are present in OCP Metrics.
Expected results:
Only physical network interfaces should be present.
Additional info:
Similar to this bug, but for Calico virtual NICs: https://issues.redhat.com/browse/OCPBUGS-1321
We've removed the SR-IOV code that was using python-grpcio and python-protobuf. These are gone from Python's requirements.txt, but we never removed them from the RPM spec we use to build Kuryr in OpenShift. This should be fixed.
When updating from 4.12 to 4.13, the incoming ovn-k8s-cni-overlay expects RHEL 9, and fails to run on the still-RHEL-8 4.12 nodes.
4.13 and 4.14 ovn-k8s-cni-overlay vs. 4.12 RHCOS's RHEL 8.
100%
Picked up in TestGrid.
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-rt-upgrade/1677232369326624768/artifacts/e2e-gcp-ovn-rt-upgrade/gather-extra/artifacts/nodes/ci-op-y7r1x9z3-3a480-9swt7-master-2/journal | zgrep dns-operator | tail -n1 Jul 07 12:34:30.202100 ci-op-y7r1x9z3-3a480-9swt7-master-2 kubenswrapper[2168]: E0707 12:34:30.201720 2168 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"dns-operator-78cbdc89fd-kckcd_openshift-dns-operator(5c97a52b-f774-40ae-8c17-a17b30812596)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"dns-operator-78cbdc89fd-kckcd_openshift-dns-operator(5c97a52b-f774-40ae-8c17-a17b30812596)\\\": rpc error: code = Unknown desc = failed to create pod network sandbox k8s_dns-operator-78cbdc89fd-kckcd_openshift-dns-operator_5c97a52b-f774-40ae-8c17-a17b30812596_0(1fa1dd2b35100b0f1ec058d79042a316b909e38711fcadbf87bd9a1e4b62e0d3): error adding pod openshift-dns-operator_dns-operator-78cbdc89fd-kckcd to CNI network \\\"multus-cni-network\\\": plugin type=\\\"multus\\\" name=\\\"multus-cni-network\\\" failed (add): [openshift-dns-operator/dns-operator-78cbdc89fd-kckcd/5c97a52b-f774-40ae-8c17-a17b30812596:ovn-kubernetes]: error adding container to network \\\"ovn-kubernetes\\\": netplugin failed: \\\"/var/lib/cni/bin/ovn-k8s-cni-overlay: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /var/lib/cni/bin/ovn-k8s-cni-overlay)\\\\n/var/lib/cni/bin/ovn-k8s-cni-overlay: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /var/lib/cni/bin/ovn-k8s-cni-overlay)\\\\n\\\"\"" pod="openshift-dns-operator/dns-operator-78cbdc89fd-kckcd" podUID=5c97a52b-f774-40ae-8c17-a17b30812596
Successful update.
Both 4.14 and 4.13 control planes can be associated with 4.12 compute nodes, because of EUS-to-EUS updates.
This is a clone of issue OCPBUGS-19550. The following is the description of the original issue:
—
Multus doesn't need to watch pods on other nodes. To save memory and CPU set MULTUS_NODE_NAME to filter pods that multus watches.
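A minimal Go sketch of node-scoped pod watching follows, using client-go's WithTweakListOptions and a spec.nodeName field selector with a fake client; this illustrates the filtering idea, not Multus' actual implementation.

package main

import (
	"fmt"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes/fake"
)

func main() {
	// MULTUS_NODE_NAME would normally be injected via the downward API.
	nodeName := os.Getenv("MULTUS_NODE_NAME")
	if nodeName == "" {
		nodeName = "worker-0" // fallback for this sketch only
	}

	client := fake.NewSimpleClientset()

	// Filtering the pod list/watch by spec.nodeName means the informer only
	// caches pods scheduled to this node, cutting memory and apiserver load.
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		10*time.Minute,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = fields.OneTermEqualSelector("spec.nodeName", nodeName).String()
		}),
	)

	podInformer := factory.Core().V1().Pods().Informer()
	fmt.Println("node-scoped pod informer created:", podInformer != nil)
}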
Description of problem: Multus currently uses a certificate that is valid for 10 minutes; we need to add configuration for certificates that are valid for 24 hours
Description of problem:
Similar to OCPBUGS-11636, ccoctl needs to be updated to account for the S3 bucket changes described in https://aws.amazon.com/blogs/aws/heads-up-amazon-s3-security-changes-are-coming-in-april-of-2023/. These changes have rolled out to us-east-2 and the China regions as of today and will roll out to additional regions in the near future. See OCPBUGS-11636 for additional information.
Version-Release number of selected component (if applicable):
How reproducible:
Reproducible in affected regions.
Steps to Reproduce:
1. Use "ccoctl aws create-all" flow to create STS infrastructure in an affected region like us-east-2. Notice that document upload fails because the s3 bucket is created in a state that does not allow usage of ACLs with the s3 bucket.
Actual results:
./ccoctl aws create-all --name abutchertestue2 --region us-east-2 --credentials-requests-dir ./credrequests --output-dir _output
2023/04/11 13:01:06 Using existing RSA keypair found at _output/serviceaccount-signer.private
2023/04/11 13:01:06 Copying signing key for use by installer
2023/04/11 13:01:07 Bucket abutchertestue2-oidc created
2023/04/11 13:01:07 Failed to create Identity provider: failed to upload discovery document in the S3 bucket abutchertestue2-oidc: AccessControlListNotSupported: The bucket does not allow ACLs
    status code: 400, request id: 2TJKZC6C909WVRK7, host id: zQckCPmozx+1yEhAj+lnJwvDY9rG14FwGXDnzKIs8nQd4fO4xLWJW3p9ejhFpDw3c0FE2Ggy1Yc=
Expected results:
"ccoctl aws create-all" successfully creates IAM and S3 infrastructure. OIDC discovery and JWKS documents are successfully uploaded to the S3 bucket and are publicly accessible.
Additional info:
CI is flaky because the TestRouterCompressionOperation test fails.
I have seen these failures on 4.14 CI jobs.
Presently, search.ci reports the following stats for the past 14 days:
Found in 7.71% of runs (16.58% of failures) across 402 total runs and 24 jobs (46.52% failed)
GCP is most impacted:
pull-ci-openshift-cluster-ingress-operator-master-e2e-gcp-operator (all) - 44 runs, 86% failed, 37% of failures match = 32% impact
Azure and AWS are also impacted:
pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-operator (all) - 36 runs, 64% failed, 43% of failures match = 28% impact
pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator (all) - 38 runs, 79% failed, 23% of failures match = 18% impact
1. Post a PR and have bad luck.
2. Check https://search.ci.openshift.org/?search=compression+error%3A+expected&maxAge=336h&context=1&type=build-log&name=cluster-ingress-operator&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job.
The test fails:
TestAll/serial/TestRouterCompressionOperation
=== RUN   TestAll/serial/TestRouterCompressionOperation
    router_compression_test.go:209: compression error: expected "gzip", got "" for canary route
CI passes, or it fails on a different test.
Description of problem:
// Defines resource requests and limits for the Alertmanager container.
should be
// Defines resource requests and limits for the Thanos Ruler container.
Please review the following PR: https://github.com/openshift/kube-rbac-proxy/pull/66
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Kubernetes and other associated dependencies need to be updated to protect against potential vulnerabilities.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Since the introduction of https://github.com/openshift/origin/pull/27570, the openshift-tests binary now looks up the cluster infra resource for later usage (setting the TEST_PROVIDER env var when running the run-test command to inject details about the cluster). Since MicroShift does not have this resource, the returned value is nil and the binary panics when it is used later in the code.
Version-Release number of selected component (if applicable):
How reproducible:
Run openshift-tests and it immediately panics
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Due to the removal of the in-tree AWS provider (https://github.com/kubernetes/kubernetes/pull/115838) we need to ensure that KCM is setting the --external-cloud-volume-plugin flag accordingly, especially since the CSI migration was GA'd in 4.12/1.25.
Description of problem:
In the Topology side panel, in the PipelineRuns section, clicking the "Start last run" button displays an error alert message
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Create a deployment with pipeline 2. Click on deployment to open side panel 3. Click "Start last run" button in PipelineRuns section
Actual results:
Error alert message is displayed
Expected results:
Should be able to run the last run
Additional info:
Description of problem:
We have seen unit tests flaking on the mapping within the OnDelete policy tests for the control plane machine set. It turns out there is a race condition: given the right timing, if a reconcile is in progress while a machine is marked for deletion, the load balancing part of the algorithm fails to apply properly.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The Dockerfile should not reference any CI images.
Description of problem:
Sync "Debug in Terminal" feature with 3.x pods in web console The types of pods that enable the "Debug in terminal" feature should be in alignment with those in v3.11. See code here: https://github.com/openshift/origin-web-console/blob/c37982397087036321312172282e139da378eff2/app/scripts/directives/resources.js#L33-L53
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The commit "UPSTREAM: <carry>: Force using host go always and use host libriaries" introduced a build failure for the Windows kubelet that is showing up only in release-4.11 for an unknown reason but could potentially occur on other releases too.
Version-Release number of selected component (if applicable):
WMCO version: 9.0.0 and below
How reproducible:
Always on release-4.11
Steps to Reproduce:
1. Clone the WMCO repo 2. Build the WMCO image
Actual results:
WMCO image build fails
Expected results:
WMCO image build should succeed
Description of problem:
Most content on the "Command Line Tools" page is not internationalized (i18n).
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-10-165006
How reproducible:
Always
Steps to Reproduce:
1.Go to "?"-> "Command Line Tools" page. Add "?pseudolocalization=true&lng=en" at the end of the url. Check if all contents are i18n. 2. 3.
Actual results:
1. Most of contents are not i18n.
Expected results:
1.All contents should be i18n.
Additional info:
Please review the following PR: https://github.com/openshift/cluster-capi-operator/pull/104
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The OCP upgrade is blocked because the cluster operator csi-snapshot-controller fails to start its deployment with a fatal 'read-only filesystem' message
Version-Release number of selected component (if applicable):
Red Hat OpenShift 4.11 rhacs-operator.v3.72.1
How reproducible:
At least once in user's cluster while upgrading
Steps to Reproduce:
1. Have a OCP 4.11 installed 2. Install ACS on top of the OCP cluster 3. Upgrade OCP to the next z-stream version
Actual results:
Upgrade gets blocked: waiting on csi-snapshot-controller
Expected results:
Upgrade should succeed
Additional info:
The stackrox SCCs (stackrox-admission-control, stackrox-collector and stackrox-sensor) set `readOnlyRootFilesystem` to `true`. If an SCC is not explicitly defined/requested, other Pods might receive one of these SCCs, which will make the deployment fail with a `read-only filesystem` message.
Description of problem:
When installing a 3 master + 2 worker BM IPv6 cluster with proxy, worker BMHs are failing inspection with the message: "Could not contact ironic-inspector for version discovery: Unable to find a version discovery document". This causes the installation to fail due to nodes with worker role never joining the cluster. However, when installing with no workers, the issue does not reproduce and the cluster installs successfully.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2023-01-04-203333
How reproducible:
100%
Steps to Reproduce:
1. Attempt to install an IPv6 cluster with 3 masters + 2 workers and proxy with baremetal installer
Actual results:
Installation never completes because a number of pods are in Pending status
Expected results:
Workers join the cluster and installation succeeds
Additional info:
$ oc get events
LAST SEEN   TYPE     REASON              OBJECT                               MESSAGE
174m        Normal   InspectionError     baremetalhost/openshift-worker-0-1   Failed to inspect hardware. Reason: unable to start inspection: Could not contact ironic-inspector for version discovery: Unable to find a version discovery document at https://[fd2e:6f44:5dd8::37]:5050, the service is unavailable or misconfigured. Required version range (any - any), version hack disabled.
174m        Normal   InspectionError     baremetalhost/openshift-worker-0-0   Failed to inspect hardware. Reason: unable to start inspection: Could not contact ironic-inspector for version discovery: Unable to find a version discovery document at https://[fd2e:6f44:5dd8::37]:5050, the service is unavailable or misconfigured. Required version range (any - any), version hack disabled.
174m        Normal   InspectionStarted   baremetalhost/openshift-worker-0-0   Hardware inspection started
174m        Normal   InspectionStarted   baremetalhost/openshift-worker-0-1   Hardware inspection started
This is actually a better design since BMO does not need to be coupled with Ironic (unlike Ironic and httpd, for example). But the current architecture also has two real issues:
The main thing to fix is to make BMO talk to Ironic via its external IP instead of localhost.
Description of problem:
RHEL-7 already comes with {{xz}} installed, but in RHEL-8 it needs to be explicitly installed.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
1. Use an image based on Dockerfile.upi.ci.rhel8 2. Trigger a CI job that uses the xz tool 3.
Actual results:
/bin/sh: xz: command not found
tar: /tmp/secret/terraform_state.tar.xz: Wrote only 4096 of 10240 bytes
tar: Child returned status 127
tar: Error is not recoverable: exiting now
Expected results:
no errors
Additional info:
Step: https://github.com/openshift/release/blob/master/ci-operator/step-registry/upi/install/vsphere/upi-install-vsphere-commands.sh#L185 And investigation by Jinyun Ma: https://github.com/openshift/release/pull/39991#issuecomment-1581937323
Description of problem:
The Machine and its respective Node should indicate the proper zones, but the Machine doesn't indicate the proper zone on a multiple vCenter zones cluster
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-02-07-064924
How reproducible:
always
Steps to Reproduce:
1.Create a multiple vCenter zones cluster sh-4.4$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.0-0.nightly-2023-02-07-064924 True False 58m Cluster version is 4.13.0-0.nightly-2023-02-07-064924 sh-4.4$ oc get machine NAME PHASE TYPE REGION ZONE AGE jima15b-x4584-master-0 Running us-east 88m jima15b-x4584-master-1 Running us-east 88m jima15b-x4584-master-2 Running us-west 88m jima15b-x4584-worker-0-26hml Running us-east 81m jima15b-x4584-worker-1-zljp8 Running us-east 81m jima15b-x4584-worker-2-kkdzf Running us-west 81m 2.Check machine labels and node labels sh-4.4$ oc get machine jima15b-x4584-worker-0-26hml -oyaml apiVersion: machine.openshift.io/v1beta1 kind: Machine metadata: annotations: machine.openshift.io/instance-state: poweredOn creationTimestamp: "2023-02-09T02:28:03Z" finalizers: - machine.machine.openshift.io generateName: jima15b-x4584-worker-0- generation: 2 labels: machine.openshift.io/cluster-api-cluster: jima15b-x4584 machine.openshift.io/cluster-api-machine-role: worker machine.openshift.io/cluster-api-machine-type: worker machine.openshift.io/cluster-api-machineset: jima15b-x4584-worker-0 machine.openshift.io/region: us-east machine.openshift.io/zone: "" name: jima15b-x4584-worker-0-26hml namespace: openshift-machine-api sh-4.4$ oc get node jima15b-x4584-worker-0-26hml --show-labels NAME STATUS ROLES AGE VERSION LABELS jima15b-x4584-worker-0-26hml Ready worker 9m4s v1.26.0+9eb81c2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east,failure-domain.beta.kubernetes.io/zone=us-east-1a,kubernetes.io/arch=amd64,kubernetes.io/hostname=jima15b-x4584-worker-0-26hml,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,node.openshift.io/os_id=rhcos,topology.csi.vmware.com/openshift-region=us-east,topology.csi.vmware.com/openshift-zone=us-east-1a,topology.kubernetes.io/region=us-east,topology.kubernetes.io/zone=us-east-1a
Actual results:
Machine doesn’t indicate proper zone, it's machine.openshift.io/zone: ""
Expected results:
Machine should indicate proper zone
Additional info:
Discussed here https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1675848293159359
Description of problem:
when checking the bug https://issues.redhat.com/browse/OCPBUGS-15976, we found that the default ingresscontroller's DNSReady condition is True even though DNS records failed to be published to the public zone, and co/ingress doesn't report any error.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-05-191022
How reproducible:
100%
Steps to Reproduce:
1. install Azure cluster configured for manual mode with Azure Workload Identity 2. check dnsrecords of default-wildcard $ oc -n openshift-ingress-operator get dnsrecords default-wildcard -oyaml <---snip---> - conditions: - lastTransitionTime: "2023-07-10T04:23:55Z" message: 'The DNS provider failed to ensure the record: failed to update dns ...... reason: ProviderError status: "False" type: Published dnsZone: id: /subscriptions/xxxxx/resourceGroups/os4-common/providers/Microsoft.Network/dnszones/qe.azure.devcluster.openshift.com 3. Check ingresscontroller status $ oc -n openshift-ingress-operator get ingresscontroller default -oyaml <---snip---> - lastTransitionTime: "2023-07-10T04:23:55Z" message: The record is provisioned in all reported zones. reason: NoFailedZones status: "True" type: DNSReady 4. Check co/ingress status $ oc get co/ingress NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE ingress 4.14.0-0.nightly-2023-07-05-191022 True False False 127m
Actual results:
1. DNSReady is True and message shows: The record is provisioned in all reported zones. 2. co/ingress doesn't report any error
Expected results:
DNSReady should be False since the record failed to be published to the public zone (see the sketch under Additional info below)
Additional info:
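A minimal Go sketch of the expected aggregation follows, using simplified types rather than the ingress operator's actual status code; it only illustrates that a single unpublished zone should make DNSReady False.

package main

import "fmt"

// zoneStatus is a simplified view of one dnsrecords zone condition.
type zoneStatus struct {
	Zone      string
	Published bool
}

// dnsReady is a sketch of the expected aggregation: the ingresscontroller's
// DNSReady condition should only be True when the record is Published in
// every configured zone, so a public-zone failure surfaces as DNSReady=False.
func dnsReady(zones []zoneStatus) (bool, string) {
	for _, z := range zones {
		if !z.Published {
			return false, "record not published in zone " + z.Zone
		}
	}
	return true, "record published in all zones"
}

func main() {
	zones := []zoneStatus{
		{Zone: "private", Published: true},
		{Zone: "public", Published: false}, // the failure from this bug
	}
	ready, msg := dnsReady(zones)
	fmt.Println(ready, msg) // prints: false record not published in zone public
}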
This is a clone of issue OCPBUGS-19314. The following is the description of the original issue:
—
As a user, I don't want to see the "DeploymentConfigs" option in the User settings when I have not installed it in the cluster.
Description of problem:
When deploying a 4.14 spoke, the agentclusterinstall is stuck at the finalizing stage
clusterversions on the spoke report "Unable to apply 4.14.0-0.ci-2023-06-13-083232: the cluster operator monitoring is not available"
Please note: the console operator is disabled on purpose; this is needed in the telco case to reduce platform resource usage
[kni@registry.kni-qe-28 ~]$ oc get clusterversions.config.openshift.io -A
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version False True 46m Unable to apply 4.14.0-0.ci-2023-06-13-083232: the cluster operator monitoring is not available
[kni@registry.kni-qe-28 ~]$ oc get clusterversions.config.openshift.io -n version -o yaml apiVersion: v1 items: - apiVersion: config.openshift.io/v1 kind: ClusterVersion metadata: creationTimestamp: "2023-06-13T15:16:32Z" generation: 2 name: version resourceVersion: "20061" uid: f8fc0c3e-009d-4d86-a05d-2fd0aba59528 spec: capabilities: additionalEnabledCapabilities: - marketplace - NodeTuning baselineCapabilitySet: None channel: stable-4.14 clusterID: 5cfc0491-5a23-4383-935b-71e3c793e875 status: availableUpdates: null capabilities: enabledCapabilities: - NodeTuning - marketplace knownCapabilities: - CSISnapshot - Console - Insights - NodeTuning - Storage - baremetal - marketplace - openshift-samples conditions: - lastTransitionTime: "2023-06-13T15:16:33Z" message: 'Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.14&id=5cfc0491-5a23-4383-935b-71e3c793e875&version=4.14.0-0.ci-2023-06-13-083232": dial tcp 54.211.39.83:443: connect: network is unreachable' reason: RemoteFailed status: "False" type: RetrievedUpdates - lastTransitionTime: "2023-06-13T15:16:33Z" message: Capabilities match configured spec reason: AsExpected status: "False" type: ImplicitlyEnabledCapabilities - lastTransitionTime: "2023-06-13T15:16:33Z" message: Payload loaded version="4.14.0-0.ci-2023-06-13-083232" image="registry.kni-qe-28.ptp.lab.eng.bos.redhat.com:5000/openshift-release-dev/ocp-release@sha256:826bb878c5a1469ee8bb991beebc38a4e25b8f5cef9cdf1931ef99ffe5ffbc80" architecture="amd64" reason: PayloadLoaded status: "True" type: ReleaseAccepted - lastTransitionTime: "2023-06-13T15:16:33Z" status: "False" type: Available - lastTransitionTime: "2023-06-13T15:41:36Z" message: Cluster operator monitoring is not available reason: ClusterOperatorNotAvailable status: "True" type: Failing - lastTransitionTime: "2023-06-13T15:16:33Z" message: 'Unable to apply 4.14.0-0.ci-2023-06-13-083232: the cluster operator monitoring is not available' reason: ClusterOperatorNotAvailable status: "True" type: Progressing desired: image: registry.kni-qe-28.ptp.lab.eng.bos.redhat.com:5000/openshift-release-dev/ocp-release@sha256:826bb878c5a1469ee8bb991beebc38a4e25b8f5cef9cdf1931ef99ffe5ffbc80 version: 4.14.0-0.ci-2023-06-13-083232 history: - completionTime: null image: registry.kni-qe-28.ptp.lab.eng.bos.redhat.com:5000/openshift-release-dev/ocp-release@sha256:826bb878c5a1469ee8bb991beebc38a4e25b8f5cef9cdf1931ef99ffe5ffbc80 startedTime: "2023-06-13T15:16:33Z" state: Partial verified: false version: 4.14.0-0.ci-2023-06-13-083232 observedGeneration: 2 versionHash: H6tRc6p_ZWU= kind: List metadata: resourceVersion: "" [kni@registry.kni-qe-28 ~]$ oc get co -A NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE authentication 4.14.0-0.ci-2023-06-13-083232 True False False 14m cloud-controller-manager 4.14.0-0.ci-2023-06-13-083232 True False False 24m cloud-credential 4.14.0-0.ci-2023-06-13-083232 True False False 25m cluster-autoscaler 4.14.0-0.ci-2023-06-13-083232 True False False 24m config-operator 4.14.0-0.ci-2023-06-13-083232 True False False 25m control-plane-machine-set 4.14.0-0.ci-2023-06-13-083232 True False False 24m dns 4.14.0-0.ci-2023-06-13-083232 True False False 19m etcd 4.14.0-0.ci-2023-06-13-083232 True False False 22m image-registry 4.14.0-0.ci-2023-06-13-083232 True False False 14m ingress 4.14.0-0.ci-2023-06-13-083232 True False False 25m kube-apiserver 4.14.0-0.ci-2023-06-13-083232 True False False 18m kube-controller-manager 
4.14.0-0.ci-2023-06-13-083232 True False False 19m kube-scheduler 4.14.0-0.ci-2023-06-13-083232 True False False 17m kube-storage-version-migrator 4.14.0-0.ci-2023-06-13-083232 True False False 25m machine-api 4.14.0-0.ci-2023-06-13-083232 True False False 25m machine-approver 4.14.0-0.ci-2023-06-13-083232 True False False 24m machine-config 4.14.0-0.ci-2023-06-13-083232 True False False 21m marketplace 4.14.0-0.ci-2023-06-13-083232 True False False 25m monitoring False True True 14m reconciling Console Plugin failed: creating ConsolePlugin object failed: the server could not find the requested resource (post consoleplugins.console.openshift.io) network 4.14.0-0.ci-2023-06-13-083232 True False False 26m node-tuning 4.14.0-0.ci-2023-06-13-083232 True False False 25m openshift-apiserver 4.14.0-0.ci-2023-06-13-083232 True False False 14m openshift-controller-manager 4.14.0-0.ci-2023-06-13-083232 True False False 18m operator-lifecycle-manager 4.14.0-0.ci-2023-06-13-083232 True False False 25m operator-lifecycle-manager-catalog 4.14.0-0.ci-2023-06-13-083232 True False False 25m operator-lifecycle-manager-packageserver 4.14.0-0.ci-2023-06-13-083232 True False False 19m service-ca 4.14.0-0.ci-2023-06-13-083232 True False False 25m
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100%
Steps to Reproduce:
1. Deploy RAN DU spoke cluster via gitops ZTP approach with multiple base capabilities disabled including Console operator. spec: capabilities: additionalEnabledCapabilities: - marketplace - NodeTuning baselineCapabilitySet: None channel: stable-4.14 2. Monitor ocp deployment on spoke.
Actual results:
Deployment fails while finalizing the agentclusterinstall. clusterversions on the spoke report "the cluster operator monitoring is not available"
Expected results:
Successful spoke deployment
Additional info:
After manually enabling console in clusterversion, the monitoring operator succeeded and OCP install completed
must-gather logs:
https://drive.google.com/file/d/19zO21jqcVTIkAdGS2DEqQuhg2oGUmuNY/view?usp=sharing
https://drive.google.com/file/d/1PXjZmBdMwHWNwkaXr2wE9tTtBRJWYeKP/view?usp=sharing
Description of problem:
While reviewing PRs in CoreDNS 1.11.0, we stumbled upon https://github.com/coredns/coredns/pull/6179, which describes a CoreDNS crash in the kubernetes plugin if you create an EndpointSlice object that contains a port without a port number. I reproduced this myself and was able to bring down all of CoreDNS so that the cluster was put into a degraded state. We've bumped to CoreDNS 1.11.1 in 4.15, so this is a concern for < 4.15.
Version-Release number of selected component (if applicable):
Less than or equal to 4.14
How reproducible:
100%
Steps to Reproduce:
1. Create an EndpointSlice with a port that has no port number:
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: example-abc
addressType: IPv4
ports:
- name: ""
2. Shortly after creating this object, all DNS pods continuously crash:
oc get -n openshift-dns pods
NAME                READY   STATUS             RESTARTS     AGE
dns-default-57lmh   1/2     CrashLoopBackOff   1 (3s ago)   79m
dns-default-h6cvm   1/2     CrashLoopBackOff   1 (4s ago)   79m
dns-default-mn7qd   1/2     CrashLoopBackOff   1 (3s ago)   79m
dns-default-mxq5g   1/2     CrashLoopBackOff   1 (3s ago)   79m
dns-default-wdrff   1/2     CrashLoopBackOff   1 (3s ago)   79m
dns-default-zs7cd   1/2     CrashLoopBackOff   1 (3s ago)   79m
Actual results:
DNS Pods crash
Expected results:
DNS Pods should NOT crash
Additional info:
Description of problem:
The dynamic demo plugin locales are missing a correct plural string. The dynamic demo plugin doesn't use the script the console uses to transform plural strings, so we need to update the plural string manually.
This would help with further validation of the i18n dependencies update changes, and also with the investigation of the [Dynamic plugin translation support for plurals broken](https://issues.redhat.com/browse/OCPBUGS-11285) bug
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Deploy the dynamic demo plugin on a cluster 2. Go to the Overview page 3.
Actual results:
The Node Worker string is NOT in the correct plural format
Expected results:
The Node Worker string is in the correct plural format
Additional info:
Description of problem:
In order for Windows nodes to use the openshift-cluster-csi-drivers/internal-feature-states.csi.vsphere.vmware.com ConfigMap, which contains the configuration for vSphere CSI, `csi-windows-support` must be set to true. This is documented here: https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/833421f42475809b4f76ea125095b5120af0f8e1/docs/book/features/csi_driver_on_windows.md#how-to-enable-vsphere-csi-with-windows-nodes Without this, a separate ConfigMap must be created and used by a user deploying Windows vSphere CSI drivers.
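As a hedged illustration (not a documented procedure), a client-go sketch that flips the feature-state key in the existing ConfigMap; the ConfigMap name, namespace, and key come from the description above, everything else (kubeconfig loading, error handling) is assumed:

package main

import (
    "context"
    "log"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Assumes a kubeconfig in the default location; adjust as needed.
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        log.Fatal(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    ctx := context.Background()
    cm, err := client.CoreV1().ConfigMaps("openshift-cluster-csi-drivers").
        Get(ctx, "internal-feature-states.csi.vsphere.vmware.com", metav1.GetOptions{})
    if err != nil {
        log.Fatal(err)
    }

    // Enable Windows support in the vSphere CSI feature states.
    if cm.Data == nil {
        cm.Data = map[string]string{}
    }
    cm.Data["csi-windows-support"] = "true"

    if _, err := client.CoreV1().ConfigMaps(cm.Namespace).Update(ctx, cm, metav1.UpdateOptions{}); err != nil {
        log.Fatal(err)
    }
}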
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Add a Windows node to the cluster 2. Deploy vsphere csi daemonset for windows nodes as documented upstream 3. Add a Windows pod with a pvc mount
Actual results:
The pod is unable to mount the volume as windows support is not enabled
Expected results:
The pod can mount the volume
Additional info:
Description of problem:
When we expand the baremetal IPI cluster with a static IP, no information is logged if the nmstate output is "--- {}\n", and the customized image is generated without the static network configuration.
Version-Release number of selected component (if applicable):
4.11
How reproducible:
100%
Steps to Reproduce:
1. Expand the baremetal IPI cluster with a node using the invalid nmstate data below:
   ---
   apiVersion: v1
   kind: Secret
   metadata:
     name: openshift-worker-0-network-config-secret
   type: Opaque
   stringData:
     nmstate: |
       foo: bar: baz
   ---
   apiVersion: v1
   kind: Secret
   metadata:
     name: openshift-worker-0-bmc-secret
     namespace: openshift-machine-api
   type: Opaque
   data:
     username: YWRtaW4K
     password: cGFzc3dvcmQK
   ---
   apiVersion: metal3.io/v1alpha1
   kind: BareMetalHost
   metadata:
     name: openshift-worker-0
     namespace: openshift-machine-api
   spec:
     online: True
     bootMACAddress: 52:54:00:11:22:b4
     bmc:
       address: ipmi://192.168.123.1:6233
       credentialsName: openshift-worker-0-bmc-secret
       disableCertificateVerification: True
       username: admin
       password: password
     rootDeviceHints:
       deviceName: "/dev/sda"
     preprovisioningNetworkDataName: openshift-worker-0-network-config-secret
2. Check whether an IP is configured on the node.
Actual results:
No static network configuration in the metal3 customized image.
Expected results:
Information should be logged and the metal3 customized image should not be generated.
Additional info:
https://github.com/openshift/image-customization-controller/pull/72
This is a clone of issue OCPBUGS-17724. The following is the description of the original issue:
—
Environment: OCP 4.12.24
Installation Method: IPI: Manual Mode + STS using a customer-provided AWS IAM Role
I am trying to deploy an OCP4 cluster on AWS for my customer. The customer does not permit creation of IAM users, so I am performing a Manual Mode with STS IPI installation instead. I have been given an IAM role to assume for the OCP installation, but unfortunately the customer's AWS Organizational Service Control Policy (SCP) does not permit the use of the iam:GetUser permission.
(I have informed my customer that iam:GetUser is an installation requirement - it's clearly documented in our docs, and I have raised a ticket with their internal support team requesting that their SCP is amended to include iam:GetUser; however, I have been informed that my request is likely to be rejected).
With this limitation understood, I still attempted to install OCP4. Surprisingly, I was able to deploy an OCP (4.12) cluster without any apparent issues, however when I tried to destroy the cluster I encountered the following error from the installer (note: fields in brackets <> have been redacted):
DEBUG search for IAM roles
DEBUG iterating over a page of 74 IAM roles
DEBUG search for IAM users
DEBUG iterating over a page of 1 IAM users
INFO get tags for <ARN of the IAM user>: AccessDenied: User:<ARN of my user> is not authorized to perform: iam:GetUser on resource: <IAM username> with an explicit deny in a service control policy
INFO status code: 403, request id: <request ID>
DEBUG search for IAM instance profiles
INFO error while finding resources to delete error=get tags for <ARN of IAM user> AccessDenied: User:<ARN of my user> is not authorized to perform: iam:GetUser on resource: <IAM username> with an explicit deny in a service control policy status code: 403, request id: <request ID>
Similarly, the error in AWS CloudTrail logs shows the following (note: some fields in brackets have been redacted):
User: arn:aws:sts::<AWS account no>:assumed-role/<role-name>/<user name> is not authorized to perform: iam:GetUser on resource <IAM User> with an explicit deny in a service control policy
It appears that the destroy operation is failing when the installer is trying to list tags on the only IAM user in the customer's AWS account. As discussed, the SCP does not permit the use of iam:GetUser and consequently this API call on the IAM user is denied. The installer then enters an endless loop as it continuously retries the operation. We have potentially identified the iamUserSearch function within the installer code at pkg/destroy/aws/iamhelpers.go as the area where this call is failing.
There does not appear to be a handler for the "AccessDenied" API error in this function. Therefore we request that the access denied event is gracefully handled and skipped over when processing IAM users, allowing the installer to continue with the destroy operation, much in the same way that a similar access denied event is handled within the iamRoleSearch function when processing IAM roles.
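A minimal sketch of the requested handling, assuming the aws-sdk-go v1 error types the installer already uses; tagsOrSkip and listUserTags are hypothetical names, not the installer's actual code:

package main

import (
    "errors"
    "log"

    "github.com/aws/aws-sdk-go/aws/awserr"
)

// listUserTags is a stand-in for the call that fails in iamUserSearch.
type listUserTags func(userName string) (map[string]string, error)

// tagsOrSkip returns the user's tags, but treats AccessDenied as "skip this
// user and keep going" instead of aborting the whole destroy run.
func tagsOrSkip(list listUserTags, userName string) (map[string]string, bool, error) {
    tags, err := list(userName)
    if err != nil {
        var aerr awserr.Error
        if errors.As(err, &aerr) && aerr.Code() == "AccessDenied" {
            log.Printf("skipping IAM user %s: access denied listing tags", userName)
            return nil, false, nil // skipped, not fatal
        }
        return nil, false, err
    }
    return tags, true, nil
}

func main() {
    fake := func(string) (map[string]string, error) {
        return nil, awserr.New("AccessDenied", "explicit deny in a service control policy", nil)
    }
    if _, ok, err := tagsOrSkip(fake, "example-user"); err == nil && !ok {
        log.Println("user skipped, destroy continues")
    }
}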
We therefore request that the following is considered and addressed:
1. Re-assess if the iam:GetUser permission is actually needed for cluster installation/cluster operations.
2. If the permission is required, then the installer should provide a warning or halt the installation.
3. During a "destroy" cluster operation, the installer should gracefully handle AccessDenied errors from the API, "skip over" any IAM Users that the installer does not have permission to list tags for, and then continue gracefully with the destroy operation.
The controller should wait until the service times out on the CVO and not time out by itself.
Please review the following PR: https://github.com/openshift/machine-api-provider-ibmcloud/pull/18
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/oauth-server/pull/119
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
When an HCP Service LB is created, for example for an IngressController, the CAPA controller calls ModifyNetworkInterfaceAttribute. It references the default security group for the VPC in addition to the security group created for the cluster (with the right tags). Ideally, the LBs (and any other HCP components) should not use the default VPC SecurityGroup.
Version-Release number of selected component (if applicable):
All 4.12 and 4.13
How reproducible:
100%
Steps to Reproduce:
1. Create HCP 2. Wait for Ingress to come up. 3. Look in CloudTrail for ModifyNetworkInterfaceAttribute, and see default security group referenced
Actual results:
Default security group is used
Expected results:
Default security group should not be used
Additional info:
This is problematic as we are attempting to scope our AWS permissions as tightly as possible. The goal is to only use resources that are tagged with `red-hat-managed: true` so that our IAM Policies can be conditioned to only access these resources. Using the Security Group created for the cluster should be sufficient, and the default Security Group does not need to be used, so if the usage can be removed here, we can secure our AWS policies that much better. Similar to OCPBUGS-11894
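A hedged aws-sdk-go v1 sketch of selecting only security groups tagged red-hat-managed: true instead of falling back to the VPC default group; the region and session setup are placeholder assumptions:

package main

import (
    "fmt"
    "log"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
    sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1")))
    client := ec2.New(sess)

    // Only consider security groups carrying the red-hat-managed tag, so IAM
    // policies conditioned on that tag are sufficient and the VPC default
    // security group is never touched.
    out, err := client.DescribeSecurityGroups(&ec2.DescribeSecurityGroupsInput{
        Filters: []*ec2.Filter{
            {Name: aws.String("tag:red-hat-managed"), Values: []*string{aws.String("true")}},
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    for _, sg := range out.SecurityGroups {
        fmt.Println(aws.StringValue(sg.GroupId), aws.StringValue(sg.GroupName))
    }
}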
Description of problem:
The oc idle tests do not expect the deprecation warning in their output and break.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Run the test 2. Watch it fail 3.
Actual results:
Error running /usr/bin/oc --namespace=e2e-test-oc-idle-hns4c --kubeconfig=/tmp/configfile3347652119 describe deploymentconfigs v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+ deploymentconfig.apps.openshift.io: StdOut> Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+ Error from server (NotFound): deploymentconfigs.apps.openshift.io "v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+ deploymentconfig.apps.openshift.io" not found StdErr> Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+ Error from server (NotFound): deploymentconfigs.apps.openshift.io "v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+ deploymentconfig.apps.openshift.io" not found exit status 1
Expected results:
Tests should pass
Additional info:
I have tracked down the problem to this line: https://github.com/openshift/origin/blob/master/test/extended/cli/idle.go#LL49C40-L49C40 deploymentConfigName gets assigned to "v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+ deploymentconfig.apps.openshift.io", which leads to the next command not finding a deployment config.
Description of problem:
The target.workload.openshift.io/management annotation causes CNO operator pods to wait for nodes to appear. Eventually they give up waiting and get scheduled. This annotation should not be set for the hosted control plane topology, since we should not wait for nodes to exist for the CNO to be scheduled.
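As a hedged illustration of the intended behavior (the function and the annotation value shown are assumptions, not CNO's actual code): omit the workload-management annotation when the control-plane topology is External, as it is for hosted control planes.

package main

import (
    "fmt"

    configv1 "github.com/openshift/api/config/v1"
)

const managementAnnotation = "target.workload.openshift.io/management"

// podAnnotations returns the annotations to stamp on operator pods. For
// hosted (external) control planes there are no nodes to wait for, so the
// management workload annotation is omitted. The annotation value here is
// illustrative only.
func podAnnotations(topology configv1.TopologyMode) map[string]string {
    annotations := map[string]string{}
    if topology != configv1.ExternalTopologyMode {
        annotations[managementAnnotation] = `{"effect": "PreferredDuringScheduling"}`
    }
    return annotations
}

func main() {
    fmt.Println(podAnnotations(configv1.ExternalTopologyMode))        // empty map - no waiting on nodes
    fmt.Println(podAnnotations(configv1.HighlyAvailableTopologyMode)) // annotation present
}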
Version-Release number of selected component (if applicable):
4.14, 4.13
How reproducible:
always
Steps to Reproduce:
1. Create IBM ROKS cluster 2. Wait for cluster to come up 3.
Actual results:
Cluster takes a long time to come up because CNO pods take ~15 min to schedule.
Expected results:
Cluster comes up quickly
Additional info:
Note: Verification for the fix has already happened on the IBM Cloud side. All OCP QE needs to do is to make sure that the fix doesn't cause any regression to the regular OCP use case.
Description of problem:
Techpreview parallel jobs are failing due to changes in the insights operator Example failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-sdn-techpreview/1663408887002304512 Looks like it's from https://github.com/openshift/insights-operator/pull/764 https://sippy.dptools.openshift.org/sippy-ng/jobs/4.14/analysis?filters=%7B%22items%22%3A%5B%7B%22id%22%3A0%2C%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-techpreview%22%7D%2C%7B%22id%22%3A1%2C%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-sdn-techpreview%22%7D%2C%7B%22id%22%3A2%2C%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-nightly-4.14-e2e-vsphere-ovn-techpreview%22%7D%5D%2C%22linkOperator%22%3A%22or%22%7D
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of the problem:
On a cluster with 3 masters, after attaching 5 additional disks across the 3 masters, the device storage sets for the operator show only 3 storage devices and not the expected 5 additional disks.
How reproducible:
80%
OCP 4.12, OCS 4.12.1
Also reproduces on OCP 4.11.
Steps to reproduce:
1. Create a cluster with 3 master nodes
2. Attach 2 additional disks to master 1, 2 additional disks to master 2, and 1 additional disk to master 3
3. Check the count of storage devices on the operator
Actual results:
The operator shows a device set count of 3
Expected results:
The device set count should equal the number of valid additional attached disks (= 5)
Description of problem:
When deploying hosts using ironic's agent both the ironic service address and inspector address are required. The ironic service is proxied such that it can be accessed at a consistent endpoint regardless of where the pod is running. This is not the case for the inspection service. This means that if the inspection service moves after we find the address, provisioning will fail. In particular this non-matching behavior is frustrating when using the CBO [GetIronicIP function|https://github.com/openshift/cluster-baremetal-operator/blob/6f0a255fdcc7c0e5c04166cb9200be4cee44f4b7/provisioning/utils.go#L95-L127] as one return value is usable forever but the other needs to somehow be re-queried every time the pod moves.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Relatively
Steps to Reproduce:
1. Retrieve the inspector IP from GetIronicIP 2. Reschedule the inspector service pod 3. Provision a host
Actual results:
Ironic python agent raises an exception
Expected results:
Host provisions
Additional info:
This was found while deploying clusters using ZTP. In this scenario specifically, an image containing the ironic inspector IP is valid for an extended period of time. The same image can be used for multiple hosts and possibly multiple different spoke clusters. Our controller shouldn't be expected to watch the ironic pod to ensure we update the image whenever it moves. The best we can do is re-query the inspector IP whenever a user makes changes to the image, but that may still not be often enough.
Description of problem:
When a CatalogSource name starts with a number, the pod does not run well. We should add a validation check for the name: if the name does not match the validation regex '[a-z]([-a-z0-9]*[a-z0-9])?', print a message and refuse to create the CatalogSource.
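A minimal sketch of the kind of pre-creation check being asked for, using the apimachinery validator that implements the same DNS-1035 rule; the surrounding function is hypothetical:

package main

import (
    "fmt"
    "strings"

    "k8s.io/apimachinery/pkg/util/validation"
)

// validateCatalogSourceName rejects names (such as "611-oci-index") that
// cannot be used as a Service name, before the CatalogSource is created.
func validateCatalogSourceName(name string) error {
    if errs := validation.IsDNS1035Label(name); len(errs) > 0 {
        return fmt.Errorf("invalid CatalogSource name %q: %s", name, strings.Join(errs, "; "))
    }
    return nil
}

func main() {
    fmt.Println(validateCatalogSourceName("611-oci-index")) // error: must start with an alphabetic character
    fmt.Println(validateCatalogSourceName("oci-611-index")) // <nil>
}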
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. skopeo copy --all --format v2s2 docker://icr.io/cpopen/ibm-zcon-zosconnect-catalog@sha256:6f02ecef46020bcd21bdd24a01f435023d5fc3943972ef0d9769d5276e178e76 oci:///home1/611/oci-index
2. Change the work directory: `cd home1/611/oci-index`
3. Run the oc-mirror command:
   cat config.yaml
   kind: ImageSetConfiguration
   apiVersion: mirror.openshift.io/v1alpha2
   storageConfig:
     local:
       path: /home1/ocilocalstorage
   mirror:
     operators:
     - catalog: oci:///home1/611/oci-index
   `oc-mirror --config config.yaml docker://ec2-18-217-58-249.us-east-2.compute.amazonaws.com:5000/multi-oci --dest-skip-tls --include-local-oci-catalogs`
4. Apply the CatalogSource and ICSP YAML files.
5. Check the CatalogSource pod.
Actual results:
[root@preserve-fedora36 oci-index]# oc get pod --show-labels NAME READY STATUS RESTARTS AGE LABELS 611-oci-index-2sfh8 0/1 Terminating 0 4s olm.catalogSource=611-oci-index,olm.pod-spec-hash=6b8656f87 611-oci-index-dbj9b 0/1 ContainerCreating 0 1s olm.catalogSource=611-oci-index,olm.pod-spec-hash=6b8656f87 611-oci-index-w4tfd 0/1 Terminating 0 2s olm.catalogSource=611-oci-index,olm.pod-spec-hash=6b8656f87 611-oci-index-zj8nn 0/1 Terminating 0 3s olm.catalogSource=611-oci-index,olm.pod-spec-hash=6b8656f87 oc get catalogsource 611-oci-index -oyaml apiVersion: operators.coreos.com/v1alpha1 kind: CatalogSource metadata: creationTimestamp: "2023-05-10T03:01:36Z" generation: 1 name: 611-oci-index namespace: openshift-marketplace resourceVersion: "97108" uid: 2287434b-9e70-4865-b1a1-95997165f94e spec: image: ec2-18-217-58-249.us-east-2.compute.amazonaws.com:5000/multi-oci/home1/611/oci-index:6f02ec sourceType: grpc status: message: 'couldn''t ensure registry server - error ensuring service: 611-oci-index: Service "611-oci-index" is invalid: metadata.name: Invalid value: "611-oci-index": a DNS-1035 label must consist of lower case alphanumeric characters or ''-'', start with an alphabetic character, and end with an alphanumeric character (e.g. ''my-name'', or ''abc-123'', regex used for validation is ''[a-z]([-a-z0-9]*[a-z0-9])?'')' reason: RegistryServerError
Expected results:
The CatalogSource should not be created when its name does not match the validation regex.
Additional info:
After renaming the CatalogSource to oci-611-index, the pod runs well, and the operator and instance can be created.
Description of problem:
The current version of openshift/cluster-ingress-operator vendors Kubernetes 1.26 packages. OpenShift 4.14 is based on Kubernetes 1.27.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Check https://github.com/openshift/cluster-ingress-operator/blob/release-4.14/go.mod
Actual results:
Kubernetes packages (k8s.io/api, k8s.io/apimachinery, and k8s.io/client-go) are at version v0.26
Expected results:
Kubernetes packages are at version v0.27.0 or later.
Additional info:
Using old Kubernetes API and client packages brings a risk of API compatibility issues. controller-runtime will need to be bumped to v0.15 as well.
After the 'runbook_url' annotation test was increased in severity in https://github.com/openshift/origin/pull/27933, it started permafailing.
Example logs:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ironic-image/379/pull-ci-openshift-ironic-image-master-prevalidation-e2e-metal-ipi-virtualmedia-prevalidation/1666311316056313856
This is a clone of issue OCPBUGS-19376. The following is the description of the original issue:
—
Description of problem:
IPI installation using the service account attached to a GCP VM always fails with the error "unable to parse credentials"
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-15-233408
How reproducible:
Always
Steps to Reproduce:
1. "create install-config" 2. edit install-config.yaml to insert "credentialsMode: Manual" 3. "create manifests" 4. manually create the required credentials and copy the manifests to installation-dir/manifests directory 5. launch the bastion host along with binding to the pre-configured service account ipi-on-bastion-sa@openshift-qe.iam.gserviceaccount.com and scopes being "cloud-platform" 6. copy the installation-dir and openshift-install to the bastion host 7. try "create cluster" on the bastion host
Actual results:
The installation failed on "Creating infrastructure resources"
Expected results:
The installation should succeed.
Additional info:
(1) FYI the 4.12 epic: https://issues.redhat.com/browse/CORS-2260 (2) 4.12.34 doesn't have the issue (Flexy-install/234112/). (3) 4.13.13 doesn’t have the issue (Flexy-install/234126/). (4) The 4.14 errors (Flexy-install/234113/): 09-19 16:13:44.919 level=info msg=Consuming Master Ignition Config from target directory 09-19 16:13:44.919 level=info msg=Consuming Bootstrap Ignition Config from target directory 09-19 16:13:44.919 level=info msg=Consuming Worker Ignition Config from target directory 09-19 16:13:44.919 level=info msg=Credentials loaded from gcloud CLI defaults 09-19 16:13:49.071 level=info msg=Creating infrastructure resources... 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg=Error: unable to parse credentials 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg= with provider["openshift/local/google"], 09-19 16:13:50.950 level=error msg= on main.tf line 10, in provider "google": 09-19 16:13:50.950 level=error msg= 10: provider "google" { 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg=unexpected end of JSON input 09-19 16:13:50.950 level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "cluster" stage: failed to create cluster: failed to apply Terraform: exit status 1 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg=Error: unable to parse credentials 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg= with provider["openshift/local/google"], 09-19 16:13:50.950 level=error msg= on main.tf line 10, in provider "google": 09-19 16:13:50.950 level=error msg= 10: provider "google" { 09-19 16:13:50.950 level=error 09-19 16:13:50.950 level=error msg=unexpected end of JSON input 09-19 16:13:50.950 level=error
The agent does not replace localhost.localdomain node names with MAC addresses
when the cluster network configuration is static IPs with a VLAN.
Found in agent log
Dec 20 17:37:42 localhost.localdomain inventory[2284]: time="20-12-2022 17:37:42" level=info msg="Replaced original forbidden hostname with calculated one" file="inventory.go:63" calculated=localhost.localdomain original=localhost.localdomain
As a result:
Cluster is not ready yet.
The cluster is not ready yet. Some hosts have an ineligible name. To change the hostname, click on it.
How reproducible:
1. Provision libvirt VMs and network with VLAN
2. Create cluster and select Static IP Network configuration
3. Fill in all required fields in the form view and press Next
4. Generate and download ISO
5. Wait until the nodes are up and discovered
Actual results:
Nodes have localhost.localdomain names
Expected results:
Nodes are named after the host's MAC address
Description of problem:
Cluster Network Operator managed component multus-admission-controller does not conform to Hypershift control plane expectations. When CNO is managed by Hypershift, multus-admission-controller must run with a non-root security context. If Hypershift runs the control plane on a Kubernetes (as opposed to OpenShift) management cluster, it adds a pod or container security context to most deployments with a runAsUser clause inside. In the Hypershift CPO, the security context of deployment containers, including CNO, is set when it detects that SCCs are not available, see https://github.com/openshift/hypershift/blob/9d04882e2e6896d5f9e04551331ecd2129355ecd/support/config/deployment.go#L96-L100. In such a case, CNO should do the same: set the security context for its managed deployment multus-admission-controller to meet the Hypershift standard.
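A hedged corev1 sketch of what setting that security context could look like when no SCCs are detected; the UID and the detection flag are placeholders, not Hypershift's or CNO's actual logic:

package main

import (
    "fmt"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
)

// applyNonRootSecurityContext mirrors the behavior described above: when SCCs
// are not available on the management cluster, run the deployment's pods as a
// fixed non-root UID.
func applyNonRootSecurityContext(d *appsv1.Deployment, sccAvailable bool, uid int64) {
    if sccAvailable {
        return // on OpenShift management clusters SCCs handle this
    }
    runAsNonRoot := true
    d.Spec.Template.Spec.SecurityContext = &corev1.PodSecurityContext{
        RunAsUser:    &uid,
        RunAsNonRoot: &runAsNonRoot,
    }
}

func main() {
    d := &appsv1.Deployment{}
    applyNonRootSecurityContext(d, false, 1001) // 1001 is an illustrative UID
    fmt.Printf("%+v\n", d.Spec.Template.Spec.SecurityContext)
}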
How reproducible:
Always
Steps to Reproduce:
1. Create an OCP cluster using Hypershift with a Kube management cluster 2. Check the pod security context of multus-admission-controller
Actual results:
no pod security context is set
Expected results:
pod security context is set with runAsUser: xxxx
Additional info:
This is the highest priority item from https://issues.redhat.com/browse/OCPBUGS-7942 and it needs to be fixed ASAP as it is a security issue preventing IBM from releasing Hypershift-managed Openshift service.
Description of the problem:
When a 9.2-based live ISO is used in the AgentServiceConfig, after booting into the CD, the spoke console is stuck at acquiring the live PXE rootfs with a "could not resolve host" error.
It seems the DNS server configured in nmstate is not applied to the spoke.
How reproducible:
100%
Steps to reproduce:
2. install SNO via ZTP
3. Monitor install CRs on hub
Actual results:
Expected results:
Extra info:
Description of the problem:
Infraenv creation data missing
How reproducible:
data is propagated only on infraenv update
Steps to reproduce:
1. create new cluster
2. check elastic data: some special feature is missing
Please review the following PR: https://github.com/openshift/machine-api-provider-nutanix/pull/42
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-13152. The following is the description of the original issue:
—
Description of problem:
With OCPBUGS-11099 our Pipeline Plugin supports the TektonConfig config "embedded-status: minimal" option that will be the default in OpenShift Pipelines 1.11+.
But since this change, the Pipeline pages load the TaskRuns for all Pipeline and PipelineRun rows. To decrease the risk of a performance issue, we should make this call only if status.tasks isn't defined.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
Actual results:
The list page loads a list of TaskRuns for each Pipeline / PipelineRun even if the PipelineRun already contains the related data (status.tasks)
Expected results:
No unnecessary network calls. When the admin changes the TektonConfig "embedded-status" option to minimal, the UI should still work and load the TaskRuns as it does today.
Additional info:
None
Description of the problem:
#!/bin/bash
while sleep 0.5; do
  for i in {1..10}; do
    curl -I -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" 'https://api.stage.openshift.com/api/assisted-install/v2/infra-envs/3dc00d41-46bf-4b83-9874-f21812263c97/downloads/files?discovery_iso_type=full-iso&file_name=discovery.ign' > /dev/null &
  done
done
The script above causes assisted-service CPU usage to spike and the 99th percentile of request latency to jump to 10s.
How reproducible:
100%
Steps to reproduce:
1. run script above
2. check response time/cpu usage
3.
Actual results:
response time really slow / 504
Expected results:
service continues to run smoothly
Description of the problem:
Change the user message from: "Host is not compatible with cluster platform %s; either disable this host or choose a compatible cluster platform (%v)" to "Host is not compatible with cluster platform %s; either disable this host or discover a new, compatible host."
How reproducible:
100%
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
Description of problem:
Fix grammatical error in feedback modal. Remove 'the' before openshift text.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The OCP FeatureGate object gets a new status field where the enabled feature gates are listed. We should use this new field instead of parsing FeatureGate.Spec.
This should be fully transparent to users: they still set FeatureGate.Spec, and they should still observe that the SharedResource CSI driver + operator is installed when they enable the TechPreviewNoUpgrade feature set there.
Enhancement: https://github.com/openshift/cluster-storage-operator/pull/368
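A hedged Go sketch of consuming the new status field instead of parsing the spec; the field names follow the 4.14 openshift/api config/v1 types as best understood here, and the gate name is only an example, so both should be double-checked against the current API:

package main

import (
    "fmt"

    configv1 "github.com/openshift/api/config/v1"
)

// gateEnabled reports whether a given feature gate is listed as enabled in
// FeatureGate.Status for the given payload version, rather than being
// inferred from FeatureGate.Spec.
func gateEnabled(fg *configv1.FeatureGate, version string, gate configv1.FeatureGateName) bool {
    for _, details := range fg.Status.FeatureGates {
        if details.Version != version {
            continue
        }
        for _, enabled := range details.Enabled {
            if enabled.Name == gate {
                return true
            }
        }
    }
    return false
}

func main() {
    fg := &configv1.FeatureGate{} // normally fetched with a config client
    fmt.Println(gateEnabled(fg, "4.14.0", "CSIDriverSharedResource"))
}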
Sanitize OWNERS/OWNER_ALIASES:
1) OWNERS must have:
component: "Storage / Kubernetes External Components"
2) OWNER_ALIASES must have all team members of Storage team.
Description of problem:
Metrics page is broken
Version-Release number of selected component (if applicable):
Openshift Pipelines 1.9.0 on 4.12
How reproducible:
Always
Steps to Reproduce:
1. Install Openshift Pipelines 1.9.0 2. Create a pipeline and run it several times 3. Update metrics.pipelinerun.duration-type and metrics.taskrun.duration-type to lastvalue 4. Navigate to created pipeline 5. Switch to Metrics tab
Actual results:
The Metrics page shows an error
Expected results:
Metrics of the pipeline should be shown
Additional info:
Description of problem:
There are different versions and channels for the operator, but they may use the same 'latest' tag; when mirroring them as `additionalImages`, we got the error below:
[root@ip-172-31-249-209 jian]# oc-mirror --config mirror.yaml file:///root/jian/test/ ... ... sha256:672b4bee759f8115e5538a44c37c415b362fc24b02b0117fd4bdcc129c53e0a1 file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest sha256:d90aecc425e1b2e0732d0a90bc84eb49eb1139e4d4fd8385070d00081c80b71c file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest error: unable to push manifest to file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest: symlink sha256:f6b6a15c4477615ff202e73d77fc339977aeeca714b9667196509d53e2d2e4f5 /root/jian/test/oc-mirror-workspace/src/v2/openshift4/ose-cluster-kube-descheduler-operator/manifests/latest.download: file exists error: unable to push manifest to file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest: symlink sha256:6a1de43c60d021921973e81c702e163a49300254dc3b612fd62ed2753efe4f06 /root/jian/test/oc-mirror-workspace/src/v2/openshift4/ose-cluster-kube-descheduler-operator/manifests/latest.download: file exists info: Mirroring completed in 22.48s (125.8MB/s) error: one or more errors occurred while uploading images
Version-Release number of selected component (if applicable):
[root@ip-172-31-249-209 jian]# oc-mirror version Client Version: version.Info{Major:"0", Minor:"1", GitVersion:"v0.1.0", GitCommit:"6ead1890b7a21b6586b9d8253b6daf963717d6c3", GitTreeState:"clean", BuildDate:"2022-08-25T05:27:39Z", GoVersion:"go1.17.12", Compiler:"gc", Platform:"linux/amd64"}
How reproducible:
always
Steps to Reproduce:
1. use the below config: [cloud-user@preserve-olm-env2 mirror-tmp]$ cat mirror.yaml apiVersion: mirror.openshift.io/v1alpha1 kind: ImageSetConfiguration # archiveSize: 4 mirror: additionalImages: - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:46a62d73aeebfb72ccc1743fc296b74bf2d1f80ec9ff9771e655b8aa9874c933 - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:9e549c09edc1793bef26f2513e72e589ce8f63a73e1f60051e8a0ae3d278f394 - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:c16891ee9afeb3fcc61af8b2802e56605fff86a505e62c64717c43ed116fd65e - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:5c37bd168645f3d162cb530c08f4c9610919d4dada2f22108a24ecdea4911d60 - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:89a6abbf10908e9805d8946ad78b98a13a865cefd185d622df02a8f31900c4c1 - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:de5b339478e8e1fc3bfd6d0b6784d91f0d3fbe0a133354be9e9d65f3d7906c2d - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:fdf774c4365bde48d575913d63ef3db00c9b4dda5c89204029b0840e6dc410b1 - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:d90aecc425e1b2e0732d0a90bc84eb49eb1139e4d4fd8385070d00081c80b71c - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:15cc75164335fa178c80db4212d11e4a793f53d2b110c03514ce4c79a3717ca0 - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:9e66db3a282ee442e71246787eb24c218286eeade7bce4d1149b72288d3878ad - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:546b14c1f3fb02b1a41ca9675ac57033f2b01988b8c65ef3605bcc7d2645be60 - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:12d7061012fd823b57d7af866a06bb0b1e6c69ec8d45c934e238aebe3d4b68a5 - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:41025e3e3b72f94a3290532bdd6cabace7323c3086a9ce434774162b4b1dd601 - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:672b4bee759f8115e5538a44c37c415b362fc24b02b0117fd4bdcc129c53e0a1 - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:92542b22911fbd141fadc53c9737ddc5e630726b9b53c477f4dfe71b9767961f - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:f6b6a15c4477615ff202e73d77fc339977aeeca714b9667196509d53e2d2e4f5 - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:1feb7073dec9341cadcc892df39ae45c427647fb034cf09dce1b7aa120bbb459 - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:7ca05f93351959c0be07ec3af84ffe6bb5e1acea524df210b83dd0945372d432 - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:c0fe8830f8fdcbe8e6d69b90f106d11086c67248fa484a013d410266327a4aed - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:6a1de43c60d021921973e81c702e163a49300254dc3b612fd62ed2753efe4f06 - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:b386d0e1c9e12e9a3a07aa101257c6735075b8345a2530d60cf96ff970d3d21a 2. Run the $ oc-mirror --config mirror.yaml file:///root/jian/test/
Actual results:
error: unable to push manifest to file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest: symlink sha256:f6b6a15c4477615ff202e73d77fc339977aeeca714b9667196509d53e2d2e4f5 /root/jian/test/oc-mirror-workspace/src/v2/openshift4/ose-cluster-kube-descheduler-operator/manifests/latest.download: file exists error: unable to push manifest to file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest: symlink sha256:6a1de43c60d021921973e81c702e163a49300254dc3b612fd62ed2753efe4f06 /root/jian/test/oc-mirror-workspace/src/v2/openshift4/ose-cluster-kube-descheduler-operator/manifests/latest.download: file exists
Expected results:
No error
Additional info:
CI is flaky because of test failures such as the following:
[sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel] Run #0: Failed { fail [github.com/openshift/origin/test/extended/authorization/scc.go:69]: 1 pods failed before test on SCC errors Error creating: pods "azure-file-csi-driver-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[6]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[7]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[9]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[10]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.initContainers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.initContainers[0].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[1].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[1].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[2].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[2].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for DaemonSet.apps/v1/azure-file-csi-driver-node -n openshift-cluster-csi-drivers happened 12 times Ginkgo exit error 1: exit with code 1} Run #1: Failed { fail [github.com/openshift/origin/test/extended/authorization/scc.go:69]: 1 pods failed before test on SCC errors Error 
creating: pods "azure-file-csi-driver-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[6]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[7]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[9]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[10]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.initContainers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.initContainers[0].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[1].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[1].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[2].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[2].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for DaemonSet.apps/v1/azure-file-csi-driver-node -n openshift-cluster-csi-drivers happened 12 times Ginkgo exit error 1: exit with code 1}
This particular failure comes from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/901/pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-ovn/1638557668689842176. Search.ci has additional similar errors.
I have seen these failures in 4.14 CI jobs.
Presently, search.ci shows the following stats for the past two days:
Found in 0.00% of runs (0.01% of failures) across 131399 total runs and 7623 jobs (19.50% failed) in 1.01s
1. Post a PR and have bad luck.
2. Check search.ci: https://search.ci.openshift.org/?search=pods+%22azure-file-csi-driver-%28controller%7Cnode%29-%22+is+forbidden&maxAge=168h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
CI fails.
CI passes, or fails on some other test failure, and the failures don't show up in search.ci.
Description of problem:
With a new S3 bucket, the HostedCluster failed with the condition: - lastTransitionTime: “2023-04-13T14:17:11Z” message: ‘failed to upload /.well-known/openid-configuration to the heli-hypershift-demo-oidc-2 s3 bucket: aws returned an error: AccessControlListNotSupported’ observedGeneration: 3 reason: OIDCConfigurationInvalid status: “False” type: ValidOIDCConfiguration
Version-Release number of selected component (if applicable):
How reproducible:
1 create s3 bucket $ aws s3api create-bucket --create-bucket-configuration LocationConstraint=us-east-2 --region=us-east-2 --bucket heli-hypershift-demo-oidc-2 { "Location": "http://heli-hypershift-demo-oidc-2.s3.amazonaws.com/" } [cloud-user@heli-rhel-8 ~]$ aws s3api delete-public-access-block --bucket heli-hypershift-demo-oidc-2 2 install HO and create a hc on aws us-west-2 3. hc failed with condition: - lastTransitionTime: “2023-04-13T14:17:11Z” message: ‘failed to upload /.well-known/openid-configuration to the heli-hypershift-demo-oidc-2 s3 bucket: aws returned an error: AccessControlListNotSupported’ observedGeneration: 3 reason: OIDCConfigurationInvalid status: “False” type: ValidOIDCConfiguration
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
create a hc successfully
Additional info:
The dns operator appears to have begun frequently spamming kube Events in some serial jobs across multiple clouds (especially GCP and Azure; AWS is less common, but there are some failures with the same signature).
The pathological events test fails, and it appears this started on May 5th. See the Pass Rate By NURP+ Combination panel for where this is most common.
As of the date of filing, pass rates are:
56% - gcp, amd64, sdn, ha, serial, techpreview
57% - gcp, amd64, sdn, ha, serial
60% - azure, amd64, ovn, ha, serial
60% - azure, amd64, ovn, ha, serial, techpreview
The events seem to consistently appear as follows on all clouds:
ns/openshift-dns service/dns-default hmsg/ade328ddf3 - pathological/true reason/TopologyAwareHintsDisabled Unable to allocate minimum required endpoints to each zone without exceeding overload threshold (5 endpoints, 3 zones), addressType: IPv4 From: 08:58:41Z To: 08:58:42Z
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-azure-sdn-techpreview-serial/1656207924667617280 (intervals)
The Intervals item under "Debug Tools" is a great way to see these charted in time, see the "interesting events" section.
test=[sig-arch] events should not repeat pathologically for namespace openshift-dns
Description of problem:
Not able to provision a new baremetalhost because ironic is not able to find a suitable virtual media device.
Version-Release number of selected component (if applicable):
How reproducible:
100% if you have a UCS Blade
Steps to Reproduce:
1. add the baremetalhost 2. wait for the error 3.
Actual results:
No suitable virtual media device found.
Expected results:
That the provisioning would succeed
Additional info:
I tried to insert an ISO using curl and I can do it on the virtualmedia[3] device, which is a virtual DVD. When I look at the metal3-ironic logs I can see the following entry: Received representation of VirtualMedia /redfish/v1/Managers/CIMC/VirtualMedia/3: {'_actions': {'eject_media': {'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Managers/CIMC/VirtualMedia/3/Actions/VirtualMedia.EjectMedia'}, 'insert_media': {'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Managers/CIMC/VirtualMedia/3/Actions/VirtualMedia.InsertMedia'}}, '_certificates_path': None, '_oem_vendors': ['Cisco'], 'connected_via': <ConnectedVia.URI: 'URI'>, 'identity': '3', 'image': None, 'image_name': None, 'inserted': False, 'links': None, 'media_types': [<VirtualMediaType.DVD: 'DVD'>], 'name': 'CIMC-Mapped vDVD', 'status': {'health': <Health.OK: 'OK'>, 'health_rollup': None, 'state': <State.DISABLED: 'Disabled'>}, 'transfer_method': None, 'user_name': None, 'verify_certificate': None, 'write_protected': False} I'm sure this is the correct device, and verified that I can insert vmedia using curl. Somehow metal3/ironic is not selecting this device. I suspect the reason is that "DVD" is not a valid media_type. When I look at [the ironic code](https://github.com/openstack/ironic/blob/b4f8209b99af32d8d2a646591af9b62436aad3d8/ironic/drivers/modules/redfish/boot.py#LL188C31-L188C31) I can see that there is a check for the media_type. I'm not able to see which values are accepted by metal3. I was able to validate the media_types for a rackmount server which works, and there I see the following values: "CD, DVD". This led me to believe that DVD is not an accepted value. Can you please confirm that this is the case and, if so, can we add DVD as a suitable device?
Description of problem:
The customer is facing console slowness when loading a workloads page with 300+ workloads.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Steps to Reproduce:
1. Log in to the OCP console 2. Workloads -> Projects -> Project -> Deployment Configs (300+) 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/openshift-state-metrics/pull/97
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
oc should not append the -x86_64 suffix when mirroring multi-arch payloads
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1.oc adm release mirror quay.io/openshift-release-dev/ocp-release:4.12.13-multi --keep-manifest-list=true --to=someregistry.io/somewhere/release 2. 3.
Actual results:
05-31 04:54:15.807 sha256:cd8639e34840833dd98d8323f1999b00ca06c73d7ae9ad8945f7b397450821ee -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-insights-operator 05-31 04:54:15.807 sha256:d0443f26968a2159e8b9590b33c428b6af7c0220ab6cc13633254d8843818cdf -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-keepalived-ipfailover 05-31 04:54:15.807 sha256:d2126187264d04f812068c03b59316547f043f97e90ec1a605ac24ab008c85a0 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-agent-installer-orchestrator 05-31 04:54:15.807 sha256:d445a4ece53f0695f1b812920e4bbb8a73ceef582918a0f376c2c5950a3e050b -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-ovn-kubernetes 05-31 04:54:15.807 sha256:d4bfe3bac81d5bb758efced8706a400a4b1dad7feb2c9a9933257fde9f405866 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-csi-snapshot-controller 05-31 04:54:15.807 sha256:d50c009e4b47bb6d93125c08c19c13bf7fd09ada197b5e0232549af558b25d19 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-vsphere-csi-driver-operator 05-31 04:54:15.807 sha256:d844ecbbba99e64988f4d57de9d958172264e88b9c3bfc7b43e5ee19a1a2914e -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-ironic 05-31 04:54:15.807 sha256:d90b37357d4c2c0182787f6842f89f56aaebeab38a139c62f4a727126e036578 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-baremetal-machine-controllers 05-31 04:54:15.807 sha256:d928536d8d9c4d4d078734004cc9713946da288b917f1953a8e7b1f2a8428a64 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-azure-cloud-controller-manager 05-31 04:54:15.807 sha256:da049d5a453eeb7b453e870a0c52f70df046f2df149bca624248480ef83f2ac8 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-cli-artifacts 05-31 04:54:15.807 sha256:db1cf013e3f845be74553eecc9245cc80106b8c70496bbbc0d63b497dcbb6556 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-cluster-capi-controllers 05-31 04:54:15.807 sha256:dc7b1305c7fec48d29adc4d8b3318d3b1d1d12495fb2d0ddd49a33e3b6aed0cc -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-gcp-pd-csi-driver 05-31 04:54:15.807 sha256:de8753eb8b2ccec3474016cd5888d03eeeca7e0f23a171d85b4f9d76d91685a3 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-baremetal-installer
Expected results:
No -x86_64 suffix added to the image tags
Additional info:
Description of problem:
Navigation:
Workloads -> Deployments -> Edit update strategy
'greater than pod' is in English
Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-06-23-044003
How reproducible:
Always
Steps to Reproduce:
1.
2.
3.
Actual results:
Translation missing
Expected results:
Translation should appear
Additional info:
Description of the problem:
BE 2.16: the base domain allows a 1-character string. This results in a cluster address like clustername.r, but on the Networking page I get "DNS wildcard not configured".
How reproducible:
100%
Steps to reproduce:
1. Create a cluster with a 1-character string as the base domain (e.g. "c")
2. Move to the Networking page
3. Set all needed info (API + Ingress VIPs). The validation error "DNS wildcard not configured" is shown
Actual results:
Expected results:
Description of problem:
The 'KnativeServing' global configuration is missing after the user has successfully installed the 'Serverless' Operator
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-13-223353
How reproducible:
Always
Steps to Reproduce:
1. Install the 'Serverless' Operator, make sure the operator has been installed successfully, and that the Knative Serving instance is created without any error 2. Navigate to Administration -> Cluster Settings -> Global Configuration 3. Check if KnativeServing is listed in the Cluster Settings page
Actual results:
KnativeServing is missing
Expected results:
KnativeServing should be listed in the Global Configuration page
Additional info:
Description of problem:
When using --oci-registries-config, oc-mirror panics
Version-Release number of selected component (if applicable):
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.14.0-202308091944.p0.gdba4a0c.assembly.stream-dba4a0c", GitCommit:"dba4a0cfd0a9fd29c1e4b5bc1da737e1153cc679", GitTreeState:"clean", BuildDate:"2023-08-10T00:13:31Z", GoVersion:"go1.20.5 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}
How reproducible:
always
Steps to Reproduce:
1. mirror to localhost : cat config.yaml apiVersion: mirror.openshift.io/v1alpha2 kind: ImageSetConfiguration mirror: operators: - catalog: oci:///home1/oci-414 packages: - name: cluster-logging oc-mirror --config config.yaml docker://localhost:5000 --dest-use-http 2. use oci-registries-config `oc-mirror --config config.yaml docker://localhost:5000 --dest-use-http --oci-registries-config /home1/registry.conf`
Actual results:
2. The oc-mirror will panic : oc-mirror --config config.yaml docker://ec2-18-117-165-30.us-east-2.compute.amazonaws.com:5000 --dest-use-http --oci-registries-config /home1/registry.conf Logging to .oc-mirror.log Checking push permissions for ec2-18-117-165-30.us-east-2.compute.amazonaws.com:5000 Found: oc-mirror-workspace/src/publish Found: oc-mirror-workspace/src/v2 Found: oc-mirror-workspace/src/charts Found: oc-mirror-workspace/src/release-signatures backend is not configured in config.yaml, using stateless mode backend is not configured in config.yaml, using stateless mode No metadata detected, creating new workspace panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x2e8a774] goroutine 43 [running]: github.com/containers/image/v5/docker.(*dockerImageSource).Close(0x3?) /go/src/github.com/openshift/oc-mirror/vendor/github.com/containers/image/v5/docker/docker_image_src.go:170 +0x14 github.com/openshift/oc-mirror/pkg/cli/mirror.findFirstAvailableMirror.func1() /go/src/github.com/openshift/oc-mirror/pkg/cli/mirror/fbc_operators.go:449 +0x42 github.com/openshift/oc-mirror/pkg/cli/mirror.findFirstAvailableMirror({0x4c67b38, 0xc0004ca230}, {0xc00ad56000, 0x1, 0x40d19c0?}, {0xc00077e000, 0x94}, {0xc00ac0f6b0, 0x24}, {0x0, ...}) /go/src/github.com/openshift/oc-mirror/pkg/cli/mirror/fbc_operators.go:467 +0x6df github.com/openshift/oc-mirror/pkg/cli/mirror.(*MirrorOptions).addRelatedImageToMapping(0xc0001c0f00, {0x4c67b38, 0xc0004ca230}, 0xc00ac13480?, {{0xc0074a14e8?, 0x18?}, {0xc0076563f0?, 0x8b?}}, {0xc000c5b580, 0x36}) /go/src/github.com/openshift/oc-mirror/pkg/cli/mirror/fbc_operators.go:154 +0x3c5 github.com/openshift/oc-mirror/pkg/cli/mirror.(*OperatorOptions).plan.func3() /go/src/github.com/openshift/oc-mirror/pkg/cli/mirror/operator.go:570 +0x52 golang.org/x/sync/errgroup.(*Group).Go.func1() /go/src/github.com/openshift/oc-mirror/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x64 created by golang.org/x/sync/errgroup.(*Group).Go /go/src/github.com/openshift/oc-mirror/vendor/golang.org/x/sync/errgroup/errgroup.go:72 +0xa5
Expected results:
Should not panic
Additional info:
Description of problem:
The fix for https://issues.redhat.com/browse/OCPBUGS-15947 seems to have introduced a problem in our keepalived-monitor logic. What I'm seeing is that at some point all of the apiservers became unavailable, which caused haproxy-monitor to drop the redirect firewall rule since it wasn't able to reach the API and we normally want to fall back to direct, un-loadbalanced API connectivity in that case.
However, due to the fix linked above we now short-circuit the keepalived-monitor update loop if we're unable to retrieve the node list, which is what will happen if the node holding the VIP has neither a local apiserver nor the HAProxy firewall rule. Because of this we will also skip updating the status of the firewall rule and thus the keepalived priority for the node won't be dropped appropriately.
Version-Release number of selected component (if applicable):
We backported the fix linked above to 4.11 so I expect this goes back at least that far.
How reproducible:
Unsure. It's clearly not happening every time, but I have a local dev cluster in this state so it can happen.
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
I think the solution here is just to move the firewall rule check earlier in the update loop so it will have run before we try to retrieve nodes. There's no dependency on the ordering of those two steps so I don't foresee any major issues. To workaround this I believe we can just bounce keepalived on the affected node until the VIP ends up on the node with a local apiserver.
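Purely illustrative Go pseudocode of the reordering described above; none of these function names exist in the real keepalived-monitor, it only shows recording the HAProxy firewall rule state before the node-list call that can short-circuit the loop:

package main

import "log"

// updateOnce sketches one iteration of a monitor loop in which the firewall
// rule state is recorded before anything that may fail and short-circuit.
func updateOnce(checkFirewallRule func() bool, listNodes func() ([]string, error), apply func(bool, []string)) {
    // 1. Check the redirect firewall rule first so its state (and therefore
    //    the keepalived priority) is always refreshed.
    ruleActive := checkFirewallRule()

    // 2. Only then try to list nodes; if this fails we still acted on the
    //    firewall state instead of skipping the whole iteration.
    nodes, err := listNodes()
    if err != nil {
        log.Printf("could not list nodes, applying firewall state only: %v", err)
        apply(ruleActive, nil)
        return
    }
    apply(ruleActive, nodes)
}

func main() {
    updateOnce(
        func() bool { return false },
        func() ([]string, error) { return []string{"master-0"}, nil },
        func(rule bool, nodes []string) { log.Printf("rule=%v nodes=%v", rule, nodes) },
    )
}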
Please review the following PR: https://github.com/openshift/kube-state-metrics/pull/94
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
mTLS connections do not work when using an intermediate CA apart from the root CA, both with a CRL defined.
The Intermediate CA Cert had a published CDP which directed to a CRL issued by the root CA.
The config map in the openshift-ingress namespace contains the CRL as issued by the root CA. The CRL issued by the Intermediate CA is not present since that CDP is in the user cert and so not in the bundle.
When attempting to connect using a user certificate issued by the Intermediate CA it fails with an error of unknown CA.
When attempting to connect using a user certificate issued by the Root CA, the connection is successful.
Version-Release number of selected component (if applicable):
4.10.24
How reproducible:
Always
Steps to Reproduce:
1. Configure CA and intermediate CA with CRL
2. Sign client certificate with the intermediate CA
3. Configure mtls in openshift-ingress
Actual results:
When attempting to connect using a user certificate issued by the Intermediate CA it fails with an error of unknown CA.
When attempting to connect using a user certificate issued by the Root CA, the connection is successful.
Expected results:
Be able to connect with client certificates signed by the intermediate CA
Additional info:
This is a clone of issue OCPBUGS-13034. The following is the description of the original issue:
—
Description of problem:
The cluster-api pod can't create events due to RBAC restrictions. We may miss useful events because of this.
E0503 07:20:44.925786 1 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"ad1-workers-f5f568855-vnzmn.175b911e43aa3f41", GenerateName:"", Namespace:"ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Machine", Namespace:"ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1", Name:"ad1-workers-f5f568855-vnzmn", UID:"2b40a694-d36d-4b13-9afc-0b5daeecc509", APIVersion:"cluster.x-k8s.io/v1beta1", ResourceVersion:"144260357", FieldPath:""}, Reason:"DetectedUnhealthy", Message:"Machine ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1/ad1-workers/ad1-workers-f5f568855-vnzmn/ has unhealthy node ", Source:v1.EventSource{Component:"machinehealthcheck-controller", Host:""}, FirstTimestamp:time.Date(2023, time.May, 3, 7, 20, 44, 923289409, time.Local), LastTimestamp:time.Date(2023, time.May, 3, 7, 20, 44, 923289409, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:serviceaccount:ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1:cluster-api" cannot create resource "events" in API group "" in the namespace "ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1"' (will not retry!)
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Always
Steps to Reproduce:
1. Create a hosted cluster
2. Check the cluster-api pod for some kind of error (e.g. slow node startup)
Actual results:
Error
Expected results:
Event generated
Additional info:
ClusterRole hypershift-cluster-api is created here https://github.com/openshift/hypershift/blob/e7eb32f259b2a01e5bbdddf2fe963b82b331180f/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go#L2720
We should add create/patch/update verbs for events there.
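A sketch of the rule that would need to be appended to that ClusterRole, expressed with the Kubernetes RBAC API types (rbacv1 is "k8s.io/api/rbac/v1"; exactly where this lands in the hostedcluster controller is an assumption):

// Illustrative only: the events rule implied by the error above.
eventsRule := rbacv1.PolicyRule{
	APIGroups: []string{""}, // core API group
	Resources: []string{"events"},
	Verbs:     []string{"create", "patch", "update"},
}
role.Rules = append(role.Rules, eventsRule) // role is the hypershift-cluster-api ClusterRole being built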
Description of problem:
MetalLB does not work when traffic comes from a secondary nic. The root cause of this failure is net.ipv4.ip_forward flag change from 1 to 0. If we re-enable this flag everything works as expected.
Version-Release number of selected component (if applicable):
Server Version: 4.14.0-0.nightly-2023-07-05-191022
How reproducible:
Run any test case that tests metallb via secondary interface.
Steps to Reproduce:
1. 2. 3.
Actual results:
Test failed
Expected results:
Test Passed
Additional info:
Looks like this PR is the root cause: https://github.com/openshift/machine-config-operator/pull/3676/files#
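As an interim check or workaround on an affected node, the forwarding flag can be flipped back on directly; a minimal sketch (the real fix belongs in the machine-config change referenced above, this only illustrates the sysctl being re-enabled):

// Re-enable IPv4 forwarding by writing to the proc interface.
package main

import (
	"log"
	"os"
)

func main() {
	if err := os.WriteFile("/proc/sys/net/ipv4/ip_forward", []byte("1\n"), 0o644); err != nil {
		log.Fatalf("re-enabling net.ipv4.ip_forward: %v", err)
	}
}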
Description of problem:
when applying a CSV with the current label recommendation for STS, the following error occurs: error creating csv ack-s3-controller.v1.0.3: ClusterServiceVersion.operators.coreos.com "ack-s3-controller.v1.0.3" is invalid: metadata.annotations: Invalid value: "operators.openshift.io/infrastructure-features/token-auth/aws": a qualified name must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]') with an optional DNS subdomain prefix and '/' (e.g. 'example.com/MyName')
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
1. Create a CSV with the annotation "operators.openshift.io/infrastructure-features/token-auth/aws: `false`"
2. Apply the CSV on the cluster
Actual results:
fails with the above error
Expected results:
should not fail
Additional info:
Description of problem:
The vsphereStorageDriver validation error message here is odd when I change LegacyDeprecatedInTreeDriver to "". I get:
Invalid value: "string": VSphereStorageDriver can not be changed once it is set to CSIWithMigrationDriver
There is no CSIWithMigrationDriver either in the old or new Storage CR.
Version-Release number of selected component (if applicable):
4.13 with this PR: https://github.com/openshift/api/pull/1433
Description of problem:
We have presubmit and periodic jobs failing on:

[sig-arch] events should not repeat pathologically for namespace openshift-monitoring
{ 2 events happened too frequently
event happened 21 times, something is wrong: ns/openshift-monitoring statefulset/prometheus-k8s hmsg/6f9bc9e1d7 - pathological/true reason/RecreatingFailedPod StatefulSet openshift-monitoring/prometheus-k8s is recreating failed Pod prometheus-k8s-1 From: 16:11:36Z To: 16:11:37Z result=reject
event happened 22 times, something is wrong: ns/openshift-monitoring statefulset/prometheus-k8s hmsg/ecfdd1d225 - pathological/true reason/SuccessfulDelete delete Pod prometheus-k8s-1 in StatefulSet prometheus-k8s successful From: 16:11:36Z To: 16:11:37Z result=reject }

The failure occurs when the event happens over 20 times. The RecreatingFailedPod reason shows up in 4.14 and Presubmits and does not show up in 4.13.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Run presubmits or periodics; here are latest examples:

2023-05-24 06:25:52.551883+00 | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1661210557367193600 | {aws,amd64,sdn,ha,serial}
2023-05-24 10:20:54.91883+00 | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-gcp-sdn-serial/1661267817128792064 | {gcp,amd64,sdn,ha,serial}
2023-05-24 14:17:18.849402+00 | https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/27899/pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade/1661321663389634560 | {gcp,amd64,ovn,upgrade,upgrade-micro,ha}
2023-05-24 14:17:51.908405+00 | https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes/1583/pull-ci-openshift-kubernetes-master-e2e-azure-ovn-upgrade/1661324100011823104 | {azure,amd64,ovn,upgrade,upgrade-micro,ha}
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
That event/reason should not show up as a failure in the pathological test
Additional info:
This table shows which variants are affected on 4.14 and Presubmits:
variants                                         | test_count
-------------------------------------------------+------------
{aws,amd64,ovn,upgrade,upgrade-micro,ha}         | 63
{gcp,amd64,ovn,upgrade,upgrade-micro,ha}         | 14
{gcp,amd64,sdn,ha,serial,techpreview}            | 12
{azure,amd64,sdn,ha,serial,techpreview}          | 7
{aws,amd64,sdn,upgrade,upgrade-micro,ha}         | 6
{aws,amd64,ovn,ha}                               | 6
{vsphere-ipi,amd64,ovn,upgrade,upgrade-micro,ha} | 5
{aws,amd64,sdn,ha,serial}                        | 5
{azure,amd64,ovn,upgrade,upgrade-micro,ha}       | 5
{metal-ipi,amd64,ovn,upgrade,upgrade-micro,ha}   | 5
{vsphere-ipi,amd64,ovn,ha,serial}                | 4
{gcp,amd64,sdn,ha,serial}                        | 3
{aws,amd64,ovn,single-node}                      | 3
{metal-ipi,amd64,ovn,ha,serial}                  | 2
{aws,amd64,ovn,ha,serial}                        | 2
{aws,amd64,upgrade,upgrade-micro,ha}             | 1
{aws,arm64,sdn,ha,serial}                        | 1
{aws,arm64,ovn,ha,serial,techpreview}            | 1
{vsphere-ipi,amd64,ovn,ha,serial,techpreview}    | 1
{aws,amd64,sdn,ha,serial,techpreview}            | 1
{libvirt,ppc64le,ovn,ha,serial}                  | 1
{amd64,upgrade,upgrade-micro,ha}                 | 1
Just for my record, I'm using this query to check 4.14 and Presubmits:
SELECT rt.created_at, url, variants
FROM prow_jobs pj
JOIN prow_job_runs r ON r.prow_job_id = pj.id
JOIN prow_job_run_tests rt ON rt.prow_job_run_id = r.id
JOIN prow_job_run_test_outputs o ON o.prow_job_run_test_id = rt.id
JOIN tests ON rt.test_id = tests.id
WHERE pj.release IN ('4.14', 'Presubmits')
  AND rt.status = 12
  AND tests.id = 65991
  AND o.output LIKE '%RecreatingFailedPod%'
ORDER BY rt.created_at, variants DESC;
And this query for checking 4.13:
SELECT rt.created_at, url, variants
FROM prow_jobs pj
JOIN prow_job_runs r ON r.prow_job_id = pj.id
JOIN prow_job_run_tests rt ON rt.prow_job_run_id = r.id
JOIN prow_job_run_test_outputs o ON o.prow_job_run_test_id = rt.id
JOIN tests ON rt.test_id = tests.id
WHERE pj.release IN ('4.13')
  AND rt.status = 12
  AND tests.id IN (65991, 244, 245)
  AND o.output LIKE '%RecreatingFailedPod%'
ORDER BY rt.created_at, variants DESC;
This shows jobs beginning on 4/13 to today.
Description of problem:
when viewing servicemonitor schema in YAML sidebar, for many fields whose type is Object, console doesn't have a 'View details' button to show more details
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-12-044657
How reproducible:
Always
Steps to Reproduce:
1. Go to any ServiceMonitor YAML page, open the Schema by clicking 'View sidebar', then click 'View details' of 'spec' -> 'View details' of 'endpoints'
2. Check the object and array type schema fields:
spec.endpoints.authorization
spec.endpoints.basicAuth
spec.endpoints.bearerTokenSecret
spec.endpoints.oauth2
spec.endpoints.params
spec.endpoints.tlsConfig
spec.endpoints.relabelings
Actual results:
2. There is no 'View details' button for these 'object' and 'array' type fields
Expected results:
2. We should provide a 'View details' link for 'object' and 'array' fields so that the user can view more details. For example:

$ oc explain servicemonitors.spec.endpoints.tlsConfig
KIND:     ServiceMonitor
VERSION:  monitoring.coreos.com/v1
RESOURCE: tlsConfig <Object>
DESCRIPTION:
     TLS configuration to use when scraping the endpoint
FIELDS:
   ca   <Object>
     Certificate authority used when verifying server certificates.
   caFile       <string>
     Path to the CA cert in the Prometheus container to use for the targets.
   cert <Object>
     Client certificate to present when doing client-authentication.
   certFile     <string>
     Path to the client cert file in the Prometheus container for the targets.
   insecureSkipVerify   <boolean>
     Disable target certificate validation.
   keyFile      <string>
     Path to the client key file in the Prometheus container for the targets.
   keySecret    <Object>
     Secret containing the client key file for the targets.
   serverName   <string>
     Used to verify the hostname for the targets.

$ oc explain servicemonitors.spec.endpoints.relabelings
KIND:     ServiceMonitor
VERSION:  monitoring.coreos.com/v1
RESOURCE: relabelings <[]Object>
DESCRIPTION:
     RelabelConfigs to apply to samples before scraping. Prometheus Operator automatically adds relabelings for a few standard Kubernetes fields. The original scrape job's name is available via the `__tmp_prometheus_job_name` label. More info: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
     RelabelConfig allows dynamic rewriting of the label set, being applied to samples before ingestion. It defines `<metric_relabel_configs>`-section of Prometheus configuration. More info: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs
FIELDS:
   action       <string>
     Action to perform based on regex matching. Default is 'replace'. uppercase and lowercase actions require Prometheus >= 2.36.
   modulus      <integer>
     Modulus to take of the hash of the source label values.
   regex        <string>
     Regular expression against which the extracted value is matched. Default is '(.*)'
   replacement  <string>
     Replacement value against which a regex replace is performed if the regular expression matches. Regex capture groups are available. Default is '$1'
   separator    <string>
     Separator placed between concatenated source label values. default is ';'.
   sourceLabels <[]string>
     The source labels select values from existing labels. Their content is concatenated using the configured separator and matched against the configured regular expression for the replace, keep, and drop actions.
   targetLabel  <string>
     Label to which the resulting value is written in a replace action. It is mandatory for replace actions. Regex capture groups are available.
Additional info:
Description of problem:
`rprivate` default mount propagation in combination with `hostPath: path: /` breaks CSI driver relying on multipath
How reproducible:
Always
Steps to Reproduce (simplified):
1. ssh to the node
2. Mount a partition, for instance /dev/{s,v}da2, which on CoreOS is a UEFI FAT partition:
   $ sudo mount /dev/vda2 /mnt
3. Start a debug pod on that node (or any pod that does a hostPath mount of /, like the node tuning operand pod, the machine config operand, the file integrity operand):
   $ oc debug nodes/master-2.sharedocp4upi411ovn.lab.upshift.rdu2.redhat.com
4. Unmount the partition on the node
5. Notice the debug pod still has a reference to the filesystem:
   grep vda2 /proc/*/mountinfo
   /proc/3687945/mountinfo:11219 10837 252:2 / /host/var/mnt rw,relatime - vfat /dev/vda2 rw,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro
6. On the node, although the mount is absent from /proc/mounts, the filesystem is still mounted, as shown by the dirty bit still being set on the FAT filesystem:
   sudo fsck -n /dev/vda2
   fsck from util-linux 2.32.1
   fsck.fat 4.1 (2017-01-24)
   0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
Expected results:
File system is unmounted in host and in container.
Additional info:
Although the steps above show the behaviour in a simple way, this becomes quite problematic when using multipath on a host mount.
We noticed in a customer environment that we cannot reschedule some pods from an old node to a new node using oc adm drain when these pods have a Persistent Volume mount created by the third-party CSI driver block.csi.ibm.com.
The CSI driver uses multipath from CoreOS to manage multipath block devices; however, the multipath daemon blocks the volume removal from the node (the multipath -f flushing calls from the CSI driver always return busy; flushing a multipath device means removing it from the device tree in /dev, in storage parlance).
The multipath flushes always fail because, although the multipath block device is unmounted on the host, the machine-config, file integrity, and node tuning pods do hostPath volume mounts of /, the host root filesystem, and thus get a copy of the mounts.
Due to that mount copy the kernel sees the filesystem as still in use, even though there are no file descriptors open on that filesystem, and considers it unsafe to remove the multipath block device. The node CSI driver therefore cannot finish the unmount of the volume, which blocks container creation on another node.
We can see this mount copies by looking at /proc/<container pid>/mountinfo:
$ grep mpathes proc/*/mountinfo
proc/3295781/mountinfo:56348 52693 253:42 / /var/lib/kubelet/plugins/kubernetes.io/csi/block.csi.ibm.com/12345/globalmount rw,relatime - xfs /dev/mapper/mpathes rw,seclabel,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota
cri-o is doing this mount copy using `rprivate` mount propagation
( see https://github.com/cri-o/cri-o/blob/b098bec2d4d79bdf99c3ce89b0eeb16bfe8b5645/server/container_create_linux.go#L1030 )
The semantics of rprivate are mapped in `runc`
https://github.com/opencontainers/runc/blob/ba58ee9c3b9550c3e32b94802b0fb29761955290/libcontainer/specconv/spec_linux.go#L55
to mount flags passed to the mount(2) system call:
MS_REC (since Linux 2.4.11)
       Used in conjunction with MS_BIND to create a recursive bind mount, and in conjunction with the propagation type flags to recursively change the propagation type of all of the mounts in a subtree. See below for further details.
MS_PRIVATE
       Make this mount private. Mount and unmount events do not propagate into or out of this mount.
The key here is the MS_PRIVATE flag. The unmounting of the multipath block device is not propagated to the mount namespace of the containers, keeping the filesystem eternally mounted and preventing the flushing of the multipath device.
Maybe hostPath mounts should be done using `rslave` mount propagation when we detect an attempt to bind mount /var/lib?
It seems cri-dockerd does something similar, according to https://kubernetes.io/docs/concepts/storage/volumes/#mount-propagation
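For reference, at the pod-spec level the rslave behaviour corresponds to mountPropagation: HostToContainer on the volume mount. A minimal sketch in Go of what such a hostPath mount looks like (corev1 is "k8s.io/api/core/v1"; the volume name and path are just examples):

// Host unmount events propagate into the container with HostToContainer,
// which is the Kubernetes equivalent of rslave propagation.
prop := corev1.MountPropagationHostToContainer
hostVolumeMount := corev1.VolumeMount{
	Name:             "host",
	MountPath:        "/host",
	MountPropagation: &prop,
}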
We should be able to add a repository that supports basic auth.
Documentation Requirement: Yes/No (needs-docs|upstream-docs / no-doc)
Upstream: <Inputs/Requirement details>/ Not Applicable
Downstream: <Type: Doc defect/More inputs to doc>/ Not Applicable
Provide link to the relevant section
Provide doc inputs and details required
Release Notes Type: <New Feature/Enhancement/Known Issue/Bug
fix/Breaking change/Deprecated Functionality/Technology Preview>
LatencySensitive has been functionally equivalent to "" (Default) for several years. Code has forgotten that the featureset must be handled, and it's more efficacious to remove the featureset (with migration code) than to try to plug all the holes.
To ensure this is working, update a cluster to use LatencySensitive and see that the FeatureSet value is reset after two minutes.
Description of problem:
In 4.10 we added an option, REGISTRY_AUTH_PREFERENCE, to opt in to the podman registry auth file preference reading order. This is important for oc registry commands like oc registry login and oc image. https://github.com/openshift/oc/pull/893 We also started warning users that we will remove support for the docker order and default to the podman order, meaning we will check podman locations first and then fall back to docker locations.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
We should default to the podman auth file locations and remove the warning when using oc registry login or oc image commands without the REGISTRY_AUTH_PREFERENCE variable.
Additional info:
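A rough illustration of the intended podman-first lookup order (a sketch only; the paths are the documented podman and docker defaults, and the helper name is made up, not the actual oc code):

// Check the podman auth file location first, then fall back to docker's.
// Assumed imports: "os", "path/filepath"
func defaultAuthFile() string {
	candidates := []string{
		filepath.Join(os.Getenv("XDG_RUNTIME_DIR"), "containers", "auth.json"), // podman location, checked first
		filepath.Join(os.Getenv("HOME"), ".docker", "config.json"),             // docker location, fallback
	}
	for _, p := range candidates {
		if _, err := os.Stat(p); err == nil {
			return p
		}
	}
	return candidates[0] // default to the podman path when neither exists
}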
Description of problem:
During an operator installation with the Installation mode set to all namespaces, the "Installed Namespace" dropdown selection is restricted to "openshift-operators" or another specific namespace, if one is recommended by the operator owners.
With the recent* change to allow non-latest operator version installs, users should be allowed to select any namespace to install a globally installed operator.
Related info:
Operators can now be installed at non-latest versions with the merge of * https://github.com/openshift/console/pull/12743. They require a manual approval, and because of the way InstallPlan upgrades work, this affects all operators installed in that namespace.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-19411. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. oc -n openshift-machine-api get role/cluster-autoscaler-operator -o yaml
2. Observe the missing watch verb
3. Tail the cluster-autoscaler logs to see the error:
status.go:444] No ClusterAutoscaler. Reporting available.
I0919 16:40:52.877216 1 status.go:244] Operator status available: at version 4.14.0-rc.1
E0919 16:40:53.719592 1 reflector.go:148] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to watch *v1.ClusterOperator: unknown (get clusteroperators.config.openshift.io)
Actual results:
Expected results:
Additional info:
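The reflector error above points at a missing "watch" verb on clusteroperators.config.openshift.io. A sketch of the kind of rule that would cover it, using the Kubernetes RBAC API types (rbacv1 is "k8s.io/api/rbac/v1"; where the role is actually defined for the operator is an assumption here):

// Illustrative only: the verbs the watch failure suggests are needed.
clusterOperatorRule := rbacv1.PolicyRule{
	APIGroups: []string{"config.openshift.io"},
	Resources: []string{"clusteroperators"},
	Verbs:     []string{"get", "list", "watch"},
}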
This is a clone of issue OCPBUGS-18439. The following is the description of the original issue:
—
Description of problem:
In the developer sandbox, the happy path to create operator-backed resources is broken. Users can only work in their assigned namespace. When doing so and attempting to create an Operator-backed resource from the Developer console, the user interface inadvertently switches the working namespace from the user's to the `openshift` one. The console shows an error message when the user clicks the "create" button.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Login to the Developer Sandbox
2. Choose the Developer view
3. Click Add+ -> Developer Catalog -> Operator Backed
4. Filter by "integration"
5. Notice the working namespace is still the user's one
6. Select "Integration" (Camel K operator)
7. Click "Create"
8. Notice the working namespace has switched to `openshift`
9. Notice the custom resource in YAML view includes `namespace: openshift`
10. Click "Create"
Actual results:
An error message shows: "Danger alert:An error occurredintegrations.camel.apache.org is forbidden: User "bmesegue" cannot create resource "integrations" in API group "camel.apache.org" in the namespace "openshift""
Expected results:
On step 8, the working namespace should remain the user's one.
On step 9, in the YAML view, the namespace should be the user's one, or none.
After step 10, the creation process should trigger the creation of a Camel K integration.
Additional info:
The code in our infrastructure test needs to be updated to make the test more accurate. Currently we are targeting gomock.Any() in many cases, which means that the tests are not as accurate as they could be.
Updates should be similar to MGMT-13918
Description of the problem:
In Staging, UI 2.18.6 - Enable DHCP and then switch to UMN --> BE response "User Managed Networking cannot be set with VIP DHCP Allocation"
How reproducible:
100%
Steps to reproduce:
1. In networking page - enable DHCP
2. Switch to UMN
3. BE response with "User Managed Networking cannot be set with VIP DHCP Allocation"
Actual results:
Expected results:
Description of problem:
Install the cert-manager operator version cert-manager-operator-bundle:v1.11.1-6 from the console; the version shown in the UI constantly flips back and forth between v1.11.1 and v1.10.2.
Version-Release number of selected component (if applicable):
cert-manager-operator-bundle:v1.11.1-6, 4.13.0-0.nightly-2023-05-18-195839
How reproducible:
Always. I tried a few times in different envs, double confirmed.
Steps to Reproduce:
1. Install the cert-manager operator version cert-manager-operator-bundle:v1.11.1-6 from the console
2. Watch the console
Actual results:
The version shown in the UI constantly flips back and forth between v1.11.1 and v1.10.2. See the attached video https://drive.google.com/drive/folders/1AFWquCK-pDCoQFMEOONQwGByBUg6tKR9?usp=sharing .
Expected results:
Should always show v1.11.1
Additional info:
Whether using the v4.13 index image brew.registry.redhat.io/rh-osbs/iib:500235 (taken from the email "[CVP] (SUCCESS) (cvp-redhatopenshiftcfe: cert-manager-operator-bundle-container-v1.11.1-6)") or brew.registry.redhat.io/rh-osbs/iib-pub-pending:v4.13, both reproduced it.
Description of problem: the per-node certificates should have a configurable duration
Description of problem:
When CNO is managed by Hypershift, its deployment has the "hypershift.openshift.io/release-image" template metadata annotation. The annotation's value is used to track the progress of cluster control plane version upgrades. But the multus-admission-controller created and managed by CNO does not have that annotation, so service providers are not able to track its version upgrades. The proposed solution is for CNO to propagate its "hypershift.openshift.io/release-image" annotation down to the multus-admission-controller deployment. For that, CNO needs "get" access to its own deployment manifest so it can read the deployment template metadata annotations. Hypershift needs a code change to assign CNO "get" permission on the CNO deployment object.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create OCP cluster using Hypershift
2. Check deployment template metadata annotations on multus-admission-controller
Actual results:
No "hypershift.openshift.io/release-image" deployment template metadata annotation exists
Expected results:
"hypershift.openshift.io/release-image" annotation must be present
Additional info:
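A hedged sketch of the propagation described above (the helper and variable names are illustrative, not the real CNO code; appsv1 is "k8s.io/api/apps/v1"):

// Copy the release-image annotation from the CNO deployment's pod template
// onto the multus-admission-controller deployment's pod template.
const releaseImageAnnotation = "hypershift.openshift.io/release-image"

func propagateReleaseImage(cno, multus *appsv1.Deployment) {
	val, ok := cno.Spec.Template.Annotations[releaseImageAnnotation]
	if !ok {
		return // nothing to propagate
	}
	if multus.Spec.Template.Annotations == nil {
		multus.Spec.Template.Annotations = map[string]string{}
	}
	multus.Spec.Template.Annotations[releaseImageAnnotation] = val
}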
Description of problem:
When setting no configuration for node-exporter in the CMO config, we did not see the 2 arguments collector.netclass.ignored-devices and collector.netdev.device-exclude in the node-exporter daemonset; full info: http://pastebin.test.redhat.com/1093428
Checked in 4.13.0-0.nightly-2023-02-27-101545 with no configuration for node-exporter, there is a collector.netclass.ignored-devices setting; see: http://pastebin.test.redhat.com/1093429
After disabling netdev/netclass on the bot cluster, we would see the collector.netclass.ignored-devices and collector.netdev.device-exclude settings in node-exporter. Since OCPBUGS-7282 is filed on 4.12, where disabling netdev/netclass is not supported, I don't think we should disable netdev/netclass.

$ oc -n openshift-monitoring get ds node-exporter -oyaml | grep collector
- --no-collector.wifi
- --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|run/k3s/containerd/.+|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
- --collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*|cali[a-f0-9]*)$
- --collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*|cali[a-f0-9]*)$
- --collector.cpu.info
- --collector.textfile.directory=/var/node_exporter/textfile
- --no-collector.cpufreq
- --no-collector.tcpstat
- --no-collector.netdev
- --no-collector.netclass
- --no-collector.buddyinfo
- '[[ ! -d /node_exporter/collectors/init ]] || find /node_exporter/collectors/init
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Steps to Reproduce:
The 2 arguments are missing when booting up OCP with default configurations for CMO.
Actual results:
The 2 arguments collector.netclass.ignored-devices and collector.netdev.device-exclude are missing in node-exporter DaemonSet.
Expected results:
The 2 arguments collector.netclass.ignored-devices and collector.netdev.device-exclude are present in node-exporter DaemonSet.
Additional info:
Description of problem:
OpenShift Container Platform 4.12.5 installation with IPI installation method on Microsoft Azure is showing undesired behavior when trying to curl "https://api.<clustername>.<domain>:6443/readyz". When using `HostNetwork` it all works without any issues. But when doing the same request from a pod that does not have `HostNetwork` capabilties and therefore has an IP from the SDN range, a big portion of the requests is failing. $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.12.5 True False 29m Cluster version is 4.12.5 $ oc get network cluster -o yaml apiVersion: config.openshift.io/v1 kind: Network metadata: creationTimestamp: "2023-03-10T13:12:06Z" generation: 2 name: cluster resourceVersion: "2975" uid: e1e9c464-526c-4ebf-ab84-0deedf092cac spec: clusterNetwork: - cidr: 10.128.0.0/14 hostPrefix: 23 externalIP: policy: {} networkType: OVNKubernetes serviceNetwork: - 172.30.0.0/16 status: clusterNetwork: - cidr: 10.128.0.0/14 hostPrefix: 23 clusterNetworkMTU: 1400 networkType: OVNKubernetes serviceNetwork: - 172.30.0.0/16 $ oc get infrastructure cluster -o yaml apiVersion: config.openshift.io/v1 kind: Infrastructure metadata: creationTimestamp: "2023-03-10T13:12:04Z" generation: 1 name: cluster resourceVersion: "430" uid: 5c260276-d901-40f7-a28c-172c492e81e6 spec: cloudConfig: key: config name: cloud-provider-config platformSpec: type: Azure status: apiServerInternalURI: https://api-int.clustername.domain.lab:6443 apiServerURL: https://api.clustername.domain.lab:6443 controlPlaneTopology: HighlyAvailable etcdDiscoveryDomain: "" infrastructureName: sreberazure-njj24 infrastructureTopology: HighlyAvailable platform: Azure platformStatus: azure: cloudName: AzurePublicCloud networkResourceGroupName: sreberazure-njj24-rg resourceGroupName: sreberazure-njj24-rg type: Azure $ oc project openshift-apiserver Already on project "openshift-apiserver" on server "https://api.clustername.domain.lab:6443". $ oc get pod NAME READY STATUS RESTARTS AGE apiserver-6f58784797-kq4kr 2/2 Running 0 41m apiserver-6f58784797-l69jr 2/2 Running 0 38m apiserver-6f58784797-nn6tn 2/2 Running 0 45m $ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES apiserver-6f58784797-kq4kr 2/2 Running 0 42m 10.130.0.21 sreberazure-njj24-master-0 <none> <none> apiserver-6f58784797-l69jr 2/2 Running 0 38m 10.129.0.29 sreberazure-njj24-master-2 <none> <none> apiserver-6f58784797-nn6tn 2/2 Running 0 45m 10.128.0.36 sreberazure-njj24-master-1 <none> <none> $ oc rsh apiserver-6f58784797-l69jr Defaulted container "openshift-apiserver" out of: openshift-apiserver, openshift-apiserver-check-endpoints, fix-audit-permissions (init) sh-4.4# while true; do curl -k --connect-timeout 1 https://api.clustername.domain.lab:6443/readyz; sleep 1; done curl: (28) Connection timed out after 1000 milliseconds okokokcurl: (28) Connection timed out after 1001 milliseconds okokcurl: (28) Connection timed out after 1003 milliseconds curl: (28) Connection timed out after 1001 milliseconds curl: (28) Connection timed out after 1001 milliseconds okokokokokokokokokcurl: (28) Connection timed out after 1001 milliseconds okokcurl: (28) Connection timed out after 1001 milliseconds curl: (28) Connection timed out after 1001 milliseconds ^C sh-4.4# exit exit command terminated with exit code 130 $ oc project openshift-kube-apiserver Now using project "openshift-kube-apiserver" on server "https://api.clustername.domain.lab:6443". 
$ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES apiserver-watcher-sreberazure-njj24-master-0 1/1 Running 0 55m 10.0.0.6 sreberazure-njj24-master-0 <none> <none> apiserver-watcher-sreberazure-njj24-master-1 1/1 Running 0 57m 10.0.0.8 sreberazure-njj24-master-1 <none> <none> apiserver-watcher-sreberazure-njj24-master-2 1/1 Running 0 57m 10.0.0.7 sreberazure-njj24-master-2 <none> <none> installer-2-sreberazure-njj24-master-2 0/1 Completed 0 51m 10.129.0.27 sreberazure-njj24-master-2 <none> <none> installer-3-sreberazure-njj24-master-2 0/1 Completed 0 50m 10.129.0.32 sreberazure-njj24-master-2 <none> <none> installer-4-sreberazure-njj24-master-2 0/1 Completed 0 49m 10.129.0.36 sreberazure-njj24-master-2 <none> <none> installer-5-sreberazure-njj24-master-2 0/1 Completed 0 46m 10.129.0.15 sreberazure-njj24-master-2 <none> <none> installer-6-sreberazure-njj24-master-0 0/1 Completed 0 37m 10.130.0.27 sreberazure-njj24-master-0 <none> <none> installer-6-sreberazure-njj24-master-1 0/1 Completed 0 39m 10.128.0.45 sreberazure-njj24-master-1 <none> <none> installer-6-sreberazure-njj24-master-2 0/1 Completed 0 36m 10.129.0.37 sreberazure-njj24-master-2 <none> <none> kube-apiserver-guard-sreberazure-njj24-master-0 1/1 Running 0 37m 10.130.0.29 sreberazure-njj24-master-0 <none> <none> kube-apiserver-guard-sreberazure-njj24-master-1 1/1 Running 0 38m 10.128.0.47 sreberazure-njj24-master-1 <none> <none> kube-apiserver-guard-sreberazure-njj24-master-2 1/1 Running 0 50m 10.129.0.31 sreberazure-njj24-master-2 <none> <none> kube-apiserver-sreberazure-njj24-master-0 5/5 Running 0 37m 10.0.0.6 sreberazure-njj24-master-0 <none> <none> kube-apiserver-sreberazure-njj24-master-1 5/5 Running 0 38m 10.0.0.8 sreberazure-njj24-master-1 <none> <none> kube-apiserver-sreberazure-njj24-master-2 5/5 Running 0 34m 10.0.0.7 sreberazure-njj24-master-2 <none> <none> revision-pruner-6-sreberazure-njj24-master-0 0/1 Completed 0 33m 10.130.0.35 sreberazure-njj24-master-0 <none> <none> revision-pruner-6-sreberazure-njj24-master-1 0/1 Completed 0 33m 10.128.0.56 sreberazure-njj24-master-1 <none> <none> revision-pruner-6-sreberazure-njj24-master-2 0/1 Completed 0 33m 10.129.0.39 sreberazure-njj24-master-2 <none> <none> $ oc rsh kube-apiserver-sreberazure-njj24-master-1 sh-4.4# while true; do curl -k --connect-timeout 1 https://api.clustername.domain.lab:6443/readyz; sleep 1; done okokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokok Also changing `--connect-timeout 1` from curl to `--connect-timeout 10` for example does not have any impact. It simply takes longer until the timeout is reached.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.12 (also previous version were not tested)
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4.12 on Azure using IPI install method and set the SDN to OVN-Kubernetes
2. Once successfully installed run `oc project openshift-apiserver`
3. rsh apiserver-<podID>
4. while true; do curl -k --connect-timeout 1 https://api.clustername.domain.lab:6443/readyz; sleep 1; done
Actual results:
sh-4.4# while true; do curl -k --connect-timeout 1 https://api.clustername.domain.lab:6443/readyz; sleep 1; done curl: (28) Connection timed out after 1000 milliseconds okokokcurl: (28) Connection timed out after 1001 milliseconds okokcurl: (28) Connection timed out after 1003 milliseconds curl: (28) Connection timed out after 1001 milliseconds curl: (28) Connection timed out after 1001 milliseconds okokokokokokokokokcurl: (28) Connection timed out after 1001 milliseconds okokcurl: (28) Connection timed out after 1001 milliseconds curl: (28) Connection timed out after 1001 milliseconds
Expected results:
sh-4.4# while true; do curl -k --connect-timeout 1 https://api.clustername.domain.lab:6443/readyz; sleep 1; done okokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokok
Additional info:
Follow up for https://issues.redhat.com/browse/HOSTEDCP-969
Create metrics and grafana panel in
https://github.com/openshift/hypershift/tree/main/contrib/metrics
for NodePool internal SLOs/SLIs:
Move existing metrics when possible from metrics loop into nodepool controller:
- nodePoolSize
Explore and discuss granular metrics to track NodePool lifecycle bottle necks, infra, ignition, node networking, available. Consolidate that with hostedClusterTransitionSeconds metrics and dashboard panels
Explore and discuss metrics for upgrade duration SLO for both HC and NodePool.
Description of problem:
OCP 4.13 uses a release candidate, v3.0.0-rc.1, of vsphere-csi-driver. We should ship OCP with a GA version.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-17-161027
I tried to update my cluster from 4.12.0 to 4.12.2, and this resulted in a crashlooping state for both prometheus adapter pods. I tried to downgrade back to 4.12.0 and then upgrade to 4.12.4, but neither approach solved the situation.
What I can see in the logs of the adapters is the following:
I0216 15:24:59.144559 1 adapter.go:114] successfully using in-cluster auth
I0216 15:25:00.345620 1 request.go:601] Waited for 1.180640418s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v1alpha1?timeout=32s
I0216 15:25:10.345634 1 request.go:601] Waited for 11.180149045s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/triggers.tekton.dev/v1beta1?timeout=32s
I0216 15:25:20.346048 1 request.go:601] Waited for 2.597453714s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/apiextensions.k8s.io/v1?timeout=32s
I0216 15:25:30.347435 1 request.go:601] Waited for 12.598768922s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/machine.openshift.io/v1beta1?timeout=32s
I0216 15:25:40.545767 1 request.go:601] Waited for 22.797001115s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/samples.operator.openshift.io/v1?timeout=32s
I0216 15:25:50.546588 1 request.go:601] Waited for 32.797748538s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/metrics.k8s.io/v1beta1?timeout=32s
I0216 15:25:56.041594 1 secure_serving.go:210] Serving securely on [::]:6443
I0216 15:25:56.042265 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/etc/tls/private/tls.crt::/etc/tls/private/tls.key"
I0216 15:25:56.042971 1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/etc/tls/private/requestheader-client-ca-file"
I0216 15:25:56.043309 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0216 15:25:56.043310 1 object_count_tracker.go:84] "StorageObjectCountTracker pruner is exiting"
I0216 15:25:56.043398 1 dynamic_serving_content.go:146] "Shutting down controller" name="serving-cert::/etc/tls/private/tls.crt::/etc/tls/private/tls.key"
I0216 15:25:56.043562 1 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
I0216 15:25:56.043606 1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/tls/private/client-ca-file"
I0216 15:25:56.043614 1 secure_serving.go:255] Stopped listening on [::]:6443
I0216 15:25:56.043621 1 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/tls/private/client-ca-file"
I0216 15:25:56.043635 1 dynamic_cafile_content.go:171] "Shutting down controller" name="request-header::/etc/tls/private/requestheader-client-ca-file"
I also tried to search online for known issues and bugs and found this one that might be related:
https://github.com/kubernetes-sigs/metrics-server/issues/983
I also tried rebooting the server but it didn't help.
Need a workaround at least because at the moment the cluster is still in a pending stage.
Description of problem:
Following https://bugzilla.redhat.com/show_bug.cgi?id=2102765 respectively https://issues.redhat.com/browse/OCPBUGS-2140 problems with OpenID Group sync have been resolved. Yet the problem documented in https://bugzilla.redhat.com/show_bug.cgi?id=2102765 still does exist and we see that Groups that are being removed are still part of the chache in oauth-apiserver, causing a panic of the respective components and failures during login for potentially affected users. So in general, it looks like that oauth-apiserver cache is not properly refreshing or handling the OpenID Groups being synced. E1201 11:03:14.625799 1 runtime.go:76] Observed a panic: interface conversion: interface {} is nil, not *v1.Group goroutine 3706798 [running]: k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1() k8s.io/apiserver@v0.22.2/pkg/server/filters/timeout.go:103 +0xb0 panic({0x1aeab00, 0xc001400390}) runtime/panic.go:838 +0x207 k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1.1.1() k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/audit.go:80 +0x2a k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1.1() k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/audit.go:89 +0x250 panic({0x1aeab00, 0xc001400390}) runtime/panic.go:838 +0x207 github.com/openshift/library-go/pkg/oauth/usercache.(*GroupCache).GroupsFor(0xc00081bf18?, {0xc000c8ac03?, 0xc001400360?}) github.com/openshift/library-go@v0.0.0-20211013122800-874db8a3dac9/pkg/oauth/usercache/groups.go:47 +0xe7 github.com/openshift/oauth-server/pkg/groupmapper.(*UserGroupsMapper).processGroups(0xc0002c8880, {0xc0005d4e60, 0xd}, {0xc000c8ac03, 0x7}, 0x1?) github.com/openshift/oauth-server/pkg/groupmapper/groupmapper.go:101 +0xb5 github.com/openshift/oauth-server/pkg/groupmapper.(*UserGroupsMapper).UserFor(0xc0002c8880, {0x20f3c40, 0xc000e18bc0}) github.com/openshift/oauth-server/pkg/groupmapper/groupmapper.go:83 +0xf4 github.com/openshift/oauth-server/pkg/oauth/external.(*Handler).login(0xc00022bc20, {0x20eebb0, 0xc00041b058}, 0xc0015d8200, 0xc001438140?, {0xc0000e7ce0, 0x150}) github.com/openshift/oauth-server/pkg/oauth/external/handler.go:209 +0x74f github.com/openshift/oauth-server/pkg/oauth/external.(*Handler).ServeHTTP(0xc00022bc20, {0x20eebb0, 0xc00041b058}, 0x0?) github.com/openshift/oauth-server/pkg/oauth/external/handler.go:180 +0x74a net/http.(*ServeMux).ServeHTTP(0x1c9dda0?, {0x20eebb0, 0xc00041b058}, 0xc0015d8200) net/http/server.go:2462 +0x149 github.com/openshift/oauth-server/pkg/server/headers.WithRestoreAuthorizationHeader.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200) github.com/openshift/oauth-server/pkg/server/headers/oauthbasic.go:27 +0x10f net/http.HandlerFunc.ServeHTTP(0x0?, {0x20eebb0?, 0xc00041b058?}, 0x0?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200) k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:103 +0x1a5 net/http.HandlerFunc.ServeHTTP(0xc0005e0280?, {0x20eebb0?, 0xc00041b058?}, 0x0?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filters.WithAuthorization.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200) k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/authorization.go:64 +0x498 net/http.HandlerFunc.ServeHTTP(0x0?, {0x20eebb0?, 0xc00041b058?}, 0x0?) 
net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200) k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:79 +0x178 net/http.HandlerFunc.ServeHTTP(0x2f6cea0?, {0x20eebb0?, 0xc00041b058?}, 0x3?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/server/filters.WithMaxInFlightLimit.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200) k8s.io/apiserver@v0.22.2/pkg/server/filters/maxinflight.go:187 +0x2a4 net/http.HandlerFunc.ServeHTTP(0x0?, {0x20eebb0?, 0xc00041b058?}, 0x0?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200) k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:103 +0x1a5 net/http.HandlerFunc.ServeHTTP(0x11?, {0x20eebb0?, 0xc00041b058?}, 0x1aae340?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filters.WithImpersonation.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200) k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/impersonation.go:50 +0x21c net/http.HandlerFunc.ServeHTTP(0xc000d52120?, {0x20eebb0?, 0xc00041b058?}, 0x0?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200) k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:79 +0x178 net/http.HandlerFunc.ServeHTTP(0x0?, {0x20eebb0?, 0xc00041b058?}, 0x0?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200) k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:103 +0x1a5 net/http.HandlerFunc.ServeHTTP(0xc0015d8100?, {0x20eebb0?, 0xc00041b058?}, 0xc000531930?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1({0x7fae682a40d8?, 0xc00041b048}, 0x9dbbaa?) k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/audit.go:111 +0x549 net/http.HandlerFunc.ServeHTTP(0xc00003def0?, {0x7fae682a40d8?, 0xc00041b048?}, 0x0?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x7fae682a40d8, 0xc00041b048}, 0xc0015d8100) k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:79 +0x178 net/http.HandlerFunc.ServeHTTP(0x0?, {0x7fae682a40d8?, 0xc00041b048?}, 0x0?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1({0x7fae682a40d8, 0xc00041b048}, 0xc0015d8100) k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:103 +0x1a5 net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x7fae682a40d8?, 0xc00041b048?}, 0x20cfd00?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filters.withAuthentication.func1({0x7fae682a40d8, 0xc00041b048}, 0xc0015d8100) k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/authentication.go:80 +0x8b9 net/http.HandlerFunc.ServeHTTP(0x20f0f20?, {0x7fae682a40d8?, 0xc00041b048?}, 0x20cfc08?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x7fae682a40d8, 0xc00041b048}, 0xc000e69e00) k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:88 +0x46b net/http.HandlerFunc.ServeHTTP(0xc0019f5890?, {0x7fae682a40d8?, 0xc00041b048?}, 0xc000848764?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/server/filters.WithCORS.func1({0x7fae682a40d8, 0xc00041b048}, 0xc000e69e00) k8s.io/apiserver@v0.22.2/pkg/server/filters/cors.go:75 +0x10b net/http.HandlerFunc.ServeHTTP(0xc00149a380?, {0x7fae682a40d8?, 0xc00041b048?}, 0xc0008487d0?) 
net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1() k8s.io/apiserver@v0.22.2/pkg/server/filters/timeout.go:108 +0xa2 created by k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP k8s.io/apiserver@v0.22.2/pkg/server/filters/timeout.go:94 +0x2cc goroutine 3706802 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x19eb780?, 0xc001206e20}) k8s.io/apimachinery@v0.22.2/pkg/util/runtime/runtime.go:74 +0x99 k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0xc0016aec60, 0x1, 0x1560f26?}) k8s.io/apimachinery@v0.22.2/pkg/util/runtime/runtime.go:48 +0x75 panic({0x19eb780, 0xc001206e20}) runtime/panic.go:838 +0x207 k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc0005047c8, {0x20eecd0?, 0xc0010fae00}, 0xdf8475800?) k8s.io/apiserver@v0.22.2/pkg/server/filters/timeout.go:114 +0x452 k8s.io/apiserver/pkg/endpoints/filters.withRequestDeadline.func1({0x20eecd0, 0xc0010fae00}, 0xc000e69d00) k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/request_deadline.go:101 +0x494 net/http.HandlerFunc.ServeHTTP(0xc0016af048?, {0x20eecd0?, 0xc0010fae00?}, 0xc0000bc138?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/server/filters.WithWaitGroup.func1({0x20eecd0?, 0xc0010fae00}, 0xc000e69d00) k8s.io/apiserver@v0.22.2/pkg/server/filters/waitgroup.go:59 +0x177 net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x20eecd0?, 0xc0010fae00?}, 0x7fae705daff0?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filters.WithAuditAnnotations.func1({0x20eecd0, 0xc0010fae00}, 0xc000e69c00) k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/audit_annotations.go:37 +0x230 net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x20eecd0?, 0xc0010fae00?}, 0x20cfc08?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filters.WithWarningRecorder.func1({0x20eecd0?, 0xc0010fae00}, 0xc000e69b00) k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/warning.go:35 +0x2bb net/http.HandlerFunc.ServeHTTP(0x1c9dda0?, {0x20eecd0?, 0xc0010fae00?}, 0xd?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filters.WithCacheControl.func1({0x20eecd0, 0xc0010fae00}, 0x0?) k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/cachecontrol.go:31 +0x126 net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x20eecd0?, 0xc0010fae00?}, 0x20cfc08?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/server/httplog.WithLogging.func1({0x20ef480?, 0xc001c20620}, 0xc000e69a00) k8s.io/apiserver@v0.22.2/pkg/server/httplog/httplog.go:103 +0x518 net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x20ef480?, 0xc001c20620?}, 0x20cfc08?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1({0x20ef480, 0xc001c20620}, 0xc000e69900) k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/requestinfo.go:39 +0x316 net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x20ef480?, 0xc001c20620?}, 0xc0007c3f70?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filters.withRequestReceivedTimestampWithClock.func1({0x20ef480, 0xc001c20620}, 0xc000e69800) k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/request_received_time.go:38 +0x27e net/http.HandlerFunc.ServeHTTP(0x419e2c?, {0x20ef480?, 0xc001c20620?}, 0xc0007c3e40?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/server/filters.withPanicRecovery.func1({0x20ef480?, 0xc001c20620?}, 0xc0004ff600?) k8s.io/apiserver@v0.22.2/pkg/server/filters/wrap.go:74 +0xb1 net/http.HandlerFunc.ServeHTTP(0x1c05260?, {0x20ef480?, 0xc001c20620?}, 0x8?) 
net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/endpoints/filters.withAuditID.func1({0x20ef480, 0xc001c20620}, 0xc000e69600) k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/with_auditid.go:66 +0x40d net/http.HandlerFunc.ServeHTTP(0x1c9dda0?, {0x20ef480?, 0xc001c20620?}, 0xd?) net/http/server.go:2084 +0x2f github.com/openshift/oauth-server/pkg/server/headers.WithPreserveAuthorizationHeader.func1({0x20ef480, 0xc001c20620}, 0xc000e69600) github.com/openshift/oauth-server/pkg/server/headers/oauthbasic.go:16 +0xe8 net/http.HandlerFunc.ServeHTTP(0xc0016af9d0?, {0x20ef480?, 0xc001c20620?}, 0x16?) net/http/server.go:2084 +0x2f github.com/openshift/oauth-server/pkg/server/headers.WithStandardHeaders.func1({0x20ef480, 0xc001c20620}, 0x4d55c0?) github.com/openshift/oauth-server/pkg/server/headers/headers.go:30 +0x18f net/http.HandlerFunc.ServeHTTP(0x0?, {0x20ef480?, 0xc001c20620?}, 0xc0016afac8?) net/http/server.go:2084 +0x2f k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP(0xc00098d622?, {0x20ef480?, 0xc001c20620?}, 0xc000401000?) k8s.io/apiserver@v0.22.2/pkg/server/handler.go:189 +0x2b net/http.serverHandler.ServeHTTP({0xc0019f5170?}, {0x20ef480, 0xc001c20620}, 0xc000e69600) net/http/server.go:2916 +0x43b net/http.(*conn).serve(0xc0002b1720, {0x20f0f58, 0xc0001e8120}) net/http/server.go:1966 +0x5d7 created by net/http.(*Server).Serve net/http/server.go:3071 +0x4db
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.11.13
How reproducible:
- Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4.11
2. Configure OpenID Group Sync (as per https://docs.openshift.com/container-platform/4.11/authentication/identity_providers/configuring-oidc-identity-provider.html#identity-provider-oidc-CR_configuring-oidc-identity-provider)
3. Have users with hundreds of groups
4. Log in and, after a while, remove some Groups from the user in the IDP and from OpenShift Container Platform
5. Try to log in again and see the panic in oauth-apiserver
Actual results:
User is unable to login and oauth pods are reporting a panic as shown above
Expected results:
oauth-apiserver should invalidate the cache quickly to remove potentially invalid references to non-existing groups
Additional info:
Description of problem:
In certain cases, an AWS cluster running 4.12 doesn't automatically generate a controlplanemachineset when it's expected to. It looks like CPMS is looking for `infrastructure.Spec.PlatformSpec.Type` (https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/2aeaaf9ec714ee75f933051c21a44f648d6ed42b/pkg/controllers/controlplanemachinesetgenerator/controller.go#L180) and, as a result, clusters born earlier than 4.5, when this field was introduced (https://github.com/openshift/installer/pull/3277), will not be able to generate a CPMS. I believe we should be looking at `infrastructure.Status.PlatformStatus.Type` instead.
Version-Release number of selected component (if applicable):
4.12.9
How reproducible:
Consistent
Steps to Reproduce:
1. Install a cluster on a version earlier than 4.5
2. Upgrade the cluster through to 4.12
3. Observe the "Unable to generate control plane machine set, unsupported platform" error message from the control-plane-machine-set-operator, as well as the missing CPMS object in the openshift-machine-api namespace
Actual results:
No generated CPMS is created, despite the platform being AWS
Expected results:
A generated CPMS existing in the openshift-machine-api namespace
Additional info:
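A sketch of reading the platform from the status rather than the spec, as suggested in the description (configv1 is "github.com/openshift/api/config/v1"; the helper name and fallback behaviour are illustrative, not the actual operator code):

// Prefer status.platformStatus.type, which is populated even on clusters
// born before the spec field existed, and fall back to the spec otherwise.
func platformType(infra *configv1.Infrastructure) configv1.PlatformType {
	if infra.Status.PlatformStatus != nil {
		return infra.Status.PlatformStatus.Type
	}
	return infra.Spec.PlatformSpec.Type
}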
Description of problem:
Running `yarn dev` results in the build running in a loop. This issue appears to be related to changes in https://github.com/openshift/console/pull/12821.
How reproducible:
Always
Steps to Reproduce:
1. Run `yarn dev`
2. Make changes to a file and save
3. Watch the terminal output of `yarn dev` and note the build is looping
Description of problem:
IHAC with OCP 4.9 who has configured the IngressControllers with a long httpLogFormat, and the routers print the following every time the configuration reloads:
I0927 13:29:45.495077 1 router.go:612] template "msg"="router reloaded" "output"="[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'public'.\n[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'fe_sni'.\n[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'fe_no_sni'.\n - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
This is the Ingress Contoller configuration:
logging:
  access:
    destination:
      syslog:
        address: 10.X.X.X
        port: 10514
      type: Syslog
    httpCaptureCookies:
    - matchType: Exact
      maxLength: 128
      name: ITXSESSIONID
    httpCaptureHeaders:
      request:
      - maxLength: 128
        name: Host
      - maxLength: 128
        name: itxrequestid
    httpLogFormat: actconn="%ac",backend_name="%b",backend_queue="%bq",backend_source_ip="%bi",backend_source_port="%bp",beconn="%bc",bytes_read="%B",bytes_uploaded="%U",captrd_req_cookie="%CC",captrd_req_headers="%hr",captrd_res_cookie="%CS",captrd_res_headers="%hs",client_ip="%ci",client_port="%cp",cluster="ieec1ocp1",datacenter="ieec1",environment="pro",fe_name_transport="%ft",feconn="%fc",frontend_name="%f",hostname="%H",http_version="%HV",log_type="http",method="%HM",query_string="%HQ",req_date="%tr",request="%HP",res_time="%TR",retries="%rc",server_ip="%si",server_name="%s",server_port="%sp",srv_queue="%sq",srv_conn="%sc",srv_queue="%sq",status_code="%ST",Ta="%Ta",Tc="%Tc",tenant="bk",term_state="%tsc",tot_wait_q="%Tw",Tr="%Tr"
    logEmptyRequests: Ignore
Any way to avoid this truncate warning?
How reproducible:
For every reload of haproxy config
Steps to Reproduce:
You can reproduce easily with the following configuration in the default ingress controller:
logging:
  access:
    destination:
      type: Container
    httpCaptureCookies:
2022-10-18T14:13:53.068164+00:00 xxxx xxxxxx haproxy[38]: 10.39.192.203:40698 [18/Oct/2022:14:13:52.488] fe_sni~ be_secure:openshift-console:console/pod:console-5976495467-zxgxr:console:https:10.128.1.116:8443 0/0/0/10/580 200 1130598 _abck=B7EA642C9E828FA8210F329F80B7B2D80YAAQnVozuFVfkOaDAQAADk - --VN 78/37/33/33/0 0/0 "GET /api/kubernetes/openapi/v2 HTTP/1.1"
Description of problem:
Trying to deploy a HostedCluster using an IPv6 network, the control plane fails to start. These are the networking parameters for the HostedCluster:

networking:
  clusterNetwork:
  - cidr: fd01::/48
  networkType: OVNKubernetes
  serviceNetwork:
  - cidr: fd02::/112

When the control plane pods are created, the etcd pod will remain in CrashLoopBackOff. The error in the logs:

invalid value "https://fd01:0:0:3::4c:2380" for flag -listen-peer-urls: URL address does not have the form "host:port": https://fd01:0:0:3::4c:2380
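The error suggests the IPv6 address is not being bracketed when the listen URL is built. A minimal sketch of constructing such a URL safely with net.JoinHostPort (illustrative only, not the actual HyperShift/etcd wiring):

// net.JoinHostPort brackets IPv6 literals, producing
// https://[fd01:0:0:3::4c]:2380 instead of the malformed form above.
package main

import (
	"fmt"
	"net"
)

func main() {
	peerURL := "https://" + net.JoinHostPort("fd01:0:0:3::4c", "2380")
	fmt.Println(peerURL) // https://[fd01:0:0:3::4c]:2380
}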
Version-Release number of selected component (if applicable):
Any
How reproducible:
Always
Steps to Reproduce:
1. Create a HostedCluster with the networking parameters set to IPv6 networks.
2. The etcd pod will be created and will fail to start.
Actual results:
etcd crashes at start
Expected results:
etcd starts properly and the other control plane pods follow
Additional info:
N/A
Description of problem:
Selecting "Manual" for Update approval does not take effect.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
ose-gcp-pd-csi-driver fails to build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=54433295 Error: /usr/lib/golang/pkg/tool/linux_amd64/link: running gcc failed: exit status 1 gcc: error: static: No such file or directory make: *** [Makefile:40: gce-pd-driver] Error 1
Version-Release number of selected component (if applicable):
4.14 / master
How reproducible:
run osbs build
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
When changing channels it's possible that multiple new conditional update risks will need to be evaluated. For instance, a cluster running 4.10.34 in a 4.10 channel today only has to evaluate `OpenStackNodeCreationFails`, but when the channel is changed to a 4.11 channel multiple new risks require evaluation, and the evaluation of new risks is throttled at one every 10 minutes. This means if there are three new risks it may take up to 30 minutes after the channel has changed for the full set of conditional updates to be computed. This leads to a perception that no update paths are recommended, because most users will not wait 30 minutes; they expect immediate feedback.
Version-Release number of selected component (if applicable):
4.10.z, 4.11.z, 4.12, 4.13
How reproducible:
100%
Steps to Reproduce:
1. Install 4.10.34 2. Switch from stable-4.10 to stable-4.11 3.
Actual results:
Observe no recommended updates for 10-20 minutes because all available paths to 4.11 have a risk associated with them
Expected results:
Risks are computed in a timely manner for an interactive UX, let's say < 10s
Additional info:
This was intentional in the design: we didn't want risks to continuously re-evaluate or overwhelm the monitoring stack. However, we didn't anticipate that we'd have a long-standing pile of risks, or realize how confusing the user experience would be. We intend to work around this in the deployed fleet by converting older risks from `type: promql` to `type: Always`, avoiding the evaluation period but preserving the notification. While this may lead customers to believe they're exposed to a risk they may not be, as long as the set of outstanding risks to the latest version is limited to no more than one, it's likely no one will notice. All 4.10 and 4.11 clusters currently have a clear path toward a relatively recent 4.10.z or 4.11.z with no more than one risk to be evaluated.
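For illustration only, a rough sketch of what converting a risk's matching rule could look like in the update graph data. The field layout is an assumption based on the blocked-edges style schema used by cincinnati-graph-data, and the PromQL expression is a placeholder, not a real risk:

matchingRules:        # before: risk evaluated via a throttled PromQL query
- type: promql
  promql:
    promql: |
      group(placeholder_metric{condition="affected"}) or 0 * group(placeholder_metric)

matchingRules:        # after: risk always shown, no evaluation period, notification preserved
- type: Always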
Description of problem:
Usually etcd pod is named "etcd-bootstrap" for multinode install. In bootstrap-in-place mode the only master is not started during bootstrap, so its useful to use the expected pod name during bootstrap. This would allow us to re-use the bootstrap-generated certificates on "real" master startup
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Adding an audit configuration to a HyperShift hosted cluster does not work as expected.
Version-Release number of selected component (if applicable):
# oc get clusterversions.config.openshift.io NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.0-0.nightly-2023-05-04-090524 True False 15m Cluster version is 4.13.0-0.nightly-2023-05-04-090524
How reproducible:
Always
Steps to Reproduce:
1. Get the HyperShift hosted cluster detail from the management cluster:
# hostedcluster=$( oc get -n clusters hostedclusters -o json | jq -r .items[].metadata.name)

2. Apply an audit profile for the HyperShift hosted cluster:
# oc patch HostedCluster $hostedcluster -n clusters -p '{"spec": {"configuration": {"apiServer": {"audit": {"profile": "WriteRequestBodies"}}}}}' --type merge
hostedcluster.hypershift.openshift.io/85ea85757a5a14355124 patched
# oc get HostedCluster $hostedcluster -n clusters -ojson | jq .spec.configuration.apiServer.audit
{
  "profile": "WriteRequestBodies"
}

3. Check whether pods or the operator restart to apply the configuration changes:
# oc get pods -l app=kube-apiserver -n clusters-${hostedcluster}
NAME                              READY   STATUS    RESTARTS   AGE
kube-apiserver-7c98b66949-9z6rw   5/5     Running   0          36m
kube-apiserver-7c98b66949-gp5rx   5/5     Running   0          36m
kube-apiserver-7c98b66949-wmk8x   5/5     Running   0          36m
# oc get pods -l app=openshift-apiserver -n clusters-${hostedcluster}
NAME                                  READY   STATUS    RESTARTS   AGE
openshift-apiserver-dc4c84ff4-566z9   3/3     Running   0          29m
openshift-apiserver-dc4c84ff4-99zq9   3/3     Running   0          29m
openshift-apiserver-dc4c84ff4-9xdrz   3/3     Running   0          30m

4. Check the generated audit log:
# NOW=$(date -u "+%s"); echo "$NOW"; echo "$NOW" > now
1683711189
# kaspod=$(oc get pods -l app=kube-apiserver -n clusters-${hostedcluster} --no-headers -o=jsonpath={.items[0].metadata.name})
# oc logs $kaspod -c audit-logs -n clusters-${hostedcluster} > kas-audit.log
# cat kas-audit.log | grep -iE '"verb":"(get|list|watch)","user":.*(requestObject|responseObject)' | jq -c 'select (.requestReceivedTimestamp | .[0:19] + "Z" | fromdateiso8601 > '"`cat now`)" | wc -l
0
# cat kas-audit.log | grep -iE '"verb":"(create|delete|patch|update)","user":.*(requestObject|responseObject)' | jq -c 'select (.requestReceivedTimestamp | .[0:19] + "Z" | fromdateiso8601 > '"`cat now`)" | wc -l
0

None of these results should be zero. On the backend, the configuration should be applied, or the pods/operator should restart after the configuration changes.
Actual results:
Config changes are not applied on the backend; neither the operator nor the pods restart.
Expected results:
Configuration should be applied, and the pods and operator should restart after config changes.
Additional info:
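One additional check worth doing (illustrative only; the resource and field path here are assumptions based on the HyperShift API, not verified in this report) is whether the audit profile was propagated from the HostedCluster to the HostedControlPlane in the control plane namespace:
# oc get hostedcontrolplane -n clusters-${hostedcluster} -o jsonpath='{.items[0].spec.configuration.apiServer.audit}'
If this is empty while the HostedCluster shows the profile, the propagation step itself is the likely culprit.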
In many cases, the /dev/disk/by-path symlink is the only way to stably identify a disk without having prior knowledge of the hardware from some external source (e.g. a spreadsheet of disk serial numbers). It should be possible to specify this path in the root device hints.
Metal³ now allows these paths in the `name` hint (see OCPBUGS-13080), so the IPI installer's implementation using terraform must be changed to match.
Description of problem:
When a MCCPoolAlert is fired and we fix the problem that caused this alert, the alert is not removed.
Version-Release number of selected component (if applicable):
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.0-0.nightly-2023-06-06-212044 True False 114m Cluster version is 4.14.0-0.nightly-2023-06-06-212044
How reproducible:
Always
Steps to Reproduce:
1. Create a custom MCP:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
    - {key: machineconfiguration.openshift.io/role, operator: In, values: [master,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""

2. Label a master node so that it is included in the new custom MCP:

$ oc label node $(oc get nodes -l node-role.kubernetes.io/master -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/infra=""

3. Verify that the alert is fired:

alias thanosalerts='curl -s -k -H "Authorization: Bearer $(oc -n openshift-monitoring create token prometheus-k8s)" https://$(oc get route -n openshift-monitoring thanos-querier -o jsonpath={.spec.host})/api/v1/alerts | jq '
$ thanosalerts | grep alertname
....
"alertname": "MCCPoolAlert",

4. Remove the label from the node to fix the problem:

$ oc label node $(oc get nodes -l node-role.kubernetes.io/master -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/infra-
Actual results:
The alert is not removed. When we have a look at the mcc_pool_alert metric we find 2 values with 2 different "alert" fields. alias thanosquery='function __lgb() { unset -f __lgb; oc rsh -n openshift-monitoring prometheus-k8s-0 curl -s -k -H "Authorization: Bearer $(oc -n openshift-monitoring create token prometheus-k8s)" --data-urlencode "query=$1" https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query | jq -c | jq; }; __lgb' $ thanosquery mcc_pool_alert { "status": "success", "data": { "resultType": "vector", "result": [ { "metric": { "__name__": "mcc_pool_alert", "alert": "Applying custom label for pool", "container": "oauth-proxy", "endpoint": "metrics", "instance": "10.130.0.86:9001", "job": "machine-config-controller", "namespace": "openshift-machine-config-operator", "node": "ip-10-0-129-20.us-east-2.compute.internal", "pod": "machine-config-controller-76dbddff49-75ggr", "pool": "infra", "prometheus": "openshift-monitoring/k8s", "service": "machine-config-controller" }, "value": [ 1686137977.158, "0" ] }, { "metric": { "__name__": "mcc_pool_alert", "alert": "Given both master and custom pools. Defaulting to master: custom infra", "container": "oauth-proxy", "endpoint": "metrics", "instance": "10.130.0.86:9001", "job": "machine-config-controller", "namespace": "openshift-machine-config-operator", "node": "ip-10-0-129-20.us-east-2.compute.internal", "pod": "machine-config-controller-76dbddff49-75ggr", "pool": "infra", "prometheus": "openshift-monitoring/k8s", "service": "machine-config-controller" }, "value": [ 1686137977.158, "1" ] } ] } }
Expected results:
The alert should be removed.
Additional info:
If we remove the MCO controller pod, a new mcc_pool_alert data is generated with the right value and the other values are removed. If we execute this workaround the alert is removed.
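A minimal sketch of that workaround, assuming the controller pods carry the k8s-app=machine-config-controller label (the label name is an assumption, not taken from this report):
$ oc delete pod -n openshift-machine-config-operator -l k8s-app=machine-config-controller
The deployment then recreates the controller pod, and the stale mcc_pool_alert series disappears.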
This is a clone of issue OCPBUGS-18754. The following is the description of the original issue:
—
Description of problem:
After a control plane release upgrade, the 'tuned' pod in the guest cluster uses the control plane release image
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. create a cluster in 4.14.0-0.ci-2023-09-06-180503 2. control plane release upgrade to 4.14-2023-09-07-180503 3. in the guest cluster check container image in pod tuned
Actual results:
pod tuned uses control plane release image 4.14-2023-09-07-180503
Expected results:
pod tuned uses release image 4.14.0-0.ci-2023-09-06-180503
Additional info:
After controlplane release upgrade, in control plane namespace, cluster-node-tuning-operator uses control plane release image: jiezhao-mac:hypershift jiezhao$ oc get pods cluster-node-tuning-operator-6dc549ffdf-jhj2k -n clusters-jie-test -ojsonpath='{.spec.containers[].name}{"\n"}' cluster-node-tuning-operator jiezhao-mac:hypershift jiezhao$ oc get pods cluster-node-tuning-operator-6dc549ffdf-jhj2k -n clusters-jie-test -ojsonpath='{.spec.containers[].image}{"\n"}' registry.ci.openshift.org/ocp/4.14-2023-09-07-180503@sha256:60bd6e2e8db761fb4b3b9d68c1da16bf0371343e3df8e72e12a2502640173990
Description of problem:
Stop option for pipelinerun is not working
Version-Release number of selected component (if applicable):
Openshift Pipelines 1.9.x
How reproducible:
Always
Steps to Reproduce:
1. Create a pipeline and start it 2. From Actions dropdown select stop option
Actual results:
Pipelinerun is not getting cancelled
Expected results:
Pipelinerun should get cancelled
Additional info:
Description of problem:
4.13.0-RC.6 enters Cluster status: error while trying to install a cluster with the agent-based installer. After the read-disk stage, the cluster status turns to "error".
Version-Release number of selected component (if applicable):
How reproducible:
Create an image with the attached install config and agent config file and boot a node with this image
Steps to Reproduce:
1. Create image with the attached install config and agent config file and boot node with this images
Actual results:
Cluster status: error
Expected results:
Should continue with cluster status: installing
Additional info:
Description of problem:
In the HyperShift context: operands managed by operators running in the hosted control plane namespace in the management cluster do not honour affinity opinions (https://hypershift-docs.netlify.app/how-to/distribute-hosted-cluster-workloads/ and https://github.com/openshift/hypershift/blob/main/support/config/deployment.go#L263-L265). These operands running management-side should honour the same affinity, tolerations, node selector and priority rules as the operator. This could be done by looking at the operator deployment itself or at the HCP resource. Affected operands: multus-admission-controller, cloud-network-config-controller, ovnkube-master. A sketch of the expected scheduling constraints is included under "Additional info" below.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create a hypershift cluster. 2. Check affinity rules and node selector of the operands above. 3.
Actual results:
Operands are missing affinity rules and node selector
Expected results:
Operands have the same affinity rules and node selector as the operator
Additional info:
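For illustration, a minimal sketch of the kind of scheduling constraints these operand Deployments would be expected to pick up, using the label and taint keys described in the distribute-hosted-cluster-workloads documentation linked above (the exact values propagated by HyperShift may differ and should be copied from the operator Deployment or derived from the HostedControlPlane resource rather than hard-coded):

  nodeSelector:
    hypershift.openshift.io/control-plane: "true"
  tolerations:
  - key: hypershift.openshift.io/control-plane
    operator: Equal
    value: "true"
    effect: NoSchedule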
Description of problem:
Pod status overlapping in the sidebar status is breaking the UI when the status is CreateContainerConfigError
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always when the status is CreateContainerConfigError
Steps to Reproduce:
1. Create a Pod that gives CreateContainerConfigError
Sample YAML:
apiVersion: v1
kind: Pod
metadata:
  name: example
  labels:
    app: httpd
  namespace: avik
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: httpd
    image: docker.io/httpd:latest
    ports:
    - containerPort: 80
    securityContext:
      allowPrivilegeEscalation: true
      capabilities:
        drop:
        - ALL
Actual results:
The pod status overlaps in the sidebar when the status text is long.
Expected results:
The Pod Status should not overlap. Also, this error status should look like the other error statuses.
Additional info:
Description of problem:
Not able to import a repository with a .tekton directory and a func.yaml file present; the import fails with the error `Cannot read properties of undefined (reading 'filter')`.
Version-Release number of selected component (if applicable):
4.13, Pipeline and Serverless is installed
How reproducible:
Steps to Reproduce:
1. In the Import from Git form, enter the Git URL: https://github.com/Lucifergene/oc-pipe-func 2. Pipeline is checked and the PAC option is selected by default; even if the user unchecks the Pipeline option, the same error occurs 3. Click the Create button
Actual results:
Not able to import and getting this error `Cannot read properties of undefined (reading 'filter')`
Expected results:
Should be able to import without any error
Additional info:
Description of problem:
An uninstall was started, however it failed due to the hosted-cluster-config-operator being unable to clean up the default ingresscontroller
Version-Release number of selected component (if applicable):
4.12.18
How reproducible:
Unsure - though definitely not 100%
Steps to Reproduce:
1. Uninstall a HyperShift cluster
Actual results:
❯ k logs -n ocm-staging-2439occi66vhbj0pee3s4d5jpi4vpm54-mshen-dr2 hosted-cluster-config-operator-5ccdbfcc4c-9mxfk --tail 10 -f
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Image registry is removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Ensuring ingress controllers are removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Ensuring load balancers are removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Load balancers are removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Ensuring persistent volumes are removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"There are no more persistent volumes. Nothing to cleanup.","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Persistent volumes are removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
After manually connecting to the hostedcluster and deleting the ingresscontroller, the uninstall progressed and succeeded.
Expected results:
The hosted cluster can cleanup the ingresscontrollers successfully and progress the uninstall
Additional info:
HyperShift dump: https://drive.google.com/file/d/1qqjkG4F_mSUCVMz3GbN-lEoqbshPvQcU/view?usp=sharing
Description of problem:
While trying to deploy OCP on GCP, the installer gets stuck on the very first step, trying to list all of the projects that the GCP service account used to deploy OCP can see
Version-Release number of selected component (if applicable):
4.13.3 but also happening on 4.12.5 and I presume other releases as well
How reproducible:
Every time
Steps to Reproduce:
1. Use openshift-install to create a cluster in GCP
Actual results:
$ ./openshift-install-4.13.3 create cluster --dir gcp-doha/ --log-level debug DEBUG OpenShift Installer 4.13.3 DEBUG Built from commit 90bb61f38881d07ce94368f0b34089d152ffa4ef DEBUG Fetching Metadata... DEBUG Loading Metadata... DEBUG Loading Cluster ID... DEBUG Loading Install Config... DEBUG Loading SSH Key... DEBUG Loading Base Domain... DEBUG Loading Platform... DEBUG Loading Cluster Name... DEBUG Loading Base Domain... DEBUG Loading Platform... DEBUG Loading Networking... DEBUG Loading Platform... DEBUG Loading Pull Secret... DEBUG Loading Platform... INFO Credentials loaded from environment variable "GOOGLE_CREDENTIALS", file "/home/mak/.gcp/aos-serviceaccount.json" ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: platform.gcp.project: Internal error: context deadline exceeded
Expected results:
The cluster should be deployed with no issues
Additional info:
The GCP user used to deploy OCP has visibility of a very large number of projects (over 150,000):
> gcloud projects list | wc -l
152793
CNO should respect the `nodeSelector` setting in the HostedControlPlane:
Affinity and tolerations support is handled here: https://issues.redhat.com/browse/OCPBUGS-8692
Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-10738.
Description of problem:
Test failed: Auth test: logs in as 'test' user via htpasswd identity provider
CI-search
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
API documentation for HostedCluster states that the webhook kubeconfig field is only supported for IBM Cloud. It should be supported for all platforms.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Review API documentation at https://hypershift-docs.netlify.app/reference/api/
Actual results:
Expected results:
Additional info:
While running the e2e test locally with Hypershift cluster from cluster-bot I noticed that it fails on step waiting for 2 prometheus instances.
“wait for prometheus-k8s: expected 2 Prometheus instances but got: 1: timed out waiting for the condition”
Since HyperShift clusters from cluster-bot have a single worker node, the test will always fail, because main_test.go checks that there should always be 2 instances.
Ideally we need to check the infrastructureTopology field and adjust the test if the infrastructure is “SingleReplica”.
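For reference, the topology can be read from the cluster-scoped Infrastructure object, for example:
$ oc get infrastructure cluster -o jsonpath='{.status.infrastructureTopology}'
This returns "SingleReplica" on single-worker clusters and "HighlyAvailable" otherwise, so the expected Prometheus replica count in the test could be derived from that value.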
Description of problem:
ControlPlaneMachineSet Machines are considered Ready once the underlying MAPI machine is Running. This should not be a sufficient condition, as the Node linked to that Machine should also be Ready for the overall CPMS Machine to be considered Ready.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Please review the following PR: https://github.com/openshift/cluster-monitoring-operator/pull/1914
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
oc debug fails with the error "container "container-00" in pod "xiyuan24-f3-h4264-master-0-debug" is waiting to start: ContainerCreating". The error happens when run via automation; running it locally does not have this issue, and when extra time is added around the command in the automation script it works fine without any issues.
Version-Release number of selected component (if applicable):
03-24 17:57:54.649 [12:27:48] INFO> Shell Commands: oc version -o yaml --client --kubeconfig=/tmp/kubeconfig20230324-374-gt1vvm 03-24 17:57:54.649 clientVersion: 03-24 17:57:54.649 buildDate: "2023-03-17T23:32:35Z" 03-24 17:57:54.649 compiler: gc 03-24 17:57:54.649 gitCommit: eed143055ede731029931ad204b19cd2f565ef1a 03-24 17:57:54.649 gitTreeState: clean 03-24 17:57:54.649 gitVersion: 4.13.0-202303172327.p0.geed1430.assembly.stream-eed1430 03-24 17:57:54.649 goVersion: go1.19.4 03-24 17:57:54.649 major: "" 03-24 17:57:54.649 minor: "" 03-24 17:57:54.649 platform: linux/amd64 03-24 17:57:54.649 kustomizeVersion: v4.5.7 03-24 17:57:54.649 [12:27:49] INFO> Exit Status: 0
How reproducible:
Always
Steps to Reproduce:
1.Install latest 4.13 cluster 2. Run script https://github.com/openshift/verification-tests/blob/master/features/upgrade/security_compliance/fips.feature#L66
Actual results:
Test fails with error mentioned in the description
Expected results:
Test should not fail
Additional info:
Adding a link to the conversation which i had with maciej about this issue https://redhat-internal.slack.com/archives/GK58XC2G2/p1679655589922729 Run log with --loglevel=9 -> https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Runner/770180/console
Seen in 4.13.0-rc.2, mcc_drain_err is being served for nodes that have been deleted, causing un-actionable MCDDrainError.
At least 4.13.0-rc.2. Further exposure unclear.
At least four nodes on build01. Possibly all nodes that are removed while suffering from drain issues on 4.13.0-rc.2.
Unclear.
The machine-config controller continues to serve mcc_drain_err for the removed nodes.
The machine-config controller never serves mcc_drain_err for non-existent nodes.
Description of problem:
Bump Kubernetes to 0.27.1 and bump dependencies
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Backport support for the new GCP region europe-west12, starting in 4.12.z.
Version-Release number of selected component (if applicable):
4.12.z and 4.13.z
How reproducible:
Always
Steps to Reproduce:
1. Use openshift-install to deploy OCP in europe-west12
Actual results:
europe-west12 is not available as a supported region in the user survey
Expected results:
europe-west12 to be available as a supported region in the user survey
Additional info:
Description of problem:
On clusters without the TechPreview feature set enabled, machines are failing to delete due to an attempt to list an IPAM that is not installed.
Version-Release number of selected component (if applicable):
4.14 nightly
How reproducible:
consistently
Steps to Reproduce:
1. Create a platform vSphere cluster 2. Scale down a machine
Actual results:
Machine fails to delete
Expected results:
Machine should delete
Additional info:
Fails with unable to list IPAddressClaims: failed to get API group resources: unable to retrieve the complete list of server APIs: ipam.cluster.x-k8s.io/v1alpha1: the server could not find the requested resource
Description of problem:
After the installation of a cluster based on the agent installer ISO is completed, the assisted-installer-controller job remains up
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Generate a valid ISO image using the agent installer. All kinds of topologies (compact/ha/sno) and configurations are affected by this problem
Steps to Reproduce:
1. 2. 3.
Actual results:
$ oc get jobs -n assisted-installer
NAME                            COMPLETIONS   DURATION   AGE
assisted-installer-controller   0/1           102m       102m
Expected results:
oc get jobs -n assisted-installer should not return any job
Additional info:
It looks like the assisted-installer-controller has been designed assuming that Assisted Service (AS) was always available and reachable. This is not necessarily true when using the agent installer, since the AS initially running on the rendezvous node will not be available after the node is rebooted. The assisted-installer-controller performs a number of different tasks internally, and from the logs not all of them complete successfully (a condition to terminate the job). It could be useful to perform deeper troubleshooting on the ApproveCsrs task, as it is one that does not terminate properly.
Observing CI Hypershift failures in 4.14.0-0.ci-2023-06-16-074926
Payload includes image-registry/pull/370 which is the current suspected source of the regression
Please review the following PR: https://github.com/openshift/cluster-kube-scheduler-operator/pull/478
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Viewing OperatorHub details page will return error page
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2023-03-28-180259
How reproducible:
Always on Hypershift Guest cluster
Steps to Reproduce:
1. Visit OperatorHub details page via Administration -> Cluster Settings -> Configuration -> OperatorHub 2. 3.
Actual results:
Cannot read properties of undefined (reading 'sources')
Expected results:
page can be loaded successfully
Additional info:
screenshot one: https://drive.google.com/file/d/12cgpChKYuen2v6DWvmMrir273wONo5oY/view?usp=share_link screenshot two: https://drive.google.com/file/d/1vVsczu7ScIqznoKNsR8V0w4k9bF1xWhB/view?usp=share_link
Description of problem:
When a (recommended/conditional) release image is provided with --to-image='', the specified image name is not preserved in the ClusterVersion object.
Version-Release number of selected component (if applicable):
How reproducible:
100% with oc >4.9
Steps to Reproduce:
$ oc version Client Version: 4.12.2 Kustomize Version: v4.5.7 Server Version: 4.12.2 Kubernetes Version: v1.25.4+a34b9e9 $ oc get clusterversion/version -o jsonpath='{.status.desired}'|jq { "channels": [ "candidate-4.12", "candidate-4.13", "eus-4.12", "fast-4.12", "stable-4.12" ], "image": "quay.io/openshift-release-dev/ocp-release@sha256:31c7741fc7bb73ff752ba43f5acf014b8fadd69196fc522241302de918066cb1", "url": "https://access.redhat.com/errata/RHSA-2023:0569", "version": "4.12.2" } $ oc adm release info 4.12.3 -o jsonpath='{.image}' quay.io/openshift-release-dev/ocp-release@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36 $ skopeo copy docker://quay.io/openshift-release-dev/ocp-release@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36 docker://quay.example.com/playground/release-images Getting image source signatures Copying blob 64096b96a7b0 done Copying blob 0e0550faf8e0 done Copying blob 97da74cc6d8f skipped: already exists Copying blob d8190195889e skipped: already exists Copying blob 17997438bedb done Copying blob fdbb043b48dc done Copying config b49bc8b603 done Writing manifest to image destination Storing signatures $ skopeo inspect docker://quay.example.com/playground/release-images@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36|jq '.Name,.Digest' "quay.example.com/playground/release-images" "sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36" $ oc adm upgrade --to-image=quay.example.com/playground/release-images@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36 Requesting update to 4.12.3
Actual results:
$ oc get clusterversion/version -o jsonpath='{.status.desired}'|jq { "channels": [ "candidate-4.12", "candidate-4.13", "eus-4.12", "fast-4.12", "stable-4.12" ], "image": "quay.io/openshift-release-dev/ocp-release@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36", <--- not quay.example.com "url": "https://access.redhat.com/errata/RHSA-2023:0728", "version": "4.12.3" } $ oc get clusterversion/version -o jsonpath='{.status.history}'|jq [ { "completionTime": null, "image": "quay.io/openshift-release-dev/ocp-release@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36", <--- not quay.example.com "startedTime": "2023-04-28T07:39:11Z", "state": "Partial", "verified": true, "version": "4.12.3" }, { "completionTime": "2023-04-27T14:48:06Z", "image": "quay.io/openshift-release-dev/ocp-release@sha256:31c7741fc7bb73ff752ba43f5acf014b8fadd69196fc522241302de918066cb1", "startedTime": "2023-04-27T14:24:29Z", "state": "Completed", "verified": false, "version": "4.12.2" } ]
Expected results:
$ oc get clusterversion/version -o jsonpath='{.status.desired}'|jq { "channels": [ "candidate-4.12", "candidate-4.13", "eus-4.12", "fast-4.12", "stable-4.12" ], "image": "quay.example.com/playground/release-images@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36 ", "url": "https://access.redhat.com/errata/RHSA-2023:0728", "version": "4.12.3" }$ oc get clusterversion/version -o jsonpath='{.status.history}'|jq [ { "completionTime": null, "image": "quay.example.com/playground/release-images@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36 ", "startedTime": "2023-04-28T07:39:11Z", "state": "Partial", "verified": true, "version": "4.12.3" }, { "completionTime": "2023-04-27T14:48:06Z", "image": "quay.io/openshift-release-dev/ocp-release@sha256:31c7741fc7bb73ff752ba43f5acf014b8fadd69196fc522241302de918066cb1", "startedTime": "2023-04-27T14:24:29Z", "state": "Completed", "verified": false, "version": "4.12.2" } ]
Additional info:
While in earlier versions (<4.10) we used to preserve the specified image [1], we now (as of 4.10) store the public image as the desired version [2]. [1] https://github.com/openshift/oc/blob/88cfeb4aa2d74ee5f5598c571661622c0034081b/pkg/cli/admin/upgrade/upgrade.go#L278 [2] https://github.com/openshift/oc/blob/5711859fac135177edf07161615bdabe3527e659/pkg/cli/admin/upgrade/upgrade.go#L278
Description of the problem:
Proliant Gen 11 always reports the serial number "PCA_number.ACC", causing all hosts to register with the same UUID.
How reproducible:
100%
Steps to reproduce:
1. Boot two Proliant Gen 11 hosts
2. See that both hosts are updating a single host entry in the service
Actual results:
All hosts with this hardware are assigned the same UUID
Expected results:
Each host should have a unique UUID
Description of problem:
Hypershift does not utilize existing liveness and readiness probes on openshift-route-controller-manager and openshift-controller-manager.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1.Create OCP cluster using Hypershift 2.Look at openshift-route-controller-manager and openshift-controller-manager yaml manifests
Actual results:
No probes defined for pods of those two deployments
Expected results:
Probes should be defined because the services implement them
Additional info:
This is the result of a security review for 4.12 Hypershift, original investigation can be found https://github.ibm.com/alchemy-containers/armada-update/issues/4117#issuecomment-53149378
Description of problem:
Any FBC enabled OLM Catalog displays the Channels in a random order.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create a catalog source for icr.io/cpopen/ibm-operator-catalog:latest 2. Navigate to OperatorHub 3. Click on the `ibm-mq` operator 4. Click on the Install button.
Actual results:
The list of channels is in random order. The order changes with each page refresh.
Expected results:
The list of channels should be in lexicographical ascending order as it was for SQLITE based catalogs.
Additional info:
See related operator-registry upstream issue: https://github.com/operator-framework/operator-registry/issues/1069#top Note: I think both `operator-registry` and the OperatorHub should provide deterministic sorting of these channels.
New regions are added all the time, so it's best to keep it up-to-date.
The goal is to collect metrics about the number of LIST and WATCH requests to the apiserver, because this will allow us to measure the deployment progress of the API streaming feature. The new feature will replace the use of LIST requests with WATCH.
apiserver_list_watch_request_total:rate:sum
apiserver_list_watch_request_total:rate:sum represents the rate of change for the LIST and WATCH requests over a 5 minute period.
Labels
The cardinality of the metric is at most 2.
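For illustration only, a minimal sketch of what a recording rule for this metric could look like; the source metric, label set, and rule placement are assumptions, not the actual cluster-monitoring-operator rule:

- record: apiserver_list_watch_request_total:rate:sum
  expr: sum by (verb) (rate(apiserver_request_total{verb=~"LIST|WATCH"}[5m]))

With only the verb label kept (values LIST and WATCH), the recorded series stays at a cardinality of at most 2, matching the description above.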
Description of problem:
This is a clone for https://issues.redhat.com/browse/CNV-26608
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Update to use Jenkins 4.13 images to address CVEs
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The info message below the "Git access token" field for creating the Pipelines Repository under the Pipelines section on the Import from Git page falls back to the default text instead of showing the curated message for each Git provider. The info messages are curated for each of the Git providers when the Repository is created from the Pipelines page.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Go to the Import from Git Page 2. Add a Git URL with PAC ( https://github.com/Lucifergene/oc-pipe ) 3. Check the text under the "Git access token" Field
Actual results:
Use your Git Personal token. Create a token with repo, public_repo & admin:repo_hook scopes and give your token an expiration, i.e 30d.
Expected results:
Use your GitHub Personal token. Use this link to create a token with repo, public_repo & admin:repo_hook scopes and give your token an expiration, i.e 30d.
Additional info:
This issue has been reported multiple times over the years with no resolution
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.13-e2e-vsphere-zones/1655633815252504576
kubeconfig received!
waiting for api to be available
level=error
level=error msg=Error: failed to parse ovf: failed to parse ovf: XML syntax error on line 1: illegal character code U+0000
level=error
level=error msg= with vsphereprivate_import_ova.import[0],
level=error msg= on main.tf line 70, in resource "vsphereprivate_import_ova" "import":
level=error msg= 70: resource "vsphereprivate_import_ova" "import" {
level=error
level=error
level=error msg=Error: failed to parse ovf: failed to parse ovf: XML syntax error on line 1: illegal character code U+0000
https://issues.redhat.com/browse/OCPQE-13219
https://issues.redhat.com/browse/TRT-741
Description of problem:
On OpenShift Container Platform, the etcd Pod is showing messages like the following: 2023-06-19T09:10:30.817918145Z {"level":"warn","ts":"2023-06-19T09:10:30.817Z","caller":"fileutil/purge.go:72","msg":"failed to lock file","path":"/var/lib/etcd/member/wal/000000000000bc4b-00000000183620a4.wal","error":"fileutil: file already locked"} This is described in KCS https://access.redhat.com/solutions/7000327
Version-Release number of selected component (if applicable):
any currently supported version (> 4.10) running with 3.5.x
How reproducible:
always
Steps to Reproduce:
happens after running etcd for a while
This has been discussed in https://github.com/etcd-io/etcd/issues/15360
It's not a harmful error message, it merely indicates that some WALs have not been included in snapshots yet.
This was caused by changing default numbers: https://github.com/etcd-io/etcd/issues/13889
This was fixed in https://github.com/etcd-io/etcd/pull/15408/files but never backported to 3.5.
To mitigate that error and stop confusing people, we should also supply that argument when starting etcd in: https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/etcd/pod.yaml#L170-L187
That way we're not surprised by changes of the default values upstream.
Description of problem:
The agent-tui should show only before the installation, but it shows again during the installation, and when it quits again the installation fails to go on.
Version-Release number of selected component (if applicable):
4.13.0-0.ci-2023-03-14-045458
How reproducible:
always
Steps to Reproduce:
1. Make sure the primary check pass, and boot the agent.x86_64.iso file, we can see the agent-tui show before the installation 2. Tracking installation by both wait-for output and console output 3. The agent-tui show again during the installation, wait for the agent-tui quit automatically without any user interruption, the installation quit with failure, and we have the following wait-for output: DEBUG asset directory: . DEBUG Loading Agent Config... ... DEBUG Agent Rest API never initialized. Bootstrap Kube API never initialized INFO Waiting for cluster install to initialize. Sleeping for 30 seconds DEBUG Agent Rest API Initialized INFO Cluster is not ready for install. Check validations DEBUG Cluster validation: The pull secret is set. WARNING Cluster validation: The cluster has hosts that are not ready to install. DEBUG Cluster validation: The cluster has the exact amount of dedicated control plane nodes. DEBUG Cluster validation: API virtual IPs are not required: User Managed Networking DEBUG Cluster validation: API virtual IPs are not required: User Managed Networking DEBUG Cluster validation: The Cluster Network CIDR is defined. DEBUG Cluster validation: The base domain is defined. DEBUG Cluster validation: Ingress virtual IPs are not required: User Managed Networking DEBUG Cluster validation: Ingress virtual IPs are not required: User Managed Networking DEBUG Cluster validation: The Machine Network CIDR is defined. DEBUG Cluster validation: The Cluster Machine CIDR is not required: User Managed Networking DEBUG Cluster validation: The Cluster Network prefix is valid. DEBUG Cluster validation: The cluster has a valid network type DEBUG Cluster validation: Same address families for all networks. DEBUG Cluster validation: No CIDRS are overlapping. DEBUG Cluster validation: No ntp problems found DEBUG Cluster validation: The Service Network CIDR is defined. 
DEBUG Cluster validation: cnv is disabled DEBUG Cluster validation: lso is disabled DEBUG Cluster validation: lvm is disabled DEBUG Cluster validation: odf is disabled DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Valid inventory exists for the host DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Sufficient CPU cores DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Sufficient minimum RAM DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Sufficient disk capacity DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Sufficient CPU cores for role master DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Sufficient RAM for role master DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Hostname openshift-qe-049.arm.eng.rdu2.redhat.com is unique in cluster DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Hostname openshift-qe-049.arm.eng.rdu2.redhat.com is allowed DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Speed of installation disk has not yet been measured DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host is compatible with cluster platform none DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: VSphere disk.EnableUUID is enabled for this virtual machine DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host agent compatibility checking is disabled DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: No request to skip formatting of the installation disk DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: All disks that have skipped formatting are present in the host inventory DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host is connected DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Media device is connected DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: No Machine Network CIDR needed: User Managed Networking DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host belongs to all machine network CIDRs DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host has connectivity to the majority of hosts in the cluster DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Platform PowerEdge R740 is allowed WARNING Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host couldn't synchronize with any NTP server DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host clock is synchronized with service DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: All required container images were either pulled successfully or no attempt was made to pull them DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Network latency requirement has been satisfied. DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Packet loss requirement has been satisfied. DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host has been configured with at least one default route. 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Domain name resolution for the api.zniusno.arm.eng.rdu2.redhat.com domain was successful or not required DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Domain name resolution for the api-int.zniusno.arm.eng.rdu2.redhat.com domain was successful or not required DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Domain name resolution for the *.apps.zniusno.arm.eng.rdu2.redhat.com domain was successful or not required DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host subnets are not overlapping DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: No IP collisions were detected by host 7a9649d8-4167-a1f9-ad5f-385c052e2744 DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: cnv is disabled DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: lso is disabled DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: lvm is disabled DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: odf is disabled WARNING Host openshift-qe-049.arm.eng.rdu2.redhat.com: updated status from discovering to insufficient (Host cannot be installed due to following failing validation(s): Host couldn't synchronize with any NTP server) INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host NTP is synced INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com: updated status from insufficient to known (Host is ready to be installed) INFO Cluster is ready for install INFO Cluster validation: All hosts in the cluster are ready to install. INFO Preparing cluster for installation INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation) INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com: New image status registry.ci.openshift.org/ocp/4.13-2023-03-14-045458@sha256:b0d518907841eb35adbc05962d4b2e7d45abc90baebc5a82d0398e1113ec04d0. result: success. time: 1.35 seconds; size: 401.45 Megabytes; download rate: 312.54 MBps INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation) INFO Cluster installation in progress INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com: updated status from preparing-successful to installing (Installation is in progress) INFO Host: openshift-qe-049.arm.eng.rdu2.redhat.com, reached installation stage Starting installation: bootstrap INFO Host: openshift-qe-049.arm.eng.rdu2.redhat.com, reached installation stage Installing: bootstrap INFO Host: openshift-qe-049.arm.eng.rdu2.redhat.com, reached installation stage Failed: failed executing nsenter [--target 1 --cgroup --mount --ipc --pid -- podman run --net host --pid=host --volume /:/rootfs:rw --volume /usr/bin/rpm-ostree:/usr/bin/rpm-ostree --privileged --entrypoint /usr/bin/machine-config-daemon registry.ci.openshift.org/ocp/4.13-2023-03-14-045458@sha256:f85a278868035dc0a40a66ea7eaf0877624ef9fde9fc8df1633dc5d6d1ad4e39 start --node-name localhost --root-mount /rootfs --once-from /opt/install-dir/bootstrap.ign --skip-reboot], Error exit status 255, LastOutput "... to initialize single run daemon: error initializing rpm-ostree: Error while ensuring access to kublet config.json pull secrets: symlink /var/lib/kubelet/config.json /run/ostree/auth.json: file exists" INFO Cluster has hosts in error INFO cluster has stopped installing... 
working to recover installation INFO cluster has stopped installing... working to recover installation INFO cluster has stopped installing... working to recover installation INFO cluster has stopped installing... working to recover installation INFO cluster has stopped installing... working to recover installation INFO cluster has stopped installing... working to recover installation INFO cluster has stopped installing... working to recover installation INFO cluster has stopped installing... working to recover installation 4. During the installation, we had NetworkManager-wait-online.service for a while: -- Logs begin at Wed 2023-03-15 03:06:29 UTC, end at Wed 2023-03-15 03:27:30 UTC. -- Mar 15 03:18:52 openshift-qe-049.arm.eng.rdu2.redhat.com systemd[1]: Starting Network Manager Wait Online... Mar 15 03:19:55 openshift-qe-049.arm.eng.rdu2.redhat.com systemd[1]: NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE Mar 15 03:19:55 openshift-qe-049.arm.eng.rdu2.redhat.com systemd[1]: NetworkManager-wait-online.service: Failed with result 'exit-code'. Mar 15 03:19:55 openshift-qe-049.arm.eng.rdu2.redhat.com systemd[1]: Failed to start Network Manager Wait Online.
Expected results:
The TUI should only show once before the installation.
Description of problem:
The following tests broke the payload for CI and nightly [sig-network][Feature:MultiNetworkPolicy][Serial] should enforce a network policies on secondary network IPv6 [Suite:openshift/conformance/serial] [sig-network][Feature:MultiNetworkPolicy][Serial] should enforce a network policies on secondary network IPv4 [Suite:openshift/conformance/serial]
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Test Panicked: runtime error: invalid memory address or nil pointer dereference
Expected results:
Additional info:
Original PR that broke the payload https://github.com/openshift/origin/pull/27795 Revert to get payloads back to normal https://github.com/openshift/origin/pull/27926 Broken payloads and related jobs and sippy link for additional info https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.ci/release/4.14.0-0.ci-2023-05-17-212447 https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1659065324743430144 https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-05-18-040905 https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-serial/1659088328617627648 https://sippy.dptools.openshift.org/sippy-ng/tests/4.14?filters=%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522current_runs%2522%252C%2522operatorValue%2522%253A%2522%253E%253D%2522%252C%2522value%2522%253A%25227%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Atrue%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522never-stable%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Atrue%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522aggregated%2522%257D%252C%257B%2522id%2522%253A99%252C%2522columnField%2522%253A%2522name%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522%255Bsig-network%255D%255BFeature%253AMultiNetworkPolicy%255D%255BSerial%255D%2520should%2520enforce%2520a%2520network%2520policies%2520on%2520secondary%2520network%2520IPv6%2520%255BSuite%253Aopenshift%252Fconformance%252Fserial%255D%2522%257D%255D%252C%2522linkOperator%2522%253A%2522and%2522%257D&sort=asc&sortField=current_working_percentage
Description of problem:
2023-02-20T16:27:58.107800612Z + oc observe pods -n openshift-sdn --listen-addr= -l app=sdn -a '{ .status.hostIP }' -- /var/run/add_iptables.sh 2023-02-20T16:27:58.181727766Z Flag --argument has been deprecated, and will be removed in a future release. Use --template instead.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-02-17-090603
How reproducible:
Always
Steps to Reproduce:
1. Deploy Azure OpenShiftSDN cluster 2. Check drop-icmp container logs oc logs -n openshift-sdn -c drop-icmp -l app=sdn --previous 3.
Actual results:
+ true + iptables -F AZURE_ICMP_ACTION + iptables -A AZURE_ICMP_ACTION -j LOG + iptables -A AZURE_ICMP_ACTION -j DROP + oc observe pods -n openshift-sdn --listen-addr= -l app=sdn -a '{ .status.hostIP }' -- /var/run/add_iptables.sh Flag --argument has been deprecated, and will be removed in a future release. Use --template instead. E0220 16:27:07.553592 27842 memcache.go:238] couldn't get current server API group list: Get "https://172.30.0.1:443/api?timeout=32s": dial tcp 172.30.0.1:443: connect: connection refused E0220 16:27:07.553913 27842 memcache.go:238] couldn't get current server API group list: Get "https://172.30.0.1:443/api?timeout=32s": dial tcp 172.30.0.1:443: connect: connection refused The connection to the server 172.30.0.1:443 was refused - did you specify the right host or port? Error from server (BadRequest): previous terminated container "drop-icmp" in pod "sdn-v7gqq" not found
Expected results:
No deprecation warning
Additional info:
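For reference, a likely replacement invocation (an untested sketch that simply substitutes the deprecated -a/--argument flag with --template, as the warning suggests; template syntax compatibility is assumed):
oc observe pods -n openshift-sdn --listen-addr= -l app=sdn --template='{ .status.hostIP }' -- /var/run/add_iptables.sh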
Description of problem:
In the web console Administrator view, the items under "Observe" in the side navigation menu are duplicated.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is happening because those menu items are now provided by the `monitoring-plugin` dynamic plugin, so we need to remove them from the web console codebase.
Description of problem:
1. CR.status.LastSyncTimestamp should also be updated in the "else" code branch: https://github.com/openshift/cloud-credential-operator/blob/4cb9faca62c31ebea9a11b55f7af764be4ee2cd8/pkg/operator/credentialsrequest/credentialsrequest_controller.go#L1054
2. r.Client.Status().Update is not called on the CR object in memory after this line: https://github.com/openshift/cloud-credential-operator/blob/4cb9faca62c31ebea9a11b55f7af764be4ee2cd8/pkg/operator/credentialsrequest/credentialsrequest_controller.go#L713 So CR.status.conditions are not updated.
Steps to Reproduce:
This results from a static code check.
Please review the following PR: https://github.com/openshift/cluster-api-provider-alibaba/pull/41
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
After upgrading a cluster from 4.10.47 to 4.11.25, an issue is observed with the egress router pod; the pods are stuck in Pending state.
Version-Release number of selected component (if applicable):
4.11.25
How reproducible:
Steps to Reproduce:
1. Upgrade from 4.10.47 to 4.11.25 2. Check if co network is in Managed state 3. Verify that egress pods are not created with errors like : 55s Warning FailedCreatePodSandBox pod/****** (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox *******_d6918859-a4e9-4e5b-ba44-acc70499fa7c_0(9c464935ebaeeeab7be0b056c3f7ed1b7279e21445b9febea29eb280f7ee7429): error adding pod ****** to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [ns/pod/d6918859-a4e9-4e5b-ba44-acc70499fa7c:openshift-sdn]: error adding container to network "openshift-sdn": CNI request failed with status 400: 'could not open netns "/var/run/netns/503fb77f-3b96-4f23-8356-43e7ae1e1b49": unknown FS magic on "/var/run/netns/503fb77f-3b96-4f23-8356-43e7ae1e1b49": 1021994
Actual results:
Egress router pods in pending state with error message as below: $ omg get events ... 49s Warning FailedCreatePodSandBox pod/xxxx (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_xxxx_379fa7ec-4702-446c-9162-55c2f76989f6_0(86f8c76e9724216143bef024996cb14a7614d3902dcf0d3b7ea858298766630c): error adding pod xxx to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [xxxx/xxxx/379fa7ec-4702-446c-9162-55c2f76989f6:openshift-sdn]: error adding container to network "openshift-sdn": CNI request failed with status 400: 'could not open netns "/var/run/netns/0d39f378-29fd-4858-a947-51c5c06f1598": unknown FS magic on "/var/run/netns/0d39f378-29fd-4858-a947-51c5c06f1598": 1021994
Expected results:
Egress router pods in running state
Additional info:
Workaround from https://access.redhat.com/solutions/6986283 works: edit the sdn DaemonSet in the openshift-sdn namespace:
- mountPath: /host/var/run/netns   <<<<< /var/run/netns
  mountPropagation: HostToContainer
  name: host-run-netns
  readOnly: true
Dependencies for the ironic containers are quite old; we need to upgrade them to the latest available to keep up with upstream requirements.
Description of problem:
Placeholder bug to backport common latency failures
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Error message seen during testing: 2023-03-23T22:33:02.507Z ERROR operator.dns_controller dns/controller.go:348 failed to publish DNS record to zone {"record": {"dnsName":"*.example.com","targets":["34.67.189.132"],"recordType":"A","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"ci-ln-95xvtb2-72292-9jj4w-private-zone"}, "error": "googleapi: Error 400: Invalid value for 'entity.change.additions[*.example.com][A].name': '*.example.com', invalid"}
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Steps to Reproduce:
1. Setup 4.13 gcp cluster, install OSSM using http://pastebin.test.redhat.com/1092754 2. Run gateway api e2e against cluster (or create gateway with listener hostname *.example.com) 3. Check ingress operator logs
Actual results:
DNS record not published, and a continuous error in the log.
Expected results:
Should publish DNS record to zone without errors
Additional info:
Miciah: The controller should check ManageDNSForDomain when calling EnsureDNSRecord.
Description of the problem:
vSphere vCenter cluster field is missing description
How reproducible:
always
Steps to reproduce:
1. install OCP on vSphere platform
2. Go to Overview -> vSphere, configure
Actual results:
vCenter cluster field is missing description
Expected results:
Description is present
Please review the following PR: https://github.com/operator-framework/operator-marketplace/pull/515
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
In case the [appsDomain|https://docs.openshift.com/container-platform/4.13/networking/ingress-operator.html#nw-ingress-configuring-application-domain_configuring-ingress] is specified and a cluster-admin is deleting accidentally all routes on a cluster, the route canary in the namespace openshift-ingress-canary is created with the domain specified in the .spec.appsDomain instead of .spec.domain of the definition in Ingress.config.openshift.io. Additionally the docs are a bit confusing. On one page (https://docs.openshift.com/container-platform/4.13/networking/ingress-operator.html#nw-ingress-configuring-application-domain_configuring-ingress) it's defined as {code:none} As a cluster administrator, you can specify an alternative to the default cluster domain for user-created routes by configuring the appsDomain field. The appsDomain field is an optional domain for OpenShift Container Platform to use instead of the default, which is specified in the domain field. If you specify an alternative domain, it overrides the default cluster domain for the purpose of determining the default host for a new route. For example, you can use the DNS domain for your company as the default domain for routes and ingresses for applications running on your cluster.
In the API spec (https://docs.openshift.com/container-platform/4.11/rest_api/config_apis/ingress-config-openshift-io-v1.html#spec) the correct behaviour is explained
appsDomain is an optional domain to use instead of the one specified in the domain field when a Route is created without specifying an explicit host. If appsDomain is nonempty, this value is used to generate default host values for Route. Unlike domain, appsDomain may be modified after installation. This assumes a new ingresscontroller has been setup with a wildcard certificate.
It would be nice if the wording could be adjusted, as `you can specify an alternative to the default cluster domain for user-created routes by configuring` does not fit well: more or less all newly created routes (operator-created and so on) get created with the appsDomain.
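As a concrete illustration of the two fields (domain values are examples only):
  apiVersion: config.openshift.io/v1
  kind: Ingress
  metadata:
    name: cluster
  spec:
    domain: apps.example-cluster.example.com   # set at installation time; the canary route should use this
    appsDomain: apps.mycompany.example.com     # optional override; with this bug the recreated canary route uses it instead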
Version-Release number of selected component (if applicable):
OpenShift 4.12.22
How reproducible:
see steps below
Steps to Reproduce:
1. Install OpenShift 2. define .spec.appsDomain in Ingress.config.openshift.io 3. oc delete route canary -n openshift-ingress-canary 4. wait some seconds to get the route recreated and check cluster-operator
Actual results:
Ingress Operator degraded and route recreated with wrong domain (.spec.appsDomain)
Expected results:
Ingress Operator not degraded and route recreated with the correct domain (.spec.domain)
Additional info:
Please see screenshot
Description of problem:
The PowerVS installer will have code which creates a new service instance during installation. Therefore, we need to delete that service instance upon cluster deletion.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Create cluster 2. Delete cluster
Actual results:
Expected results:
No leftover service instance
Additional info:
DoD:
Add more conditions to hypershift_hostedclusters_failure_conditions so metrics provide more info
Description of problem:
This may be something we want to either add a validation for or document. It was initially found at a customer site but I've also confirmed it happens with just a Compact config with no workers. They created an agent-config.yaml with 2 worker nodes but did not set the replicas in install-config.yaml, i.e. they did not set compute: - hyperthreading: Enabled name: worker replicas: {{ num_workers }} This resulted in an install failure as by default 3 worker replicas are created if not defined https://github.com/openshift/installer/blob/master/pkg/types/defaults/machinepools.go#L11 See the attached console screenshot showing that the expected number of hosts doesn't match the actual. I've also duplicated this with a compact config. We can see that the install failed as start-cluster-installation.sh is looking for 6 hosts. [core@master-0 ~]$ sudo systemctl status start-cluster-installation.service ● start-cluster-installation.service - Service that starts cluster installation Loaded: loaded (/etc/systemd/system/start-cluster-installation.service; enabled; vendor preset: enabled) Active: activating (start) since Wed 2023-03-15 14:40:04 UTC; 3min 41s ago Main PID: 3365 (start-cluster-i) Tasks: 5 (limit: 101736) Memory: 1.7M CGroup: /system.slice/start-cluster-installation.service ├─3365 /bin/bash /usr/local/bin/start-cluster-installation.sh ├─5124 /bin/bash /usr/local/bin/start-cluster-installation.sh ├─5132 /bin/bash /usr/local/bin/start-cluster-installation.sh └─5138 diff /tmp/tmp.vIq1jH9Vf2 /etc/issue.d/90_start-install.issueMar 15 14:42:54 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation... Mar 15 14:43:04 master-0 start-cluster-installation.sh[4746]: Hosts known and ready for cluster installation (3/6) Mar 15 14:43:04 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation... Mar 15 14:43:15 master-0 start-cluster-installation.sh[4980]: Hosts known and ready for cluster installation (3/6) Mar 15 14:43:15 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation... Mar 15 14:43:25 master-0 start-cluster-installation.sh[5026]: Hosts known and ready for cluster installation (3/6) Mar 15 14:43:25 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation... Mar 15 14:43:35 master-0 start-cluster-installation.sh[5079]: Hosts known and ready for cluster installation (3/6) Mar 15 14:43:35 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation... Mar 15 14:43:45 master-0 start-cluster-installation.sh[5124]: Hosts known and ready for cluster installation (3/6) Since the compute section in install-config.yaml is optional we can't assume that it will be there https://github.com/openshift/installer/blob/master/pkg/types/installconfig.go#L126
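For illustration, an install-config.yaml compute stanza that matches a 2-worker agent-config.yaml would look like this (a sketch; values are examples):
  compute:
  - architecture: amd64
    hyperthreading: Enabled
    name: worker
    replicas: 2   # must match the number of worker hosts in agent-config.yaml; defaults to 3 when the section is omitted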
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Steps to Reproduce:
1. Remove the compute section from install-config.yaml 2. Do an install 3. See the failure
Actual results:
Expected results:
Additional info:
After https://issues.redhat.com//browse/HOSTEDCP-1062, the `olm-collect-profiles` CronJob pods did not get the NeedManagementKASAccessLabel label and thus fail:
# oc logs olm-collect-profiles-28171952-2v8gn
Error: Get "https://172.29.0.1:443/api?timeout=32s": dial tcp 172.29.0.1:443: i/o timeout
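A minimal sketch of the kind of fix implied here, assuming the label key behind NeedManagementKASAccessLabel is hypershift.openshift.io/need-management-kas-access (an assumption, not confirmed by this report):
  apiVersion: batch/v1
  kind: CronJob
  metadata:
    name: olm-collect-profiles
  spec:
    jobTemplate:
      spec:
        template:
          metadata:
            labels:
              # assumed label key/value; the constant referenced above is NeedManagementKASAccessLabel
              hypershift.openshift.io/need-management-kas-access: "true"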
Description of the problem:
Staging, BE v2.17.3 - Trying to install an OCP 4.13 Nutanix cluster and getting a "no ingress for host" error. Igal saw that the error is
Warning FailedScheduling 98m default-scheduler 0/5 nodes are available: 2 node(s) didn't match pod anti-affinity rules, 3 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}. preemption: 0/5 nodes are available: 2 node(s) didn't match pod anti-affinity rules, 3 Preemption is not helpful for scheduling..
Which comes from
removeUninitializedTaint := false
if cluster.Platform != nil && *cluster.Platform.Type == models.PlatformTypeVsphere {
    removeUninitializedTaint = true
}
How reproducible:
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
Description of problem:
When deploying a whereabouts-IPAM-based additional network through the cluster-network-operator, the whereabouts-reconciler daemonset is not deployed on non-amd64 clusters due to a hard-coded nodeSelector introduced by https://github.com/openshift/cluster-network-operator/commit/be095d8c378e177d625a92aeca4e919ed0b5a14f
Version-Release number of selected component (if applicable):
4.13+
How reproducible:
Always. Tested on a connected arm64 AWS cluster using the openshift-sdn network
Steps to Reproduce:
1. oc new-project test1 2. oc patch networks.operator.openshift.io/cluster -p '{"spec":{"additionalNetworks":[{"name":"tertiary-net2","namespace":"test1","rawCNIConfig":"{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"test\",\n \"type\": \"macvlan\",\n \"master\": \"bond0.100\",\n \"ipam\": {\n \"type\": \"whereabouts\",\n \"range\": \"10.10.10.0/24\"\n }\n}","type":"Raw"}],"useMultiNetworkPolicy":true}}' --type=merge 3. oc get daemonsets -n openshift-multus
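For readability, the patch in step 2 corresponds roughly to this spec on networks.operator.openshift.io/cluster (rawCNIConfig shown unescaped):
  spec:
    useMultiNetworkPolicy: true
    additionalNetworks:
    - name: tertiary-net2
      namespace: test1
      type: Raw
      rawCNIConfig: |
        {
          "cniVersion": "0.3.1",
          "name": "test",
          "type": "macvlan",
          "master": "bond0.100",
          "ipam": {
            "type": "whereabouts",
            "range": "10.10.10.0/24"
          }
        }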
Actual results:
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE whereabouts-reconciler 0 0 0 0 0 kubernetes.io/arch=amd64 7m27s
Expected results:
No kubernetes.io/arch=amd64 set, so that non-amd64 and multi-arch compute clusters can schedule the daemonset on each node, regardless of the architecture.
Additional info:
Same problem on s390x
https://github.com/openshift/hypershift/pull/2437 created a coupling between the HO and the CPO: a CPO that contains this PR crashes when deployed by an HO that does not contain it.
The reason appears to be related to the absence of the OPENSHIFT_IMG_OVERRIDES envvar on the CPO deployment.
{"level":"info","ts":"2023-06-06T16:36:21Z","logger":"setup","msg":"Using CPO image","image":"registry.ci.openshift.org/ocp/4.14-2023-06-06-102645@sha256:2d81c28856f5c0a73e55e7cb6fbc208c738fb3ca7c200cc7eb46efb40c8e10d2"} panic: runtime error: index out of range [1] with length 1 goroutine 1 [running]: github.com/openshift/hypershift/support/util.ConvertImageRegistryOverrideStringToMap({0x0, 0x0}) /hypershift/support/util/util.go:237 +0x454 main.NewStartCommand.func1(0xc000d80000, {0xc000a71180, 0x0, 0x8}) /hypershift/control-plane-operator/main.go:345 +0x2225
containers:
- args:
  - run
  - --namespace
  - $(MY_NAMESPACE)
  - --deployment-name
  - control-plane-operator
  - --metrics-addr
  - 0.0.0.0:8080
  - --enable-ci-debug-output=false
  - --registry-overrides==
  command:
  - /usr/bin/control-plane-operator
Description of problem:
Sometimes the oc-mirror command leaves large amounts of data under the /tmp dir and runs out of disk space.
Version-Release number of selected component (if applicable):
oc mirror version
4.12/4.13
How reproducible:
Always
Steps to Reproduce:
1. The exact steps are unclear, but the following logs are seen when running the oc-mirror command:
Actual results:
[root@preserve-fedora36 588]# oc-mirror --config config.yaml docker://yinzhou-133.mirror-registry.qe.gcp.devcluster.openshift.com:5000 --dest-skip-tls Checking push permissions for yinzhou-133.mirror-registry.qe.gcp.devcluster.openshift.com:5000 Creating directory: oc-mirror-workspace/src/publish Creating directory: oc-mirror-workspace/src/v2 Creating directory: oc-mirror-workspace/src/charts Creating directory: oc-mirror-workspace/src/release-signatures No metadata detected, creating new workspace The rendered catalog is invalid. Run "oc-mirror list operators --catalog CATALOG-NAME --package PACKAGE-NAME" for more information. error: error rendering new refs: render reference "registry.redhat.io/redhat/redhat-operator-index:v4.11": write /tmp/render-unpack-2866670795/tmp/cache/cache/red-hat-camel-k_latest_red-hat-camel-k-operator.v1.6.0.json: no space left on device [root@preserve-fedora36 588]# cd /tmp/ [root@preserve-fedora36 tmp]# ls imageset-catalog-registry-333402727 render-unpack-2230547823
Expected results:
The data created under /tmp should always be deleted, no matter at which stage the command fails.
Additional info:
Description of problem:
Tests like lint and vet used to be run within a container engine by default if an engine was detected, both locally and in CI. Up until now no container engine was detected in CI, so tests would run natively there. Now that the base image we use in CI has started shipping `podman`, a container engine is detected by default and tests are run within podman by default. But creating nested containers doesn't work in CI at the moment, which results in a test failure. As such we are switching the default behaviour for tests (both locally and in CI): by default no container engine is used to run tests, even if one is detected; instead tests run natively unless otherwise specified.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
We merged a change into origin to modify a test so that `/readyz` would be used as the health check path. It turns out this makes things worse because we want to use kube-proxy's health probe endpoint to monitor the node health, and kube-proxy only exposes `/healthz` which is the default path anyway. We should remove the annotation added to change the path and go back to the defaults.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
In an IPv6 environment using DHCP, it may not be possible to configure a rendezvousIP that matches the actual address. This is because by default NetworkManager uses DUID-UUIDs for Client ID in the IPv6 DHCP Soliciation (see https://datatracker.ietf.org/doc/html/rfc6355) which are machine dependent. As a result, the DHCPv6 server cannot be configured with a pre-determined Client ID/IPv6 Address pair that matches the rendezvousIP and the nodes will be assigned random IPv6 addresses from the pool of DHCP addresses. We can see the flow here (the DUID-UUID has a 00:04 prefix) DHCPSOLICIT(ostestbm) 00:04:56:d2:b1:0b:ba:ef:8c:1a:00:58:3f:ed:e5:d3:5f:85 The DHCP server therefore assigns a new address from the pool, fd2e:6f44:5dd8:c956::32 in this case: DHCPREPLY(ostestbm) fd2e:6f44:5dd8:c956::32 00:04:56:d2:b1:0b:ba:ef:8c:1a:00:58:3f:ed:e5:d3:5f:85 NetworkManager needs to be configured to use a deterministic Client ID so that a reliable Client ID/IPv6 address can be added to a DHCP server. The best way to do this is to configure NM for dhcp-duid=ll so that it uses a DUID-LL which based on the interface mac address. This is the approach taken by Baremetal IPI in https://github.com/openshift/machine-config-operator/pull/1395
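A sketch of the kind of change this implies, assuming the setting is delivered as a NetworkManager conf.d drop-in (shown here wrapped in a MachineConfig purely for illustration; for the agent ISO the file would need to be baked into the image, and the file name and path are illustrative):
  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    name: 99-master-dhcp-duid                 # illustrative name
    labels:
      machineconfiguration.openshift.io/role: master
  spec:
    config:
      ignition:
        version: 3.2.0
      storage:
        files:
        - path: /etc/NetworkManager/conf.d/01-dhcp-duid.conf   # illustrative path
          mode: 0644
          contents:
            # URL-encoded "[connection]" + newline + "ipv6.dhcp-duid=ll"
            source: data:,%5Bconnection%5D%0Aipv6.dhcp-duid=ll%0A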
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
Every time
Steps to Reproduce:
1. In an IPv6 environment set up agent-config.yaml with an expected IPv6 address and create the ISO 2. It's not possible to configure the DHCP server to assign this address since the Client ID that Node0 will use is unknown 3. Boot the nodes using the created ISO. The nodes will get IPv6 addresses from the DHCP server but its not possible to access the RendezvousIP
Actual results:
Expected results:
Additional info:
It is possible, due to the way that the UI is currently implemented, that a user may be able to submit a manifest with no content.
We need to filter manifests before they are applied to ensure that any manifests that are empty (lack at least one key/value) are not applied.
A good suggested location to look at might be
https://github.com/openshift/assisted-service/blob/master/internal/ignition/ignition.go#L402-L409
Description of problem:
When installing OCP in a disconnected network that doesn't have access to the public registry, bootkube.service fails.
Version-Release number of selected component (if applicable):
from 4.14.0-0.nightly-2023-04-29-153308
How reproducible:
Always
Steps to Reproduce:
1. Prepare a VPC that doesn't have access to the Internet, set up a mirror registry inside the VPC, and set the related imageContentSources in the install-config 2. Start the installation 3.
Actual results:
Failed when provisioning masters as it couldn’t get master ignition from bootstrap May 04 07:31:56 maxu-az-dis-6d74v-bootstrap bootkube.sh[246724]: error: unable to read image registry.ci.openshift.org/ocp/release@sha256:227a73d8ff198a55ca0d3314d8fa94835d90769981d1c951ac741b82285f99fc: Get "https://registry.ci.openshift.org/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) May 04 07:31:56 maxu-az-dis-6d74v-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILUREMay 04 07:31:56 maxu-az-dis-6d74v-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.
Expected results:
Installation succeeded.
Additional info:
In a disconnected install, we're using the ICSP/imageContentSources to pull images from the mirror registry, but bootkube.service was still trying to access the public registry. Checking the change log of bootkube.sh.template, this appears to be a regression from https://github.com/openshift/installer/pull/6990, which uses “oc adm release info -o 'jsonpath={.metadata.version}' "${RELEASE_IMAGE_DIGEST}"” to get the current OCP version in this scenario.
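For reference, the install-config stanza mentioned above has roughly this shape (the mirror registry name is an example):
  imageContentSources:
  - mirrors:
    - mirror.registry.example.com:5000/ocp/release   # example mirror inside the VPC
    source: quay.io/openshift-release-dev/ocp-release
  - mirrors:
    - mirror.registry.example.com:5000/ocp/release
    source: registry.ci.openshift.org/ocp/release     # the repository bootkube tried to reach directly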
Description of problem:
After configuring a custom toleration on the DNS pods (one that does not tolerate the master node taint), the new DNS pod is stuck in the Pending state.
Version-Release number of selected component (if applicable):
How reproducible:
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-41050
Steps to Reproduce:
1.melvinjoseph@mjoseph-mac Downloads % oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.0-0.nightly-2023-05-03-163151 True False 4h5m Cluster version is 4.14.0-0.nightly-2023-05-03-163151 2.check default dns pods placement melvinjoseph@mjoseph-mac Downloads % ouf5M-5AVBm-Taoxt-aIgPmoc -n openshift-dns get pod -owide melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES dns-default-6cv9k 2/2 Running 0 4h12m 10.131.0.8 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none> dns-default-8g2w8 2/2 Running 0 4h12m 10.129.2.5 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none> dns-default-df7zj 2/2 Running 0 4h18m 10.128.0.40 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none> dns-default-kmv4c 2/2 Running 0 4h18m 10.130.0.9 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none> dns-default-lxxkt 2/2 Running 0 4h18m 10.129.0.11 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none> dns-default-mjrnx 2/2 Running 0 4h11m 10.128.2.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none> node-resolver-5bnjv 1/1 Running 0 4h12m 10.0.128.3 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none> node-resolver-7ns8b 1/1 Running 0 4h18m 10.0.0.4 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none> node-resolver-bz7k5 1/1 Running 0 4h12m 10.0.128.2 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none> node-resolver-c67mw 1/1 Running 0 4h18m 10.0.0.3 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none> node-resolver-d8h65 1/1 Running 0 4h12m 10.0.128.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none> node-resolver-rgb92 1/1 Running 0 4h18m 10.0.0.5 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none> 3.oc -n openshift-dns get ds/dns-default -oyaml tolerations: - key: node-role.kubernetes.io/master operator: Exists melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml apiVersion: operator.openshift.io/v1 kind: DNS metadata: creationTimestamp: "2023-05-08T00:39:00Z" finalizers: - dns.operator.openshift.io/dns-controller generation: 1 name: default resourceVersion: "22893" uid: ae53e756-42a3-4c9d-8284-524df006382d spec: cache: negativeTTL: 0s positiveTTL: 0s logLevel: Normal nodePlacement: {} operatorLogLevel: Normal upstreamResolvers: policy: Sequential transportConfig: {} upstreams: - port: 53 type: SystemResolvConf status: clusterDomain: cluster.local clusterIP: 172.30.0.10 conditions: - lastTransitionTime: "2023-05-08T00:46:20Z" message: Enough DNS pods are available, and the DNS service has a cluster IP address. reason: AsExpected status: "False" type: Degraded - lastTransitionTime: "2023-05-08T00:46:20Z" message: All DNS and node-resolver pods are available, and the DNS service has a cluster IP address. reason: AsExpected status: "False" type: Progressing - lastTransitionTime: "2023-05-08T00:39:25Z" message: The DNS daemonset has available pods, and the DNS service has a cluster IP address. reason: AsExpected status: "True" type: Available - lastTransitionTime: "2023-05-08T00:39:01Z" message: DNS Operator can be upgraded reason: AsExpected status: "True" type: Upgradeable 4. 
config custom tolerations of dns pod (to not tolerate master node taints) $ oc edit dns.operator default spec: nodePlacement: tolerations: - effect: NoExecute key: my-dns-test operators: Equal value: abc tolerationSeconds: 3600 melvinjoseph@mjoseph-mac Downloads % oc edit dns.operator default Warning: unknown field "spec.nodePlacement.tolerations[0].operators" dns.operator.openshift.io/default edited melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES dns-default-6cv9k 2/2 Running 0 5h16m 10.131.0.8 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none> dns-default-8g2w8 2/2 Running 0 5h16m 10.129.2.5 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none> dns-default-df7zj 2/2 Running 0 5h22m 10.128.0.40 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none> dns-default-kmv4c 2/2 Running 0 5h22m 10.130.0.9 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none> dns-default-lxxkt 2/2 Running 0 5h22m 10.129.0.11 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none> dns-default-mjrnx 2/2 Running 0 5h16m 10.128.2.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none> dns-default-xqxr9 0/2 Pending 0 7s <none> <none> <none> <none> node-resolver-5bnjv 1/1 Running 0 5h17m 10.0.128.3 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none> node-resolver-7ns8b 1/1 Running 0 5h22m 10.0.0.4 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none> node-resolver-bz7k5 1/1 Running 0 5h16m 10.0.128.2 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none> node-resolver-c67mw 1/1 Running 0 5h22m 10.0.0.3 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none> node-resolver-d8h65 1/1 Running 0 5h16m 10.0.128.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none> node-resolver-rgb92 1/1 Running 0 5h22m 10.0.0.5 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none> The dns pod stuck in pending state melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get ds/dns-default -oyaml <-----snip---> tolerations: - effect: NoExecute key: my-dns-test tolerationSeconds: 3600 value: abc volumes: - configMap: defaultMode: 420 items: - key: Corefile path: Corefile name: dns-default name: config-volume - name: metrics-tls secret: defaultMode: 420 secretName: dns-default-metrics-tls updateStrategy: rollingUpdate: maxSurge: 10% maxUnavailable: 0 type: RollingUpdate status: currentNumberScheduled: 3 desiredNumberScheduled: 3 numberAvailable: 3 numberMisscheduled: 3 numberReady: 3 observedGeneration: 2 melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml apiVersion: operator.openshift.io/v1 kind: DNS metadata: creationTimestamp: "2023-05-08T00:39:00Z" finalizers: - dns.operator.openshift.io/dns-controller generation: 2 name: default resourceVersion: "125435" uid: ae53e756-42a3-4c9d-8284-524df006382d spec: cache: negativeTTL: 0s positiveTTL: 0s logLevel: Normal nodePlacement: tolerations: - effect: NoExecute key: my-dns-test tolerationSeconds: 3600 value: abc operatorLogLevel: Normal upstreamResolvers: policy: Sequential transportConfig: {} upstreams: - port: 53 type: SystemResolvConf status: clusterDomain: cluster.local clusterIP: 172.30.0.10 conditions: - lastTransitionTime: "2023-05-08T00:46:20Z" message: Enough DNS pods are available, and the DNS service has a cluster IP address. 
reason: AsExpected status: "False" type: Degraded - lastTransitionTime: "2023-05-08T06:01:52Z" message: Have 0 up-to-date DNS pods, want 3. reason: Reconciling status: "True" type: Progressing - lastTransitionTime: "2023-05-08T00:39:25Z" message: The DNS daemonset has available pods, and the DNS service has a cluster IP address. reason: AsExpected status: "True" type: Available - lastTransitionTime: "2023-05-08T00:39:01Z" message: DNS Operator can be upgraded reason: AsExpected status: "True" type: Upgradeable melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod NAME READY STATUS RESTARTS AGE dns-default-6cv9k 2/2 Running 0 5h18m dns-default-8g2w8 2/2 Running 0 5h18m dns-default-df7zj 2/2 Running 0 5h25m dns-default-kmv4c 2/2 Running 0 5h25m dns-default-lxxkt 2/2 Running 0 5h25m dns-default-mjrnx 2/2 Running 0 5h18m dns-default-xqxr9 0/2 Pending 0 2m12s node-resolver-5bnjv 1/1 Running 0 5h19m node-resolver-7ns8b 1/1 Running 0 5h25m node-resolver-bz7k5 1/1 Running 0 5h19m node-resolver-c67mw 1/1 Running 0 5h25m node-resolver-d8h65 1/1 Running 0 5h19m node-resolver-rgb92 1/1 Running 0 5h25m
Actual results:
The DNS pod dns-default-xqxr9 is stuck in the Pending state.
Expected results:
The DNS pods should be rolled out again with the new tolerations.
Additional info:
melvinjoseph@mjoseph-mac Downloads % oc describe po/dns-default-xqxr9 -n openshift-dns Name: dns-default-xqxr9 Namespace: openshift-dns Priority: 2000001000 <----snip---> Node-Selectors: kubernetes.io/os=linux Tolerations: my-dns-test=abc:NoExecute for 3600s node.kubernetes.io/disk-pressure:NoSchedule op=Exists node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists node.kubernetes.io/pid-pressure:NoSchedule op=Exists node.kubernetes.io/unreachable:NoExecute op=Exists node.kubernetes.io/unschedulable:NoSchedule op=Exists Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedScheduling 3m45s default-scheduler 0/6 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 Preemption is not helpful for scheduling, 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) didn't match Pod's node affinity/selector..
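For comparison, a nodePlacement that keeps the DNS pods schedulable on masters would also carry the default master toleration, roughly as follows (note the field name is operator, not operators):
  spec:
    nodePlacement:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists            # keeps DNS pods schedulable on master nodes
      - key: my-dns-test
        operator: Equal
        value: abc
        effect: NoExecute
        tolerationSeconds: 3600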
This bug is created to get CNV bugzilla bug https://bugzilla.redhat.com/show_bug.cgi?id=2164836 fix into MCO repo.
Description of problem:
The Upgrade Helm Release tab in the OpenShift GUI Developer console does not refresh with updated values.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. Add below Helm chart repository from CLI ~~~ apiVersion: helm.openshift.io/v1beta1 kind: HelmChartRepository metadata: name: prometheus-community spec: connectionConfig: url: 'https://prometheus-community.github.io/helm-charts' name: prometheus-community ~~~ 2. Goto GUI and select Developer console --> +Add --> Developer Catalog --> Helm Chart --> Select Prometheus Helm chart --> Install Helm chart --> From dropdown of chart version select 22.3.0 --> Install 3. You will see the image tag as v0.63.0 ~~~ image: digest: '' pullPolicy: IfNotPresent repository: quay.io/prometheus-operator/prometheus-config-reloader tag: v0.63.0 ~~~ 4. Once that is installed Goto Helm --> Helm Releases --> Prometheus --> Upgrade --> From dropdown of chart version select 22.4.0 --> the page does not refresh with new value of the tag. ~~~ image: digest: '' pullPolicy: IfNotPresent repository: quay.io/prometheus-operator/prometheus-config-reloader tag: v0.63.0 ~~~ NOTE: The same steps before installing the helm chart, when we select different versions the value is being updated. Goto GUI and select Developer console --> +Add --> Developer Catalog --> Helm Chart --> Select Prometheus Helm chart --> Install Helm chart --> From dropdown of chart version select 22.3.0 --> Now select different chart version like 22.7.0 or 22.4.0
Actual results:
The YAML view of the Upgrade Helm Release tab shows the values of the older chart version.
Expected results:
The YAML view of the Upgrade Helm Release tab should contain the latest values for the selected chart version.
Additional info:
Description of problem:
Customer upgraded an AWS cluster from 4.8 to 4.9. The upgrade went well, but when checking co/storage.status.versions, the AWSEBSCSIDriverOperator version is still listed with the previous version: $ oc get co storage -o json | jq .status.versions [ { "name": "operator", "version": "4.9.50" }, { "name": "AWSEBSCSIDriverOperator", "version": "4.8.48" } ] From 4.9, CSO no longer reports the CSIDriverOperator version, so the stale CSIDriverOperator version should be cleaned up in this case.
Version-Release number of selected component (if applicable):
upgrade from 4.8.48 to 4.9.50
How reproducible:
Always
Steps to Reproduce:
1. Install AWS cluster with 4.8 2. Upgrade cluster to 4.9 3. Check co/storage.status.versions
Actual results:
[ { "name": "operator", "version": "4.9.50" }, { "name": "AWSEBSCSIDriverOperator", "version": "4.8.48" } ]
Expected results:
From 4.9, CSO no longer reports the CSIDriverOperator version, so the stale CSIDriverOperator version should be cleaned up.
Additional info:
Description of problem:
Bump Kubernetes to 0.27.1 and bump dependencies
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
https://search.ci.openshift.org/?search=error%3A+tag+latest+failed%3A+Internal+error+occurred%3A+registry.centos.org&maxAge=48h&context=1&type=build-log&name=okd&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Version-Release number of selected component (if applicable):
all currently tested versions
How reproducible:
~ 9% of jobs fail on this test
! error: Import failed (InternalError): Internal error occurred: registry.centos.org/dotnet/dotnet-31-runtime-centos7:latest: Get "https://registry.centos.org/v2/": dial tcp: lookup registry.centos.org on 172.30.0.10:53: no such host 782 31 minutes ago
Description of problem:
Customer used the Agent-based installer to install 4.13.8 in their CID env, but during the install process the bootstrap machine had an OOM issue; checking the sosreport shows the init container had an OOM issue.
NOTE: The issue is not seen when testing with 4.13.6, per the customer.
initContainers:
We found in the sosreport that the dmesg and crio logs show the machine-config-controller container being OOM-killed; the kill came from the cgroup limit, so it looks like the 50M limit is too small.
The customer used a physical machine that had 100GB of memory
The customer had some network config in the assisted-installer YAML file; maybe the issue is related to their NIC config?
log files:
1. sosreport
https://attachments.access.redhat.com/hydra/rest/cases/03578865/attachments/b5501734-60be-4de4-adcf-da57e22cbb8e?usePresignedUrl=true
2. assisted installer yaml file
https://attachments.access.redhat.com/hydra/rest/cases/03578865/attachments/a32635cf-112d-49ed-828c-4501e95a0e7a?usePresignedUrl=true
3. bootstrap machine oom screenshot
https://attachments.access.redhat.com/hydra/rest/cases/03578865/attachments/eefe2e57-cd23-4abd-9e0b-dd45f20a34d2?usePresignedUrl=true
Description of problem:
Machine creation should fail when the availabilityZone and subnet ID mismatch; currently the machine is created successfully when they mismatch, and the CPMS cannot be recreated after deletion. In contrast, when the subnet is specified via a filter, a mismatched availabilityZone does make machine creation fail.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-01-31-072358
How reproducible:
always
Steps to Reproduce:
1.Create a machineset whose availabilityZone and subnet id is mismatch, for example, availabilityZone is us-east-2a, but the subnet id is for us-east-2b placement: availabilityZone: us-east-2a region: us-east-2 securityGroups: - filters: - name: tag:Name values: - huliu-aws1w-nk5xd-worker-sg subnet: id: subnet-0107b4d7cfa35eb9b 2.Machine created successfully in us-east-2b zone liuhuali@Lius-MacBook-Pro huali-test % oc get machine NAME PHASE TYPE REGION ZONE AGE huliu-aws1w-nk5xd-master-0 Running m6i.xlarge us-east-2 us-east-2a 62m huliu-aws1w-nk5xd-master-1 Running m6i.xlarge us-east-2 us-east-2b 62m huliu-aws1w-nk5xd-master-2 Running m6i.xlarge us-east-2 us-east-2a 62m huliu-aws1w-nk5xd-windows-worker-us-east-2a-689vq Running m5a.large us-east-2 us-east-2b 37m huliu-aws1w-nk5xd-windows-worker-us-east-2a-nf9dl Running m5a.large us-east-2 us-east-2b 37m huliu-aws1w-nk5xd-worker-us-east-2a-8kpht Running m6i.xlarge us-east-2 us-east-2a 59m huliu-aws1w-nk5xd-worker-us-east-2a-dmtlc Running m6i.xlarge us-east-2 us-east-2a 59m huliu-aws1w-nk5xd-worker-us-east-2b-kdn75 Running m6i.xlarge us-east-2 us-east-2b 59m liuhuali@Lius-MacBook-Pro huali-test % oc get machine -o yaml |grep "id: subnet" id: subnet-0fef0e9e255742f3a id: subnet-0107b4d7cfa35eb9b id: subnet-0fef0e9e255742f3a id: subnet-0107b4d7cfa35eb9b id: subnet-0107b4d7cfa35eb9b id: subnet-0fef0e9e255742f3a id: subnet-0fef0e9e255742f3a id: subnet-0107b4d7cfa35eb9b
Actual results:
The machine is created successfully in the zone that the subnet ID belongs to; in this case it was created in us-east-2b: huliu-aws1w-nk5xd-windows-worker-us-east-2a-689vq Running m5a.large us-east-2 us-east-2b 37m huliu-aws1w-nk5xd-windows-worker-us-east-2a-nf9dl Running m5a.large us-east-2 us-east-2b 37m
Expected results:
Machine creation should fail because the availabilityZone and subnet ID mismatch.
Additional info:
1. For the subnet is filter, if availabilityZone and filter is mismatch, the machine will create failed. huliu-aws1w2-x2tnx-worker-2-m4r8m Failed 4s liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-aws1w2-x2tnx-worker-2-m4r8m -o yaml … placement: availabilityZone: us-east-2a region: us-east-2 securityGroups: - filters: - name: tag:Name values: - huliu-aws1w2-x2tnx-worker-sg spotMarketOptions: {} subnet: filters: - name: tag:Name values: - huliu-aws1w2-x2tnx-private-us-east-2c tags: - name: kubernetes.io/cluster/huliu-aws1w2-x2tnx value: owned userDataSecret: name: worker-user-data status: conditions: - lastTransitionTime: "2023-02-01T02:45:52Z" status: "True" type: Drainable - lastTransitionTime: "2023-02-01T02:45:52Z" message: Instance has not been created reason: InstanceNotCreated severity: Warning status: "False" type: InstanceExists - lastTransitionTime: "2023-02-01T02:45:52Z" status: "True" type: Terminable errorMessage: 'error getting subnet IDs: no subnet IDs were found' errorReason: InvalidConfiguration lastUpdated: "2023-02-01T02:45:53Z" phase: Failed providerStatus: conditions: - lastTransitionTime: "2023-02-01T02:45:53Z" message: 'error getting subnet IDs: no subnet IDs were found' reason: MachineCreationFailed status: "False" type: MachineCreation 2.For this case, machine create successfully when availabilityZone and subnet id is mismatch, the cpms cannot be recreated after deleting. liuhuali@Lius-MacBook-Pro huali-test % oc delete controlplanemachineset cluster controlplanemachineset.machine.openshift.io "cluster" deleted liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset No resources found in openshift-machine-api namespace. I0201 02:11:07.850022 1 http.go:143] controller-runtime/webhook/webhooks "msg"="wrote response" "UID"="12f118c4-fafe-45f9-bd24-876abdb8ba83" "allowed"=false "code"=403 "reason"="spec.template.machines_v1beta1_machine_openshift_io.failureDomains: Forbidden: no control plane machine is using specified failure domain(s) [AWSFailureDomain{AvailabilityZone:us-east-2a, Subnet:{Type:ID, Value:subnet-0107b4d7cfa35eb9b}}], failure domain(s) [AWSFailureDomain{AvailabilityZone:us-east-2a, Subnet:{Type:ID, Value:subnet-0fef0e9e255742f3a}}] are duplicated within the control plane machines, please correct failure domains to match control plane machines" "webhook"="/validate-machine-openshift-io-v1-controlplanemachineset" I0201 02:11:07.850787 1 controller.go:144] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="767c4631-ed83-47da-b316-29a21cdba245" E0201 02:11:07.850828 1 controller.go:326] "msg"="Reconciler error" "error"="error reconciling control plane machine set: unable to create control plane machine set: unable to create control plane machine set: admission webhook \"controlplanemachineset.machine.openshift.io\" denied the request: spec.template.machines_v1beta1_machine_openshift_io.failureDomains: Forbidden: no control plane machine is using specified failure domain(s) [AWSFailureDomain{AvailabilityZone:us-east-2a, Subnet:{Type:ID, Value:subnet-0107b4d7cfa35eb9b}}], failure domain(s) [AWSFailureDomain{AvailabilityZone:us-east-2a, Subnet:{Type:ID, Value:subnet-0fef0e9e255742f3a}}] are duplicated within the control plane machines, please correct failure domains to match control plane machines" "controller"="controlplanemachinesetgenerator" "reconcileID"="767c4631-ed83-47da-b316-29a21cdba245"
Description of problem:
With the recent update in the logic for considering a CPMS replica Ready only when both the backing Machine is running and the backing Node is Ready: https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/171, we now need to watch nodes at all times to detect nodes transitioning in readiness.
The majority of occurrences of this issue have been fixed with https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/177 (https://issues.redhat.com//browse/OCPBUGS-10032), but we also need to watch the control plane nodes at steady state (when they are already Ready) to notice if they go NotReady at any point, as relying on control plane machine events is not enough (a Machine might be Running while the Node has transitioned to NotReady).
Version-Release number of selected component (if applicable):
4.13, 4.14
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The Topology UI doesn't recognize a Serverless Rust function, so it doesn't show the proper UI icon.
Version-Release number of selected component (if applicable):
4.12.0
How reproducible:
Always
Steps to Reproduce:
1. Deploy 3 Knative/Serverless functions: Quarkus, Spring Boot, Rust 2. Observe in the Topology UI that specific icons are used only for Quarkus and Spring Boot, while for Rust the generic OpenShift icon is shown 3. Check each of the presented UI snippets/rectangles and find the following labels:
For Quarkus: app.openshift.io/runtime=quarkus function.knative.dev/runtime=rust
For Spring Boot: app.openshift.io/runtime=spring-boot function.knative.dev/runtime=springboot
For Rust: function.knative.dev/runtime=rust (no app.openshift.io/runtime=rust present for it)
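A sketch of the labels that would give the Rust function a matching icon, assuming the Topology icon is keyed off the app.openshift.io/runtime label as it is for the other runtimes:
  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: rust-function                  # example name
    labels:
      function.knative.dev/runtime: rust
      app.openshift.io/runtime: rust     # missing today; this is what the icon lookup would need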
Actual results:
No specific UI icon for Rust function
Expected results:
Specific UI icon for Rust function
Additional info:
Description of problem:
Currently: Hypershift is squashing any user configured proxy configuration based on this line: https://github.com/openshift/hypershift/blob/main/support/globalconfig/proxy.go#L21-L28, https://github.com/openshift/hypershift/blob/release-4.11/control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go#L487-L493. Because of this any user changes to the cluster-wide proxy configuration documented here: https://docs.openshift.com/container-platform/4.12/networking/enable-cluster-wide-proxy.html are squashed and not valid for more than a few seconds. That blocks some functionality in the openshift cluster from working including application builds from the openshift samples provided in the cluster.
Version-Release number of selected component (if applicable):
4.13 4.12 4.11
How reproducible:
100%
Steps to Reproduce:
1. Make a change to the Proxy object in the cluster with kubectl edit proxy cluster 2. Save the change 3. Wait a few seconds
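For context, the object being edited is the cluster-wide Proxy config, for example (values are illustrative):
  apiVersion: config.openshift.io/v1
  kind: Proxy
  metadata:
    name: cluster
  spec:
    httpProxy: http://proxy.example.com:3128    # user-provided values that get squashed back by the operator
    httpsProxy: http://proxy.example.com:3128
    noProxy: .cluster.local,.svc,10.0.0.0/16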
Actual results:
HostedClusterConfig operator will go in and squash the value
Expected results:
The value the user provides remains in the configuration and is not squashed to an empty value
Additional info:
Description of problem:
In the awsendpointservice CR, AWSEndpointAvailable is still True when the endpoint is deleted in the AWS console, and AWSEndpointServiceAvailable is still True when the endpoint service is deleted in the AWS console.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create a PublicAndPrivate or Private cluster, wait for cluster to come up 2. Check conditions in awsendpointservice cr, status of AWSEndpointAvailable and AWSEndpointServiceAvailable should be True 3. On AWS console delete endpoint 4. In awsendpointservice cr, check if condition AWSEndpointAvailable is changed to false 5. On AWS console delete endpoint service 6. In awsendpointservice cr, check if condition AWSEndpointServiceAvailable is changed to false
Actual results:
status of AWSEndpointAvailable and AWSEndpointServiceAvailable is True
Expected results:
status of AWSEndpointAvailable and AWSEndpointServiceAvailable should be False
Additional info:
Since resource type option has been moved to an advanced option in both the Deploy Image and Import from Git flows, there is confusion for some existing customers who are using the feature.
The UI no longer provides transparency of the type of resource which is being created.
1.
2.
3.
Remove Resource type from Advanced Options and place it back where it was previously. Resource type selection is now a dropdown, so it will go back in its previous spot, but it will use a different component than in 4.11.
Description of problem:
clusteroperator/network is degraded after running FEATURES_ENVIRONMENT="ci" make feature-deploy-on-ci from openshift-kni/cnf-features-deploy against IPI clusters with OCP 4.13 and 4.14 in CI jobs from Telco 5G DevOps/CI. Details for a 4.13 job: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/42141/rehearse-42141-periodic-ci-openshift-release-master-nightly-4.13-e2e-telco5g/1689935408508440576 Details for a 4.14 job: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/42141/rehearse-42141-periodic-ci-openshift-release-master-nightly-4.14-e2e-telco5g/1689935408541995008 For example, got to artifacts/e2e-telco5g/telco5g-gather-pao/build-log.txt and it will report: Error from server (BadRequest): container "container-00" in pod "cnfdu5-worker-0-debug" is waiting to start: ContainerCreating Running gather-pao for T5CI_VERSION=4.13 Running for CNF_BRANCH=master Running PAO must-gather with tag pao_mg_tag=4.12 [must-gather ] OUT Using must-gather plug-in image: quay.io/openshift-kni/performance-addon-operator-must-gather:4.12-snapshot When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information: ClusterID: 60503edf-ecc6-48f7-b6a6-f4dc34842803 ClusterVersion: Stable at "4.13.0-0.nightly-2023-08-10-021434" ClusterOperators: clusteroperator/network is degraded because DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - pod dhcp-daemon-7lmlq is in CrashLoopBackOff State DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - pod dhcp-daemon-95tzb is in CrashLoopBackOff State DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - pod dhcp-daemon-hfxkd is in CrashLoopBackOff State DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - pod dhcp-daemon-mhwtp is in CrashLoopBackOff State DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - pod dhcp-daemon-q7gfb is in CrashLoopBackOff State DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - last change 2023-08-11T10:54:10Z
Version-Release number of selected component (if applicable):
branch release-4.13 from https://github.com/openshift-kni/cnf-features-deploy.git for OCP 4.13 branch master from https://github.com/openshift-kni/cnf-features-deploy.git for OCP 4.14
How reproducible:
Always.
Steps to Reproduce:
1. Install OCP 4.13 or OCP 4.14 with IPI on 3x masters, 2x workers. 2. Clone https://github.com/openshift-kni/cnf-features-deploy.git 3. FEATURES_ENVIRONMENT="ci" make feature-deploy-on-ci 4. oc wait nodes --all --for=condition=Ready=true --timeout=10m 5. oc wait clusteroperators --all --for=condition=Progressing=false --timeout=10m
Actual results:
See above.
Expected results:
All clusteroperators have finished progressing.
Additional info:
Without 'FEATURES_ENVIRONMENT="ci" make feature-deploy-on-ci' the steps to reproduce above work as expected.
This is a clone of issue OCPBUGS-18517. The following is the description of the original issue:
—
Description of problem:
Installation with Kuryr is failing because multiple components are attempting to connect to the API and fail with the following error: failed checking apiserver connectivity: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-service-ca/leases/service-ca-controller-lock": tls: failed to verify certificate: x509: cannot validate certificate for 172.30.0.1 because it doesn't contain any IP SANs $ oc get po -A -o wide |grep -v Running |grep -v Pending |grep -v Completed NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES openshift-apiserver-operator openshift-apiserver-operator-559d855c56-c2rdr 0/1 CrashLoopBackOff 42 (2m28s ago) 3h44m 10.128.16.86 kuryr-5sxhw-master-2 <none> <none> openshift-apiserver apiserver-6b9f5d48c4-bj6s6 0/2 CrashLoopBackOff 92 (4m25s ago) 3h36m 10.128.70.10 kuryr-5sxhw-master-2 <none> <none> openshift-cluster-csi-drivers manila-csi-driver-operator-75b64d8797-fckf5 0/1 CrashLoopBackOff 42 (119s ago) 3h41m 10.128.56.21 kuryr-5sxhw-master-0 <none> <none> openshift-cluster-csi-drivers openstack-cinder-csi-driver-operator-84dfd8d89f-kgtr8 0/1 CrashLoopBackOff 42 (82s ago) 3h41m 10.128.56.9 kuryr-5sxhw-master-0 <none> <none> openshift-cluster-node-tuning-operator cluster-node-tuning-operator-7fbb66545c-kh6th 0/1 CrashLoopBackOff 46 (3m5s ago) 3h44m 10.128.6.40 kuryr-5sxhw-master-2 <none> <none> openshift-cluster-storage-operator cluster-storage-operator-5545dfcf6d-n497j 0/1 CrashLoopBackOff 42 (2m23s ago) 3h44m 10.128.21.175 kuryr-5sxhw-master-2 <none> <none> openshift-cluster-storage-operator csi-snapshot-controller-ddb9469f9-bc4bb 0/1 CrashLoopBackOff 45 (2m17s ago) 3h41m 10.128.20.106 kuryr-5sxhw-master-1 <none> <none> openshift-cluster-storage-operator csi-snapshot-controller-operator-6d7b66dbdd-xdwcs 0/1 CrashLoopBackOff 42 (92s ago) 3h44m 10.128.21.220 kuryr-5sxhw-master-2 <none> <none> openshift-config-operator openshift-config-operator-c5d5d964-2w2bv 0/1 CrashLoopBackOff 80 (3m39s ago) 3h44m 10.128.43.39 kuryr-5sxhw-master-2 <none> <none> openshift-controller-manager-operator openshift-controller-manager-operator-754d748cf7-rzq6f 0/1 CrashLoopBackOff 42 (3m6s ago) 3h44m 10.128.25.166 kuryr-5sxhw-master-2 <none> <none> openshift-etcd-operator etcd-operator-76ddc94887-zqkn7 0/1 CrashLoopBackOff 49 (30s ago) 3h44m 10.128.32.146 kuryr-5sxhw-master-2 <none> <none> openshift-ingress-operator ingress-operator-9f76cf75b-cjx9t 1/2 CrashLoopBackOff 39 (3m24s ago) 3h44m 10.128.9.108 kuryr-5sxhw-master-2 <none> <none> openshift-insights insights-operator-776cd7cfb4-8gzz7 0/1 CrashLoopBackOff 46 (4m21s ago) 3h44m 10.128.15.102 kuryr-5sxhw-master-2 <none> <none> openshift-kube-apiserver-operator kube-apiserver-operator-64f4db777f-7n9jv 0/1 CrashLoopBackOff 42 (113s ago) 3h44m 10.128.18.199 kuryr-5sxhw-master-2 <none> <none> openshift-kube-apiserver installer-5-kuryr-5sxhw-master-1 0/1 Error 0 3h35m 10.128.68.176 kuryr-5sxhw-master-1 <none> <none> openshift-kube-controller-manager-operator kube-controller-manager-operator-746497b-dfbh5 0/1 CrashLoopBackOff 42 (2m23s ago) 3h44m 10.128.13.162 kuryr-5sxhw-master-2 <none> <none> openshift-kube-controller-manager installer-4-kuryr-5sxhw-master-0 0/1 Error 0 3h35m 10.128.65.186 kuryr-5sxhw-master-0 <none> <none> openshift-kube-scheduler-operator openshift-kube-scheduler-operator-695fb4449f-j9wqx 0/1 CrashLoopBackOff 42 (63s ago) 3h44m 10.128.44.194 kuryr-5sxhw-master-2 <none> <none> openshift-kube-scheduler installer-5-kuryr-5sxhw-master-0 0/1 Error 0 3h35m 10.128.60.44 
kuryr-5sxhw-master-0 <none> <none> openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-6c5cd46578-qpk5z 0/1 CrashLoopBackOff 42 (2m18s ago) 3h44m 10.128.4.120 kuryr-5sxhw-master-2 <none> <none> openshift-machine-api cluster-autoscaler-operator-7b667675db-tmlcb 1/2 CrashLoopBackOff 46 (2m53s ago) 3h45m 10.128.28.146 kuryr-5sxhw-master-2 <none> <none> openshift-machine-api machine-api-controllers-fdb99649c-ldb7t 3/7 CrashLoopBackOff 184 (2m55s ago) 3h40m 10.128.29.90 kuryr-5sxhw-master-0 <none> <none> openshift-route-controller-manager route-controller-manager-d8f458684-7dgjm 0/1 CrashLoopBackOff 43 (100s ago) 3h36m 10.128.55.11 kuryr-5sxhw-master-2 <none> <none> openshift-service-ca-operator service-ca-operator-654f68c77f-g4w55 0/1 CrashLoopBackOff 42 (2m2s ago) 3h45m 10.128.22.30 kuryr-5sxhw-master-2 <none> <none> openshift-service-ca service-ca-5f584b7d75-mxllm 0/1 CrashLoopBackOff 42 (45s ago) 3h42m 10.128.49.250 kuryr-5sxhw-master-0 <none> <none>
$ oc get svc -A |grep 172.30.0.1 default kubernetes ClusterIP 172.30.0.1 <none> 443/TCP 3h50m
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of the problem:
In staging, BE 2.18.0 - Trying to set all validation IDs to be ignored with:
curl -X 'PUT' 'https://api.stage.openshift.com/api/assisted-install/v2/clusters/26a69b99-06a3-441b-be40-73cadbac6b6a/ignored-validations' --header "Authorization: Bearer $(ocm token)" -H 'accept: application/json' -H 'Content-Type: application/json' -d '{ "host-validation-ids": "[]", "cluster-validation-ids": "[\"all\"]" }'
Getting this response:
{"code":"400","href":"","id":400,"kind":"Error","reason":"cannot proceed due to the following errors: Validation ID 'all' is not a known cluster validation"}
How reproducible:
100%
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
All ignorable validations should be added to the ignore list.
Description of problem:
This came out of the investigation of https://issues.redhat.com/browse/OCPBUGS-11691 . The nested node configs used to support dual stack VIPs do not correctly respect the EnableUnicast setting. This is causing issues on EUS upgrades where the unicast migration cannot happen until all nodes are on 4.12. This is blocking both the workaround and the eventual proper fix.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Always
Steps to Reproduce:
1. Deploy 4.11 with unicast explicitly disabled (via MCO patch) 2. Write /etc/keepalived/monitor-user.conf to suppress unicast migration 3. Upgrade to 4.12
Actual results:
Nodes come up in unicast mode
Expected results:
Nodes remain in multicast mode until monitor-user.conf is removed
Additional info:
Description of problem:
In the Reliability (loaded long-run) test, the memory of the ovnkube-node-xxx pods on all 6 nodes keeps increasing; within 24 hours it increased to about 1.6G. I did not see this issue in previous releases.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-27-000502
How reproducible:
This is the first time I have hit this issue.
Steps to Reproduce:
1. Install a AWS OVN cluster with 3 masters, 3 workers, vm_type are all m5.xlarge. 2. Run reliability-v2 test https://github.com/openshift/svt/tree/master/reliability-v2 with config: 1 admin, 15 dev-test, 1 dev-prod. The test will long run the configured tasks. 3. Monitor the test failures in and performance dashboard. Test failures slack notification: https://redhat-internal.slack.com/archives/C0266JJ4XM5/p1687944463913769 Performance dashboard:http://dittybopper-dittybopper.apps.qili-414-haproxy.qe-lrc.devcluster.openshift.com/d/IgK5MW94z/openshift-performance?orgId=1&from=1687944452000&to=now&refresh=1h
Actual results:
The memory of the ovnkube-node-xxx pods on all 6 nodes keeps increasing; within 24 hours it increased to about 1.6G.
Expected results:
The memory of the ovnkube-node-xxx pods should remain stable rather than continuously increasing.
Additional info:
% oc adm top pod -n openshift-ovn-kubernetes | grep node ovnkube-node-4t282 146m 1862Mi ovnkube-node-9p462 41m 1847Mi ovnkube-node-b6rqj 46m 2032Mi ovnkube-node-fp2gn 72m 2107Mi ovnkube-node-hxf95 11m 2359Mi ovnkube-node-ql9fx 38m 2089Mi
I took a pprof heap profile on one of the pods and uploaded it as heap-ovnkube-node-4t282.out
Must-gather is uploaded to must-gather.local.1315176578017655774.tar.gz
performance dashboard screenshot for ovnkube-node-memory.png
This is a clone of issue OCPBUGS-17906. The following is the description of the original issue:
—
Description of problem:
On a HyperShift (guest) cluster, the EFS driver pod is stuck in the ContainerCreating state
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-11-055332
How reproducible:
Always
Steps to Reproduce:
1. Create Hypershift cluster. Flexy template: aos-4_14/ipi-on-aws/versioned-installer-ovn-hypershift-ci 2. Try to install EFS operator and driver from yaml file/web console as mentioned in below steps. a) Create iam role from ccoctl tool and will get ROLE ARN value from the output b) Install EFS operator using the above ROLE ARN value. c) Check EFS operator, node, controller pods are up and running // og-sub-hcp.yaml apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: generateName: openshift-cluster-csi-drivers- namespace: openshift-cluster-csi-drivers spec: namespaces: - "" --- apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: aws-efs-csi-driver-operator namespace: openshift-cluster-csi-drivers spec: channel: stable name: aws-efs-csi-driver-operator source: qe-app-registry sourceNamespace: openshift-marketplace config: env: - name: ROLEARN value: arn:aws:iam::301721915996:role/hypershift-ci-16666-openshift-cluster-csi-drivers-aws-efs-cloud- // driver.yaml apiVersion: operator.openshift.io/v1 kind: ClusterCSIDriver metadata: name: efs.csi.aws.com spec: logLevel: TraceAll managementState: Managed operatorLogLevel: TraceAll
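For readability, the Subscription fragment above, unflattened (the ROLEARN value comes from the ccoctl output and is truncated in this report, so a placeholder is shown):
  apiVersion: operators.coreos.com/v1alpha1
  kind: Subscription
  metadata:
    name: aws-efs-csi-driver-operator
    namespace: openshift-cluster-csi-drivers
  spec:
    channel: stable
    name: aws-efs-csi-driver-operator
    source: qe-app-registry
    sourceNamespace: openshift-marketplace
    config:
      env:
      - name: ROLEARN
        value: arn:aws:iam::<account-id>:role/<efs-csi-role>   # placeholder; use the ARN printed by ccoctl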
Actual results:
aws-efs-csi-driver-controller-699664644f-dkfdk 0/4 ContainerCreating 0 87m
Expected results:
EFS controller pods should be up and running
Additional info:
oc -n openshift-cluster-csi-drivers logs aws-efs-csi-driver-operator-6758c5dc46-b75hb E0821 08:51:25.160599 1 base_controller.go:266] "AWSEFSDriverCredentialsRequestController" controller failed to sync "key", err: cloudcredential.operator.openshift.io "cluster" not found Discussion: https://redhat-internal.slack.com/archives/GK0DA0JR5/p1692606247221239 Installation steps epic: https://issues.redhat.com/browse/STOR-1421
Description of problem:
Set custom security group IDs in the following fields of install-config.yaml installconfig.controlPlane.platform.aws.additionalSecurityGroupIDs installconfig.compute.platform.aws.additionalSecurityGroupIDs such as: apiVersion: v1 controlPlane: architecture: amd64 hyperthreading: Enabled name: master platform: aws: additionalSecurityGroupIDs: - sg-0d2f88b2980aa5547 - sg-01f1d2f60a3b4cf6d replicas: 3 compute: - architecture: amd64 hyperthreading: Enabled name: worker platform: aws: additionalSecurityGroupIDs: - sg-03418b6e2f68e1f63 - sg-0376fc68fd4b834a4 replicas: 3 After installation, check the Security Groups attached to master and worker, master doesn’t have the specified custom security groups attached while workers have. For one of the masters: [root@preserve-gpei-worker ~]# aws ec2 describe-instances --instance-ids i-0cd007cca57c86ee9 --region us-west-2 --query 'Reservations[*].Instances[*].SecurityGroups[*]' --output json [ [ [ { "GroupName": "terraform-20230713031140984600000002", "GroupId": "sg-05495718555950f77" } ] ] ] For one of the workers: [root@preserve-gpei-worker ~]# aws ec2 describe-instances --instance-ids i-0572b7bde8ff07ac4 --region us-west-2 --query 'Reservations[*].Instances[*].SecurityGroups[*]' --output json [ [ [ { "GroupName": "gpei-0613a-worker-2", "GroupId": "sg-0376fc68fd4b834a4" }, { "GroupName": "gpei-0613a-worker-1", "GroupId": "sg-03418b6e2f68e1f63" }, { "GroupName": "terraform-20230713031140982700000001", "GroupId": "sg-0ce73044e426fe249" } ] ] ] Also checked the master’s controlplanemachineset, it does have the custom security groups configured, but they’re not attached to the master instance in the end. [root@preserve-gpei-worker k_files]# oc get controlplanemachineset -n openshift-machine-api cluster -o yaml |yq .spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.securityGroups - filters: - name: tag:Name values: - gpei-0613a-pzjbk-master-sg - id: sg-01f1d2f60a3b4cf6d - id: sg-0d2f88b2980aa5547
Version-Release number of selected component (if applicable):
registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-07-11-092038
How reproducible:
Always
Steps to Reproduce:
1. As mentioned above 2. 3.
Actual results:
Masters don't have the custom security groups attached
Expected results:
Masters should have the custom security groups attached, like the workers
Additional info:
In Hypershift CI, we see nil deref panic
I0801 06:35:38.203019 1 controller.go:182] Assigning key: ip-10-0-132-175.ec2.internal to node workqueue
E0801 06:35:38.567021 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 195 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x28103a0?, 0x47a6400})
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00088f260?})
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x28103a0, 0x47a6400})
/usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*AWS).getSubnet(0xc000c05220, 0xc000d760b0)
/go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/aws.go:266 +0x24a
github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*AWS).GetNodeEgressIPConfiguration(0x0?, 0x31b8490?, {0x0, 0x0, 0x0})
/go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/aws.go:200 +0x185
github.com/openshift/cloud-network-config-controller/pkg/controller/node.(*NodeController).SyncHandler(0xc000d526e0, {0xc00005d7e0, 0x1c})
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/node/node_controller.go:129 +0x44f
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc00071f740, {0x25ff720?, 0xc00088f260?})
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc00071f740)
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(...)
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x318e140, 0xc0005aa1e0}, 0x1, 0xc0000c4ba0)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x25
created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run
/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x236d14a]
The code does an unprotected dereference of `networkInterface.SubnetId`, which appears to be `nil`; that nil value is probably also why multiple subnets are returned in the first place.
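For illustration only, a minimal sketch of the kind of nil guard that would avoid this panic when iterating EC2 network interfaces. It uses aws-sdk-go's `ec2.NetworkInterface` type, where `SubnetId` is a `*string`; the surrounding function and variable names are assumptions, not the actual cloud-network-config-controller code.

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// subnetIDsOf collects the subnet IDs of the given interfaces, skipping any
// interface whose SubnetId is nil instead of dereferencing it blindly.
func subnetIDsOf(interfaces []*ec2.NetworkInterface) []string {
	var ids []string
	for _, ni := range interfaces {
		if ni == nil || ni.SubnetId == nil {
			// An interface without a subnet ID cannot be matched; skip it
			// rather than panicking with a nil pointer dereference.
			continue
		}
		ids = append(ids, aws.StringValue(ni.SubnetId))
	}
	return ids
}

func main() {
	ifaces := []*ec2.NetworkInterface{
		{SubnetId: aws.String("subnet-0123456789abcdef0")},
		{SubnetId: nil}, // this is the case that crashed the controller
	}
	fmt.Println(subnetIDsOf(ifaces))
}
```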
Description of problem:
MCO has duplicate feature flags set for the kubelet, causing errors on bringup:

I0421 15:32:04.308472 2135 codec.go:98] "Using lenient decoding as strict decoding failed" err=<
Apr 21 15:32:04 ip-10-0-156-156 kubenswrapper[2135]: strict decoding error: yaml: unmarshal errors:
Apr 21 15:32:04 ip-10-0-156-156 kubenswrapper[2135]:   line 29: key "RotateKubeletServerCertificate" already set in map
Apr 21 15:32:04 ip-10-0-156-156 kubenswrapper[2135]: >
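A minimal sketch, assuming the root cause is that the same feature-gate key is emitted twice into the rendered kubelet configuration: merge the gate maps so each key appears exactly once before serializing. The function and variable names are illustrative, not the actual MCO template code.

```go
package main

import "fmt"

// mergeFeatureGates merges several feature-gate maps into one, so each key
// (e.g. "RotateKubeletServerCertificate") appears exactly once in the output.
// Later maps win on conflict, matching the usual "last writer wins" rule for
// overrides.
func mergeFeatureGates(sources ...map[string]bool) map[string]bool {
	merged := map[string]bool{}
	for _, src := range sources {
		for gate, enabled := range src {
			merged[gate] = enabled
		}
	}
	return merged
}

func main() {
	defaults := map[string]bool{"RotateKubeletServerCertificate": true}
	overrides := map[string]bool{"RotateKubeletServerCertificate": true, "SomeOtherGate": false}
	// The merged map can then be rendered once into the kubelet config,
	// instead of concatenating two YAML fragments that both set the key.
	fmt.Println(mergeFeatureGates(defaults, overrides))
}
```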
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-19018. The following is the description of the original issue:
—
Using metal-ipi on 4.14, the cluster is failing to come up:
the network cluster-operator is failing to start, and the sdn pod shows the error
bash: RHEL_VERSION: unbound variable
Description of problem:
Create a new host-and-cluster folder qe-cluster under the datacenter, and move the cluster workloads into that folder.
$ govc find -type r /OCP-DC/host/qe-cluster/workloads
Use the install-config.yaml file below to create a single-zone cluster:
apiVersion: v1
baseDomain: qe.devcluster.openshift.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    vsphere:
      cpus: 4
      memoryMB: 8192
      osDisk:
        diskSizeGB: 60
      zones:
      - us-east-1
  replicas: 2
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    vsphere:
      cpus: 4
      memoryMB: 16384
      osDisk:
        diskSizeGB: 60
      zones:
      - us-east-1
  replicas: 3
metadata:
  name: jima-permission
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.19.46.0/24
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  vsphere:
    apiVIP: 10.19.46.99
    cluster: qe-cluster/workloads
    datacenter: OCP-DC
    defaultDatastore: my-nfs
    ingressVIP: 10.19.46.98
    network: "VM Network"
    username: administrator@vsphere.local
    password: xxx
    vCenter: xxx
    vcenters:
    - server: xxx
      user: administrator@vsphere.local
      password: xxx
      datacenters:
      - OCP-DC
    failureDomains:
    - name: us-east-1
      region: us-east
      zone: us-east-1a
      topology:
        datacenter: OCP-DC
        computeCluster: /OCP-DC/host/qe-cluster/workloads
        networks:
        - "VM Network"
        datastore: my-nfs
        server: xxx
pullSecret: xxx
installer get error:
$ ./openshift-install create cluster --dir ipi5 --log-level debug
DEBUG Generating Platform Provisioning Check...
DEBUG Fetching Common Manifests...
DEBUG Reusing previously-fetched Common Manifests
DEBUG Generating Terraform Variables...
FATAL failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": failed to get vSphere network ID: could not find vSphere cluster at /OCP-DC/host//OCP-DC/host/qe-cluster/workloads: cluster '/OCP-DC/host//OCP-DC/host/qe-cluster/workloads' not found
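The doubled `/OCP-DC/host/` prefix in the error suggests the installer prepends its default cluster path even when the configured computeCluster is already an absolute inventory path. A minimal sketch of the kind of guard that avoids the double prefix; the function name and the exact path-joining rule are assumptions, not the installer's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// clusterInventoryPath returns the full vSphere inventory path of a compute
// cluster. If the configured value is already an absolute path (starts with
// "/"), it is used as-is; otherwise the conventional
// /<datacenter>/host/<cluster> path is built.
func clusterInventoryPath(datacenter, cluster string) string {
	if strings.HasPrefix(cluster, "/") {
		return cluster
	}
	return fmt.Sprintf("/%s/host/%s", datacenter, cluster)
}

func main() {
	// Relative value from platform.vsphere.cluster: the prefix is added.
	fmt.Println(clusterInventoryPath("OCP-DC", "qe-cluster/workloads"))
	// Absolute value from failureDomains[].topology.computeCluster:
	// without the guard this would become /OCP-DC/host//OCP-DC/host/...
	fmt.Println(clusterInventoryPath("OCP-DC", "/OCP-DC/host/qe-cluster/workloads"))
}
```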
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2022-10-05-053337
How reproducible:
always
Steps to Reproduce:
1. Create a new host/cluster folder under the datacenter, and move the vSphere cluster into that folder
2. Prepare the install-config with zone configuration
3. Deploy the cluster
Actual results:
Cluster creation fails
Expected results:
Cluster creation succeeds
Additional info:
Description of problem:
In the control plane machine set operator we perform e2e periodic tests that check the ability to do a rolling update of an entire OCP control plane.
This is quite an involved test: we need to drain and replace all the master machines/nodes, which means altering operators, waiting for machines to come up and bootstrap, and waiting for nodes to drain and move their workloads elsewhere while respecting PDBs and etcd quorum.
As such we need to make sure we are robust to transient issues, occasional slow-downs, and network errors.
We have investigated these timeout issues and identified some common culprits that we want to address, see: https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1678966522151799
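A minimal sketch of the sort of retry-with-timeout wrapper such an e2e test can use so that a single transient API error or slow rollout does not fail the run. The condition function, interval, and timeout are placeholders, not the actual CPMS test code.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForRollingUpdate polls checkFn until it reports the control plane
// rollout as complete, tolerating transient errors by retrying instead of
// failing the test immediately.
func waitForRollingUpdate(ctx context.Context, checkFn func(context.Context) (bool, error)) error {
	return wait.PollUntilContextTimeout(ctx, 1*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			done, err := checkFn(ctx)
			if err != nil {
				// Log and retry: a transient API/network error should not
				// abort a long rolling-update test.
				fmt.Printf("transient error, will retry: %v\n", err)
				return false, nil
			}
			return done, nil
		})
}

func main() {
	// Toy condition that "completes" after a few polls.
	polls := 0
	_ = waitForRollingUpdate(context.Background(), func(context.Context) (bool, error) {
		polls++
		return polls >= 3, nil
	})
	fmt.Println("rolling update observed as complete")
}
```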
Description of problem:
CPO reconciliation loop hangs after "Reconciling infrastructure status"
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Frequently
Steps to Reproduce:
1. Create a HostedCluster with a recent 4.14 release image
2. Watch CPO logs
3.
Actual results:
Reconcile gets stuck
Expected results:
Reconcile happens fairly quickly
Additional info:
Description of problem:
Cluster upgrade failure has been affecting three consecutive nightly payloads:

https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-05-20-041508
https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-05-21-120836
https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-05-22-035713

In all three cases, the upgrade seems to fail waiting on network. Take this job as an example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade/1660495736527130624

The cluster version operator complains that the network operator has not finished upgrading:

I0522 07:12:58.540244 1 sync_worker.go:1149] Update error 684 of 845: ClusterOperatorUpdating Cluster operator network is updating versions (*errors.errorString: cluster operator network is available and not degraded but has not finished updating to target version)

This log can be seen in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade/1660495736527130624/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-5565f87cc6-6sjqf_cluster-version-operator.log

The network operator keeps waiting with the following log, and this lasted over 2 hours:

I0522 07:12:58.563312 1 connectivity_check_controller.go:166] ConnectivityCheckController is waiting for transition to desired version (4.14.0-0.nightly-2023-05-22-035713) to be completed.

The log can be seen in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade/1660495736527130624/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/pods/openshift-network-operator_network-operator-6975b7b8ff-pdxzk_network-operator.log

Compared with a working job, there seems to be an error getting *v1alpha1.PodNetworkConnectivityCheck in openshift-network-diagnostics_network-check-source:

W0522 04:34:18.527315 1 reflector.go:424] k8s.io/client-go@v12.0.0+incompatible/tools/cache/reflector.go:169: failed to list *v1alpha1.PodNetworkConnectivityCheck: the server could not find the requested resource (get podnetworkconnectivitychecks.controlplane.operator.openshift.io)
E0522 04:34:18.527391 1 reflector.go:140] k8s.io/client-go@v12.0.0+incompatible/tools/cache/reflector.go:169: Failed to watch *v1alpha1.PodNetworkConnectivityCheck: failed to list *v1alpha1.PodNetworkConnectivityCheck: the server could not find the requested resource (get podnetworkconnectivitychecks.controlplane.operator.openshift.io)

It is not clear whether this is really relevant. Also worth mentioning: every time this problem happens, machine-config and dns are also stuck on the older version.

This has affected the 4.14 nightly payload three times. If it shows more consistency, we might have to increase the severity of the bug. Please ping TRT if any more info is needed.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
During installation:

level=error msg=Error: reading Security Group (sg-0f07c871bdbd6379f) Rules: UnauthorizedOperation: You are not authorized to perform this operation.
level=error msg= status code: 403, request id: f3e18ac0-f2fc-471f-8055-7194112c8225

Users are unable to create the security groups for the bootstrap node.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Warning/Error should come up when the permission does not exist.
Additional info:
Starting with https://amd64.origin.releases.ci.openshift.org/releasestream/4.13.0-0.okd/release/4.13.0-0.okd-2023-02-28-170012 multiple storage tests are failing:
[sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (block volmode)] volumes should store data [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (block volmode)] provisioning should provision storage with pvc data source [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (block volmode)] provisioning should provision storage with snapshot data source [Feature:VolumeSnapshotDataSource] [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: block] [Testpattern: Pre-provisioned PV (block volmode)] volumes should store data [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Dynamic PV (block volmode)] volumes should store data [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Pre-provisioned PV (block volmode)] volumes should store data [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] PersistentVolumes-local [Volume type: block] One pod requesting one prebound PVC should be able to mount volume and write from pod1 [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] PersistentVolumes-local [Volume type: block] One pod requesting one prebound PVC should be able to mount volume and read from pod1 [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] PersistentVolumes-local [Volume type: block] Two pods mounting a local volume at the same time should be able to write from pod1 and read from pod2 [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] PersistentVolumes-local [Volume type: block] Two pods mounting a local volume one after the other should be able to write from pod1 and read from pod2 [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
cc Hemant Kumar
When we try to create a cluster with --secret-creds (an MCE AWS k8s secret that includes aws-creds, the pull secret, and the base domain), the binary should not ask for a pull secret. However, it does now, after switching from the hypershift CLI to hcp.
Adding the pull-secret parameter allows the command to continue as expected, though I would think the whole point of --secret-creds is to reuse what already exists.
/usr/local/bin/hcp create cluster aws --name acmqe-hc-ad5b1f645d93464c --secret-creds test1-cred --region us-east-1 --node-pool-replicas 1 --namespace local-cluster --instance-type m6a.xlarge --release-image quay.io/openshift-release-dev/ocp-release:4.14.0-ec.4-multi --generate-ssh

Output:
Error: required flag(s) "pull-secret" not set
required flag(s) "pull-secret" not set
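The 'required flag(s) "pull-secret" not set' message is cobra's standard error for a flag marked required. A minimal sketch, assuming the fix is to validate the flag conditionally in PreRunE instead of marking it required unconditionally, so --secret-creds can stand in for --pull-secret; the flag names follow the command above, everything else is illustrative.

```go
package main

import (
	"fmt"

	"github.com/spf13/cobra"
)

func newCreateClusterAWSCommand() *cobra.Command {
	var pullSecret, secretCreds string

	cmd := &cobra.Command{
		Use: "aws",
		// Validate conditionally instead of cmd.MarkFlagRequired("pull-secret"):
		// --secret-creds already carries the pull secret, so only require
		// --pull-secret when no credentials secret is given.
		PreRunE: func(cmd *cobra.Command, args []string) error {
			if secretCreds == "" && pullSecret == "" {
				return fmt.Errorf("either --pull-secret or --secret-creds must be set")
			}
			return nil
		},
		RunE: func(cmd *cobra.Command, args []string) error {
			fmt.Println("creating cluster...")
			return nil
		},
	}
	cmd.Flags().StringVar(&pullSecret, "pull-secret", "", "path to the pull secret file")
	cmd.Flags().StringVar(&secretCreds, "secret-creds", "", "name of an MCE secret with aws-creds, pull secret, and base domain")
	return cmd
}

func main() {
	cmd := newCreateClusterAWSCommand()
	cmd.SetArgs([]string{"--secret-creds", "test1-cred"})
	if err := cmd.Execute(); err != nil {
		fmt.Println("error:", err)
	}
}
```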
2.4.0-DOWNANDBACK-2023-08-31-13-34-02 or mce 2.4.0-137
hcp version openshift/hypershift: 8b4b52925d47373f3fe4f0d5684c88dc8a93368a. Latest supported OCP: 4.14.0
always
Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-13061.
Description of problem:
When a fresh normal user visits the BuildConfigs page of the 'default' project, an error page is shown.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-05-191022
How reproducible:
Always
Steps to Reproduce:
1. A normal user without any projects logs in to the console
2. Switch to the Admin perspective
3. Visit the workloads pages for the 'default' project, for example:
   /k8s/ns/default/route.openshift.io~v1~Route
   /k8s/ns/default/core~v1~Service
   /k8s/ns/default/apps~v1~Deployment
   /k8s/ns/default/build.openshift.io~v1~BuildConfig
Actual results:
3. We can see an error page when visiting BuildConfigs page
Expected results:
3. no error should be shown and show consistent info with other workloads page
Additional info:
Description of problem:
Repository creation in the console asks for a mandatory secret and does not allow creating a repository even for a public git URL, which is odd. It works fine with the OCP CLI, however.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create repository crd via openshift console 2. 3.
Actual results:
It does not allow me to create the repository
Expected results:
We should be able to create repository crd
Additional info:
slack thread: https://redhat-internal.slack.com/archives/C6A3NV5J9/p1691057766516119
Description of problem:
With 120+ node clusters, we are seeing an O(10)-times larger rate of node patch requests coming from node service accounts. This higher rate of updates causes "nodes" watchers to be terminated, which in turn causes a storm of watch requests that increases CPU load on the cluster. What I see is that node resourceVersions increment rapidly and in large bursts, and watchers are terminated as a result.
Version-Release number of selected component (if applicable):
4.14.0-ec.4 4.14.0-0.nightly-2023-08-08-222204 4.13.0-0.nightly-2023-08-10-021434
How reproducible:
Repeatable
Steps to Reproduce:
1. Create a 4.14 cluster with 120 nodes, with an m5.8xlarge control plane and c5.4xlarge workers.
2. Run `oc get nodes -w -o custom-columns='NAME:.metadata.name,RV:.metadata.resourceVersion'` (or the equivalent client-go watch sketched below).
3. Wait for a big chunk of nodes to be updated and observe the watch terminate.
4. Optionally run `kube-burner ocp node-density-cni --pods-per-node=100` to generate some load.
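For reference, a minimal client-go sketch that does roughly what step 2 does with oc: watch Nodes, print each event's resourceVersion, and report when the watch channel closes (i.e. the watch is terminated). It assumes a kubeconfig at the default location; it is an observation aid, not part of any fix.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Start a watch on Nodes; the apiserver closes the channel when the
	// watcher is terminated (see apiserver_terminated_watchers_total).
	w, err := client.CoreV1().Nodes().Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for event := range w.ResultChan() {
		if obj, ok := event.Object.(interface{ GetResourceVersion() string }); ok {
			fmt.Printf("%s resourceVersion=%s\n", event.Type, obj.GetResourceVersion())
		}
	}
	fmt.Println("watch channel closed: the nodes watch was terminated")
}
```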
Actual results:
kube-apiserver audit events show >1500 node patch requests from a single node SA in a certain amount of time:

1678 ["system:node:ip-10-0-69-142.us-west-2.compute.internal",null]
1679 ["system:node:ip-10-0-33-131.us-west-2.compute.internal",null]
1709 ["system:node:ip-10-0-41-44.us-west-2.compute.internal",null]

Observe that apiserver_terminated_watchers_total{resource="nodes"} starts to increment before the 120-node scaleup is even complete.
Expected results:
Patch requests in the same amount of time are more aligned with what we see on the 4.13 08-10 nightly:

57 ["system:node:ip-10-0-247-122.us-west-2.compute.internal",null]
62 ["system:node:ip-10-0-239-217.us-west-2.compute.internal",null]
63 ["system:node:ip-10-0-165-255.us-west-2.compute.internal",null]
64 ["system:node:ip-10-0-136-122.us-west-2.compute.internal",null]

Observe that apiserver_terminated_watchers_total{resource="nodes"} does not increment. Observe that the rate of mutating node requests levels off after the nodes are created.
Additional info:
We suspect these updates coming from nodes could be a response to the MCO controllerconfigs resource being updated every few minutes or more frequently. This is one of the suspected causes in the investigation of increased kube-apiserver CPU usage with OVN-IC.
An upstream partial fix to logging means that the BMO log now contains a mixture of structured and unstructured logs, making it impossible to read with the structured log parsing tool (bmo-log-parse) we use for debugging customer issues.
This is fixed upstream by https://github.com/metal3-io/baremetal-operator/pull/1249, which will get picked up automatically in 4.14 but which needs to be backported to 4.13.
Description of problem:
Currently, only one ServerGroup is created in OpenStack when 3 masters are deployed across 3 AZs, while 3 should have been created (one per AZ). With the work on CPMS, we made the decision to create only one ServerGroup for the masters. However, this requires a change in the installer to reflect that decision: when AZs were specified, each master machine referenced its own ServerGroup, while only one actually existed in OpenStack. This was a mistake, but instead of fixing that bug, we will change the behaviour to have only one ServerGroup for the masters.
Version-Release number of selected component (if applicable):
latest (4.14)
How reproducible: deploy a control plane with 3 failure domains:
controlPlane:
  name: master
  platform:
    openstack:
      type: m1.xlarge
      failureDomains:
      - computeAvailabilityZone: az0
      - computeAvailabilityZone: az1
      - computeAvailabilityZone: az2
Steps to Reproduce:
1. Deploy the control plane in 3 AZs
2. List the OpenStack Compute Server Groups
Actual results:
+--------------------------------------+--------------------------+--------------------+
| ID                                   | Name                     | Policy             |
+--------------------------------------+--------------------------+--------------------+
| 0750c579-d2cf-41b3-9e88-003dcbcad0c5 | refarch-jkn8g-master-az0 | soft-anti-affinity |
| 05715c08-ac2b-439d-9bd5-5803ac40c322 | refarch-jkn8g-worker     | soft-anti-affinity |
+--------------------------------------+--------------------------+--------------------+
Expected results without our work on CPMS:
refarch-jkn8g-master-az1 and refarch-jkn8g-master-az2 should have been created.
This expectation is purely for documentation, QE should ignore it.
Expected results with our work on CPMS (which should be taken in account by QE when testing CPMS):
refarch-jkn8g-master-az0 should not exist, and the ServerGroup should be named refarch-jkn8g-master. All the masters should use that ServerGroup in both the Nova instance properties and in the MachineSpec once the machines are enrolled by CCPMSO.
Description of problem:
In a 4.14 nightly HyperShift hosted cluster, aws-pod-identity does not work: pods are not injected with the env vars AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE.
In a 4.13 HyperShift hosted cluster it works well; see Additional info.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-11-055332
How reproducible:
Always
Steps to Reproduce:
1.
$ export KUBECONFIG=/path/to/hypershift-hosted-cluster/kubeconfig
$ ogcv
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-08-11-055332   True        False         8h      Cluster version is 4.14.0-0.nightly-2023-08-11-055332
$ oc get mutatingwebhookconfigurations --context admin
NAME               WEBHOOKS   AGE
aws-pod-identity   1          6h5m
$ oc get --raw=/.well-known/openid-configuration | jq -r '.issuer'
https://xxxx.s3.us-east-2.amazonaws.com/hypershift-xxxx

2.
$ oc new-project xxia-proj
$ oc create sa aws-provider
serviceaccount/aws-provider created

3.
$ ccoctl aws create-iam-roles --name=xxia --region=$REGION --credentials-requests-dir=credentialsrequest-dir-aws --identity-provider-arn=arn:aws:iam::xxxx:oidc-provider/xxxx.s3.us-east-2.amazonaws.com/hypershift-xxxx --output-dir=credrequests-ccoctl-output
2023/08/24 17:54:32 Role arn:aws:iam::xxxx:role/xxia-xxia-proj-aws-creds created
2023/08/24 17:54:32 Saved credentials configuration to: credrequests-ccoctl-output/manifests/xxia-proj-aws-creds-credentials.yaml
2023/08/24 17:54:32 Updated Role policy for Role xxia-xxia-proj-aws-creds

4.
$ oc annotate sa/aws-provider eks.amazonaws.com/role-arn="arn:aws:iam::xxxx:role/xxia-xxia-proj-aws-creds"
$ oc create deployment aws-cli --image=amazon/aws-cli --dry-run=client -o yaml -- sleep 360d | sed "/containers/i \ serviceAccountName: aws-provider" | oc create -f -
deployment.apps/aws-cli created
$ oc get po
NAME                       READY   STATUS    RESTARTS   AGE
aws-cli-5c4f6d7d5b-g6d5v   1/1     Running   0          18s

5.
$ oc rsh aws-cli-5c4f6d7d5b-g6d5v
sh-4.2$ env | grep AWS
sh-4.2$ ls /var/run/secrets/eks.amazonaws.com/serviceaccount/token
ls: cannot access /var/run/secrets/eks.amazonaws.com/serviceaccount/token: No such file or directory
sh-4.2$ exit
command terminated with exit code 1
Actual results:
5. No AWS env vars.
Expected results:
5. Should have AWS env vars.
Additional info:
In 4.13 HyperShift hosted cluster, it works well:
1.
$ ogcv
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-08-11-101506   True        False         10h     Cluster version is 4.13.0-0.nightly-2023-08-11-101506
$ oc get --raw=/.well-known/openid-configuration | jq -r '.issuer'
https://aos-xxxx.s3.us-east-2.amazonaws.com/xxxx
$ oc get no
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-0-139-76.us-east-2.compute.internal   Ready    worker   10h   v1.26.6+6bf3f75
...
$ REGION=us-east-2

2.
$ oc new-project xxia-proj
$ oc create sa aws-provider

3.
$ ccoctl aws create-iam-roles --name=xxia-test --region=$REGION --credentials-requests-dir=credentialsrequest-dir-aws --identity-provider-arn=arn:aws:iam::xxxx:oidc-provider/aos-xxxx.s3.us-east-2.amazonaws.com/xxxx --output-dir=credrequests-ccoctl-output
2023/08/24 20:06:53 Role arn:aws:iam::xxxx:role/xxia-test-xxia-proj-aws-creds created
2023/08/24 20:06:53 Saved credentials configuration to: credrequests-ccoctl-output/manifests/xxia-proj-aws-creds-credentials.yaml
2023/08/24 20:06:53 Updated Role policy for Role xxia-test-xxia-proj-aws-creds

4.
$ oc annotate sa/aws-provider eks.amazonaws.com/role-arn="arn:aws:iam::xxxx:role/xxia-test-xxia-proj-aws-creds"
$ oc create deployment aws-cli --image=amazon/aws-cli --dry-run=client -o yaml -- sleep 360d | sed "/containers/i \ serviceAccountName: aws-provider" | oc create -f -
$ oc get pod
NAME                       READY   STATUS    RESTARTS   AGE
aws-cli-84875995cc-svszl   1/1     Running   0          16s

5.
$ oc rsh aws-cli-84875995cc-svszl
sh-4.2$ env | grep AWS
AWS_ROLE_ARN=arn:aws:iam::xxxx:role/xxia-test-xxia-proj-aws-creds
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_DEFAULT_REGION=us-east-2
AWS_REGION=us-east-2
Description of problem:
When upgrading a 4.11.33 cluster to 4.12.21, the Cluster Version Operator is stuck waiting for the Network Operator to update:

$ omc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.43   True        True          14m     Working towards 4.12.21: 672 of 831 done (80% complete), waiting on network

The CVO pod log states:

2023-06-16T12:07:22.596127142Z I0616 12:07:22.596023 1 metrics.go:490] ClusterOperator network is not setting the 'operator' version

Indeed the NO version is empty:

$ omc get co network -o json | jq '.status.versions'
null

However, its status is available, not progressing, and not degraded:

$ omc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
network             True        False         False      19m

The Network Operator pod log states:

2023-06-16T12:08:56.542287546Z I0616 12:08:56.542271 1 connectivity_check_controller.go:138] ConnectivityCheckController is waiting for transition to desired version (4.12.21) to be completed.
2023-06-16T12:04:40.584407589Z I0616 12:04:40.584349 1 ovn_kubernetes.go:1437] OVN-Kubernetes master and node already at release version 4.12.21; no changes required

The Network Operator pod, however, has the version correctly:

$ omc get pods -n openshift-network-operator -o jsonpath='{.items[].spec.containers[0].env[?(@.name=="RELEASE_VERSION")]}' | jq
{
  "name": "RELEASE_VERSION",
  "value": "4.12.21"
}

Restarts of the related pods had no effect. I have trace logs of the Network Operator available. It looked like it might be related to https://github.com/openshift/cluster-network-operator/pull/1818, but that looks to be code introduced in 4.14.
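For context, the CVO considers an operator updated only once its ClusterOperator reports an operand version named "operator" that matches the target release (that is what the "is not setting the 'operator' version" log above refers to). A minimal sketch of what that status entry looks like when populated, using the openshift/api types; this is illustrative, not the cluster-network-operator's actual status-reporting code.

```go
package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
)

func main() {
	co := &configv1.ClusterOperator{}
	co.Name = "network"

	// The CVO reads status.versions and waits for the entry named "operator"
	// to equal the target release; if it is missing (null, as in the omc
	// output above), the upgrade reports "waiting on network" indefinitely.
	co.Status.Versions = []configv1.OperandVersion{
		{Name: "operator", Version: "4.12.21"},
	}

	for _, v := range co.Status.Versions {
		fmt.Printf("%s=%s\n", v.Name, v.Version)
	}
}
```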
Version-Release number of selected component (if applicable):
How reproducible:
I have not reproduced.
Steps to Reproduce:
1. Cluster version began at stable 4.10.56
2. Upgraded to 4.11.43 successfully
3. Upgraded to 4.12.21 and is stuck
Actual results:
CVO Stuck waiting on NO to complete, NO
Expected results:
NO to update its version so the CVO can continue.
Additional info:
Bare Metal IPI cluster with OVN Networking.
This is a clone of issue OCPBUGS-18396. The following is the description of the original issue:
—
CI is almost perma failing on mtu migration in 4.14 (both SDN and OVN-Kubernetes):
Looks like the common issue is that waiting for MCO times out:

+ echo '[2023-08-31T03:58:16+00:00] Waiting for final Machine Controller Config...'
[2023-08-31T03:58:16+00:00] Waiting for final Machine Controller Config...
+ timeout 900s bash
migration field is not cleaned by MCO
migration field is not cleaned by MCO
migration field is not cleaned by MCO
(the same "migration field is not cleaned by MCO" line repeats until the 900s timeout expires)
...
Description of problem:
[vmware csi driver] vsphere-syncer does not retry populating the CSINodeTopology with topology information when registration fails. When the syncer starts it watches for node events, but it does not retry if registration fails, and in the meanwhile CSINodeTopology requests might not get served because the VM is not found.
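A minimal sketch of the kind of retry-with-backoff the description asks for: keep re-attempting node registration instead of giving up after one failure. wait.ExponentialBackoff comes from k8s.io/apimachinery; registerNode is a placeholder for the syncer's real registration call, not the actual driver code.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// registerNode is a stand-in for the syncer's real registration call, which
// can fail transiently (e.g. the VM is not found yet).
func registerNode(name string) error {
	return fmt.Errorf("virtual machine %q not found yet", name)
}

// registerNodeWithRetry retries registration with exponential backoff instead
// of failing once and leaving CSINodeTopology requests unserved.
func registerNodeWithRetry(name string) error {
	backoff := wait.Backoff{
		Duration: 2 * time.Second, // first retry delay
		Factor:   2.0,             // double the delay each attempt
		Steps:    5,               // give up after 5 attempts
	}
	return wait.ExponentialBackoff(backoff, func() (bool, error) {
		if err := registerNode(name); err != nil {
			fmt.Printf("registration of %s failed, will retry: %v\n", name, err)
			return false, nil // retry
		}
		return true, nil // done
	})
}

func main() {
	if err := registerNodeWithRetry("compute-2"); err != nil {
		fmt.Println("giving up:", err)
	}
}
```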
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-05-04-090524
How reproducible:
Randomly
Steps to Reproduce:
1. Install an OCP cluster by UPI with encryption
2. Check that the cluster storage operator does not degrade
Actual results:
The cluster storage operator degrades with:

VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods

...
2023-05-09T06:06:22.146861934Z I0509 06:06:22.146850 1 main.go:183] ServeMux listening at "0.0.0.0:10300"
2023-05-09T06:07:00.283007138Z E0509 06:07:00.282912 1 main.go:64] failed to establish connection to CSI driver: context canceled
2023-05-09T06:07:07.283109412Z W0509 06:07:07.283061 1 connection.go:173] Still connecting to unix:///csi/csi.sock
...

# Many error logs in the CSI driver related to: timed out while waiting for topology labels to be updated in "compute-2" CSINodeTopology instance.

...
2023-05-09T06:19:16.499856730Z {"level":"error","time":"2023-05-09T06:19:16.499687071Z","caller":"k8sorchestrator/topology.go:837","msg":"timed out while waiting for topology labels to be updated in \"compute-2\" CSINodeTopology instance.","TraceId":"b8d9305e-9681-4eba-a8ac-330383227a23","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco/k8sorchestrator.(*nodeVolumeTopology).GetNodeTopologyLabels\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/common/commonco/k8sorchestrator/topology.go:837\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).NodeGetInfo\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/node.go:429\ngithub.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:6231\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:922"}
...
Expected results:
Install vsphere ocp cluster succeed and the cluster storage operator is healthy
Additional info:
Version:
$ openshift-install version
./openshift-install 4.11.0-0.nightly-2022-07-13-131410
built from commit cdb9627de7efb43ad7af53e7804ddd3434b0dc58
release image registry.ci.openshift.org/ocp/release@sha256:c5413c0fdd0335e5b4063f19133328fee532cacbce74105711070398134bb433
release architecture amd64
Platform:
What happened?
When one creates an IPI Azure cluster with an `internal` publishing method, it creates a standard load balancer with an empty definition. This load balancer doesn't serve a purpose as far as I can tell since the configuration is completely empty. Because it doesn't have a public IP address and backend pools it's not providing any outbound connectivity, and there are no frontend IP configurations for ingress connectivity to the cluster.
Below is the ARM template that is deployed by the installer (through terraform)
```
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"loadBalancers_mgahagan411_7p82n_name":
},
"variables": {},
"resources": [
{
"type": "Microsoft.Network/loadBalancers",
"apiVersion": "2020-11-01",
"name": "[parameters('loadBalancers_mgahagan411_7p82n_name')]",
"location": "northcentralus",
"sku":
,
"properties":
}
]
}
```
What did you expect to happen?
How to reproduce it (as minimally and precisely as possible)?
1. Create an IPI cluster with the `publish` installation config set to `Internal` and the `outboundType` set to `UserDefinedRouting`.
```
apiVersion: v1
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    azure: {}
  replicas: 3
compute:
```

2. Show that the JSON content of the standard load balancer is completely empty:
`az network lb show -g myResourceGroup -n myLbName`
```
{
"name": "mgahagan411-7p82n",
"id": "/subscriptions/00000000-0000-0000-00000000/resourceGroups/mgahagan411-7p82n-rg/providers/Microsoft.Network/loadBalancers/mgahagan411-7p82n",
"etag": "W/\"40468fd2-e56b-4429-b582-6852348b6a15\"",
"type": "Microsoft.Network/loadBalancers",
"location": "northcentralus",
"tags": {},
"properties":
,
"sku":
}
```
As a developer, I would like to make sure we are using the latest versions of the dependencies we utilize in the /hack/tools/go.mod file.
Description of problem:
On a 4.12.0-0.nightly-2022-09-08-114806 AWS cluster, "remote error: tls: bad certificate" appears in the prometheus-operator-admission-webhook logs. This should be a regression: there is no such issue in 4.11. The defect does not block the function; it seems to come from AWS.
$ oc -n openshift-monitoring get pod | grep prometheus-operator-admission-webhook
prometheus-operator-admission-webhook-7d8fd8b5bb-kjh4f 1/1 Running 0 3h
prometheus-operator-admission-webhook-7d8fd8b5bb-whl5n 1/1 Running 0 3h

$ oc -n openshift-monitoring logs prometheus-operator-admission-webhook-7d8fd8b5bb-kjh4f
level=info ts=2022-09-08T23:32:53.782445094Z caller=main.go:130 address=[::]:8443 msg="Starting TLS enabled server"
ts=2022-09-08T23:33:09.057366056Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:52820: remote error: tls: bad certificate"
ts=2022-09-08T23:33:10.071639453Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:52830: remote error: tls: bad certificate"
ts=2022-09-08T23:33:12.07959313Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:52842: remote error: tls: bad certificate"
ts=2022-09-08T23:33:31.729332249Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:39188: remote error: tls: bad certificate"
ts=2022-09-08T23:33:32.7374936Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:39196: remote error: tls: bad certificate"
ts=2022-09-08T23:33:34.745945871Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:39206: remote error: tls: bad certificate"
... (the same "http: TLS handshake error from 10.128.0.9:<port>: remote error: tls: bad certificate" message repeats every few minutes, from 2022-09-08T23:33 through 2022-09-09T02:21)
Version-Release number of selected component (if applicable):
"remote error: tls: bad certificate" is in prometheus-operator-admission-webhook logs
How reproducible:
always
Steps to Reproduce:
1. check prometheus-operator-admission-webhook logs.
Actual results:
"remote error: tls: bad certificate" is in prometheus-operator-admission-webhook logs
Expected results:
no error logs
Additional info:
Description of problem:
We are facing the same issue as JIRA[1] in OCP 4.12, and this bug is for backporting that fix to OCP 4.12.

JIRA[1]: https://issues.redhat.com/browse/OCPBUGS-14064

Port 9447 is exposed from the cluster on one of the control nodes and is using weak ciphers and TLS 1.0 / TLS 1.1, which is incompatible with the security standards for our product release. Either we should be able to disable this port, or the cipher suites and TLS version should be updated to meet the security standards; as you are aware, TLS 1.0 and TLS 1.1 are old and already deprecated.

We confirmed that FIPS was enabled during cluster deployment by passing the key-value pair in the config file:

fips: true

On JIRA[1] it is suggested to open a separate bug for backporting.
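For illustration, a minimal sketch of the kind of server-side TLS settings the description asks for: a minimum of TLS 1.2 and a restricted cipher list, using Go's crypto/tls. The port, certificate paths, and cipher selection here are examples, not the actual configuration of the component listening on 9447.

```go
package main

import (
	"crypto/tls"
	"log"
	"net/http"
)

func main() {
	tlsCfg := &tls.Config{
		// Refuse TLS 1.0/1.1 handshakes entirely.
		MinVersion: tls.VersionTLS12,
		// Allow only strong AEAD cipher suites for TLS 1.2
		// (TLS 1.3 suites are not configurable and are always strong).
		CipherSuites: []uint16{
			tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
			tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
			tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
			tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
		},
	}

	srv := &http.Server{
		Addr:      ":9447", // example port from the report
		TLSConfig: tlsCfg,
		Handler:   http.NewServeMux(),
	}
	// Certificate and key paths are placeholders.
	log.Fatal(srv.ListenAndServeTLS("tls.crt", "tls.key"))
}
```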
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
controller: Drop noisy log message about certificates
I often turn to the controller pod logs to debug issues, and
this log message is repeated very often. While it was
probably useful at the time the feature was being developed/tested
I doubt it will be necessary in the future.
In the end, the status really is the debugging frontend I believe.
controller: Drop noisy BaseOSContainerImage log message
In general we should avoid logging unless something changed.
I don't believe we need this log message; we can detect OS
changes from e.g. the MCD logs.
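A minimal sketch of the "log only when something changed" pattern suggested above; this is not the MCO controller's actual code, and the function name is hypothetical.
```go
package controller

import "k8s.io/klog/v2"

// logBaseOSImageChange logs the OS image only when it differs from the value
// seen on the previous reconcile, instead of on every sync.
func logBaseOSImageChange(oldImage, newImage string) {
	if oldImage == newImage {
		// Nothing changed between reconciles, so stay quiet.
		return
	}
	klog.Infof("BaseOSContainerImage changed from %q to %q", oldImage, newImage)
}
```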
Description of problem:
The HyperShift KubeVirt (OpenShift Virtualization) platform has worker nodes that are hosted by KubeVirt virtual machines. The worker node's internal IP address is determined by inspecting the KubeVirt VMI's vmi.status.interfaces field. Because that field sources its information from the qemu guest agent, it is not guaranteed to remain static in some scenarios, such as a soft reboot or when the qemu guest agent is temporarily unavailable. During these situations, the interfaces list will be empty. When the interfaces list on the VMI is empty, the HyperShift-related components (cloud-provider-kubevirt and cluster-api-provider-kubevirt) strip the worker node's internal IP. This stripping of the node's internal IP causes unpredictable behavior that results in connectivity failures from the KAS to the worker node kubelets. To address this, the HyperShift-related KubeVirt components need to only update the internal IP of worker nodes when the vmi.status.interfaces list has an IP for the default interface. Otherwise these components should use the last known internal IP address rather than stripping the internal IP address from the node.
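A minimal sketch of the behaviour described above, not the actual cloud-provider-kubevirt code; the helper name is hypothetical, and it is simplified to use the first interface reporting an IP rather than matching the default interface by name.
```go
package kubevirtprovider

import (
	corev1 "k8s.io/api/core/v1"
	kubevirtv1 "kubevirt.io/api/core/v1"
)

// internalIP takes the internal IP from the VMI only when an interface actually
// reports one; otherwise it keeps the node's last known internal IP instead of
// stripping it.
func internalIP(vmi *kubevirtv1.VirtualMachineInstance, node *corev1.Node) string {
	for _, iface := range vmi.Status.Interfaces {
		if iface.IP != "" {
			return iface.IP
		}
	}
	// The interfaces list is empty (soft reboot, guest agent briefly
	// unavailable, ...): fall back to the address the node already has.
	for _, addr := range node.Status.Addresses {
		if addr.Type == corev1.NodeInternalIP {
			return addr.Address
		}
	}
	return ""
}
```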
Version-Release number of selected component (if applicable):
4.14
How reproducible:
100% given enough time and the right environment.
Steps to Reproduce:
1. create a hypershift kubevirt guest cluster 2. run the csi conformance test suite in a loop (this test suite causes the vmi.status.interfaces list to become unstable briefly at times)
Actual results:
the csi test suite will eventually begin failing due to inability to pod exec into worker node pods. This is caused by the node's internal IP being removed.
Expected results:
csi conformance should pass reliably
Additional info:
We have occasional cases where admins attempt a rollback, despite long-standing docs:
Only upgrading to a newer version is supported. Reverting or rolling back your cluster to a previous version is not supported. If your update fails, contact Red Hat support.
Deeper history for that content here, here, and here. We could refuse to accept rollbacks unless the administrator sets Force to waive our guards.
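A minimal sketch of what such a guard could look like (hypothetical function, not the CVO's actual code); it assumes the github.com/blang/semver/v4 module for version comparison.
```go
package updateguard

import (
	"fmt"

	"github.com/blang/semver/v4"
)

// validateTarget refuses a desired version older than the current one unless
// the administrator explicitly sets force to waive the guard.
func validateTarget(current, desired string, force bool) error {
	cur, err := semver.Parse(current)
	if err != nil {
		return fmt.Errorf("parsing current version: %w", err)
	}
	tgt, err := semver.Parse(desired)
	if err != nil {
		return fmt.Errorf("parsing desired version: %w", err)
	}
	if tgt.LT(cur) && !force {
		return fmt.Errorf("rolling back from %s to %s is not supported; set force to override", current, desired)
	}
	return nil
}
```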
From wking:
$ git --no-pager grep OCPBUGS-10218
test/e2e/nodepool_test.go: // TODO: (csrwng) Re-enable when https://issues.redhat.com/browse/OCPBUGS-10218 is fixed
test/e2e/nodepool_test.go: // TODO: (jparrill) Re-enable when https://issues.redhat.com/browse/OCPBUGS-10218 is fixed
but https://issues.redhat.com/browse/OCPBUGS-10218 was closed as a dup of https://issues.redhat.com/browse/OCPBUGS-10485 , and OCPBUGS-10485 is Verified with happy sounds for both 4.13 and 4.14 nightlies
Please review the following PR: https://github.com/openshift/cloud-provider-ibm/pull/48
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
When working with the HorizontalNav component, it doesn't re-render when the location changes; currently it only updates itself when basePath changes. The re-render on location change was previously triggered by the withRouter HoC, which was recently removed.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
1/1
Steps to Reproduce:
1. Go to Storage -> ODF (version 4.13-pre-release) 2. Click on Storage System Tab and then Topology tab 3.
Actual results:
The selected tab doesn't get highlighted as active tab.
Expected results:
The selected tab should have the active blue color.
Additional info:
This is a clone of issue OCPBUGS-18498. The following is the description of the original issue:
—
Description of problem:
If the cluster is installed without the Build and DeploymentConfig capabilities, using `oc new-app registry.redhat.io/<namespace>/<image>:<tag>` creates a Deployment with an empty spec.containers[0].image, and the Deployment fails to start its pod.
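The generated Deployment (shown in the actual results below) carries an image-trigger annotation and a blank container image, presumably because nothing resolves the trigger when the capability is disabled (an assumption, not a confirmed root cause). A minimal, hypothetical sketch of the expected end state: if the trigger cannot be resolved, the image should be set directly rather than left blank.
```go
package newapp

import (
	"strings"

	appsv1 "k8s.io/api/apps/v1"
)

// fillMissingImages sets resolvedRef (e.g. registry.redhat.io/ubi8/httpd-24:latest)
// on any container whose image is empty or whitespace-only, which is what the
// generated Deployment currently contains.
func fillMissingImages(d *appsv1.Deployment, resolvedRef string) {
	for i := range d.Spec.Template.Spec.Containers {
		c := &d.Spec.Template.Spec.Containers[i]
		if strings.TrimSpace(c.Image) == "" {
			c.Image = resolvedRef
		}
	}
}
```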
Version-Release number of selected component (if applicable):
oc version Client Version: 4.14.0-0.nightly-2023-08-22-221456 Kustomize Version: v5.0.1 Server Version: 4.14.0-0.nightly-2023-09-02-132842 Kubernetes Version: v1.27.4+2c83a9f
How reproducible:
Always
Steps to Reproduce:
1. Installed cluster without build/deploymentconfig function Set "baselineCapabilitySet: None" in install-config 2.Create a deploy using 'new-app' cmd oc new-app registry.redhat.io/ubi8/httpd-24:latest 3.
Actual results:
2. $oc new-app registry.redhat.io/ubi8/httpd-24:latest --> Found container image c412709 (11 days old) from registry.redhat.io for "registry.redhat.io/ubi8/httpd-24:latest" Apache httpd 2.4 ---------------- Apache httpd 2.4 available as container, is a powerful, efficient, and extensible web server. Apache supports a variety of features, many implemented as compiled modules which extend the core functionality. These can range from server-side programming language support to authentication schemes. Virtual hosting allows one Apache installation to serve many different Web sites. Tags: builder, httpd, httpd-24 * An image stream tag will be created as "httpd-24:latest" that will track this image--> Creating resources ... imagestream.image.openshift.io "httpd-24" created deployment.apps "httpd-24" created service "httpd-24" created --> Success Application is not exposed. You can expose services to the outside world by executing one or more of the commands below: 'oc expose service/httpd-24' Run 'oc status' to view your app 3. oc get deploy -o yaml apiVersion: v1 items: - apiVersion: apps/v1 kind: Deployment metadata: annotations: deployment.kubernetes.io/revision: "1" image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"httpd-24:latest"},"fieldPath":"spec.template.spec.containers[?(@.name==\"httpd-24\")].image"}]' openshift.io/generated-by: OpenShiftNewApp creationTimestamp: "2023-09-04T07:44:01Z" generation: 1 labels: app: httpd-24 app.kubernetes.io/component: httpd-24 app.kubernetes.io/instance: httpd-24 name: httpd-24 namespace: wxg resourceVersion: "115441" uid: 909d0c4e-180c-4f88-8fb5-93c927839903 spec: progressDeadlineSeconds: 600 replicas: 1 revisionHistoryLimit: 10 selector: matchLabels: deployment: httpd-24 strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate template: metadata: annotations: openshift.io/generated-by: OpenShiftNewApp creationTimestamp: null labels: deployment: httpd-24 spec: containers: - image: ' ' imagePullPolicy: IfNotPresent name: httpd-24 ports: - containerPort: 8080 protocol: TCP - containerPort: 8443 protocol: TCP resources: {} terminationMessagePath: /dev/termination-log terminationMessagePolicy: File dnsPolicy: ClusterFirst restartPolicy: Always schedulerName: default-scheduler securityContext: {} terminationGracePeriodSeconds: 30 status: conditions: - lastTransitionTime: "2023-09-04T07:44:01Z" lastUpdateTime: "2023-09-04T07:44:01Z" message: Created new replica set "httpd-24-7f6b55cc85" reason: NewReplicaSetCreated status: "True" type: Progressing - lastTransitionTime: "2023-09-04T07:44:01Z" lastUpdateTime: "2023-09-04T07:44:01Z" message: Deployment does not have minimum availability. reason: MinimumReplicasUnavailable status: "False" type: Available - lastTransitionTime: "2023-09-04T07:44:01Z" lastUpdateTime: "2023-09-04T07:44:01Z" message: 'Pod "httpd-24-7f6b55cc85-pvvgt" is invalid: spec.containers[0].image: Invalid value: " ": must not have leading or trailing whitespace' reason: FailedCreate status: "True" type: ReplicaFailure observedGeneration: 1 unavailableReplicas: 1 kind: List metadata:
Expected results:
Should set spec.containers[0].image to registry.redhat.io/ubi8/httpd-24:latest
Additional info:
Currently the upgrade feature agent is disabled by default and enabled explicitly only for the SaaS environment. This ticket is about enabling it by default also for ACM.
Deploying a Helm chart whose values.schema.json uses either the 2019-09 or the 2020-12 (latest) revision of JSON Schema results in the UI hanging on Create with a three-dot loading indicator. This is not the case if the YAML view is used, since I suppose that view is not trying to be clever and lets Helm itself validate the chart values against the schema.
Reproduced in 4.13, probably affects other versions as well.
100%
1. Go to Helm tab.
2. Click create in top right and select Repository
3. Paste following into YAML view and click Create:
apiVersion: helm.openshift.io/v1beta1 kind: ProjectHelmChartRepository metadata: name: reproducer spec: connectionConfig: url: 'https://raw.githubusercontent.com/tumido/helm-backstage/blog2'
4. Go to the Helm tab again (if redirected elsewhere)
5. Click create in top right and select Helm Release
6. In catalog filter select Chart repositories: Reproducer
7. Click on the single tile available (Backstage) and click Create
8. Switch to Form view
9. Leave default values and click Create
10. Stare at the always loading screen that never proceeds further.
It installs and deploys the chart
This is caused by the values.schema.json containing a $schema key that declares which revision of the JSON Schema standard should be used:
{ "$schema": "https://json-schema.org/draft/2020-12/schema", }
I've managed to trace this back to this react-jsonschema-form issue:
https://github.com/rjsf-team/react-jsonschema-form/issues/2241
It seems the library used here for validation supports neither the 2019-09 draft nor the most current 2020-12 revision.
It happens only if the chart follows the JSON Schema standard and declares the revision properly.
Workarounds:
IMO best solution:
Helm form renderer should NOT do any validation, since it can't handle the schema properly. Instead, it should leave this job to the Helm backend, which validates the values against the schema when installing the chart anyway. The YAML view already does no validation and seems to do the job properly.
Currently, there is no formal requirement for charts admitted to the curated Helm catalog warning authors that the newest supported JSON Schema revision is 4 years old and that the two later revisions (2019-09 and 2020-12) are not supported.
Also, the Form UI should not just hang on submit. Instead, it should at least fail gracefully.
Related to:
https://github.com/janus-idp/helm-backstage/issues/64#issuecomment-1587678319
CI is flaky because of test failures such as the following:
{ fail [github.com/openshift/origin/test/extended/oauth/requestheaders.go:218]: full response header: HTTP/1.1 403 Forbidden Content-Length: 192 Audit-Id: f6026f9b-06c5-4b4a-9414-8dc5c681b45a Cache-Control: no-cache, no-store, max-age=0, must-revalidate Content-Type: application/json Date: Tue, 08 Aug 2023 11:26:35 GMT Expires: 0 Pragma: no-cache Referrer-Policy: strict-origin-when-cross-origin X-Content-Type-Options: nosniff X-Dns-Prefetch-Control: off X-Frame-Options: DENY X-Xss-Protection: 1; mode=block {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/metrics\"","reason":"Forbidden","details":{},"code":403} Expected <string>: 403 Forbidden to contain substring <string>: 401 Unauthorized Ginkgo exit error 1: exit with code 1}
This particular failure comes from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_openshift-apiserver/380/pull-ci-openshift-openshift-apiserver-master-e2e-aws-ovn-serial/1688848417708576768. Search.ci has other similar failures.
I have seen this in 4.14 CI jobs and 4.13 CI jobs.
Presently, search.ci shows the following stats for the past 14 days:
Found in 2.41% of runs (4.36% of failures) across 1078 total runs and 58 jobs (55.38% failed)
pull-ci-openshift-openshift-apiserver-master-e2e-aws-ovn-serial (all) - 25 runs, 40% failed, 20% of failures match = 8% impact
openshift-cluster-network-operator-1874-nightly-4.14-e2e-aws-ovn-serial (all) - 42 runs, 67% failed, 14% of failures match = 10% impact
pull-ci-openshift-kubernetes-master-e2e-aws-ovn-serial (all) - 59 runs, 54% failed, 6% of failures match = 3% impact
pull-ci-openshift-origin-master-e2e-aws-ovn-serial (all) - 434 runs, 66% failed, 2% of failures match = 1% impact
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-serial (all) - 55 runs, 49% failed, 7% of failures match = 4% impact
pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-serial (all) - 60 runs, 58% failed, 3% of failures match = 2% impact
pull-ci-operator-framework-operator-marketplace-master-e2e-aws-ovn-serial (all) - 24 runs, 38% failed, 22% of failures match = 8% impact
pull-ci-openshift-cluster-network-operator-master-e2e-aws-ovn-serial (all) - 81 runs, 58% failed, 4% of failures match = 2% impact
pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-ovn-serial (all) - 35 runs, 46% failed, 13% of failures match = 6% impact
rehearse-41872-pull-ci-openshift-ovn-kubernetes-release-4.14-e2e-aws-ovn-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-serial (all) - 72 runs, 49% failed, 3% of failures match = 1% impact
pull-ci-openshift-cluster-kube-apiserver-operator-release-4.13-e2e-aws-ovn-serial (all) - 4 runs, 75% failed, 33% of failures match = 25% impact
pull-ci-openshift-cluster-dns-operator-master-e2e-aws-ovn-serial (all) - 19 runs, 63% failed, 8% of failures match = 5% impact
1. Post a PR and have bad luck.
2. Check search.ci using the link above.
CI fails.
CI passes, or fails on some other test failure.
Context:
In 4.14, the kubelet config from the MCO payload comes with --cloud-provider=external, which means the node.cloudprovider.kubernetes.io/uninitialized taint is set, preventing workloads from being scheduled until it is cleaned up by the external cloud provider.
This is a result of AWS's in-tree provider implementation being removed for Kubernetes 1.27.
DoD:
We need to let the CPO run the AWS external cloud provider.
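For context, the taint in question is visible on the Node object until a cloud controller manager initializes the node. A minimal sketch (hypothetical helper, not CPO code) of checking for it:
```go
package externalprovider

import corev1 "k8s.io/api/core/v1"

// Key set by kubelet when started with --cloud-provider=external; the external
// cloud controller manager removes the taint once it has initialized the node.
const uninitializedTaintKey = "node.cloudprovider.kubernetes.io/uninitialized"

// hasUninitializedTaint reports whether workloads are still blocked from
// scheduling on the node because no cloud provider has initialized it.
func hasUninitializedTaint(node *corev1.Node) bool {
	for _, t := range node.Spec.Taints {
		if t.Key == uninitializedTaintKey {
			return true
		}
	}
	return false
}
```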
Description of problem:
023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health [-] Component KuryrPortHandler is dead. Last caught exception below: openstack.exceptions.InvalidRequest: Request requires an ID but none was found 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health Traceback (most recent call last): 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/kuryrport.py", line 169, in on_finalize 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health pod = self.k8s.get(f"{constants.K8S_API_NAMESPACES}" 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 121, in get 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health self._raise_from_response(response) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/k8s_client.py", line 99, in _raise_from_response 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health raise exc.K8sResourceNotFound(response.text) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"mygov-tuo-microservice-dev2-59fffbc58c-l5b79\\" not found","reason":"NotFound","details":{"name":"mygov-tuo-microservice-dev2-59fffbc58c-l5b79","kind":"pods"},"code":404}\n' 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health During handling of the above exception, another exception occurred: 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health Traceback (most recent call last): 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/logging.py", line 38, in __call__ 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health self._handler(event, *args, **kwargs) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 85, in __call__ 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health self._handler(event, *args, retry_info=info, **kwargs) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 98, in __call__ 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health self.on_finalize(obj, *args, **kwargs) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/kuryrport.py", line 184, in on_finalize 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health pod = self._mock_cleanup_pod(kuryrport_crd) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/kuryrport.py", line 160, in _mock_cleanup_pod 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health host_ip = 
utils.get_parent_port_ip(port_id) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/utils.py", line 830, in get_parent_port_ip 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health parent_port = os_net.get_port(port_id) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/openstack/network/v2/_proxy.py", line 1987, in get_port 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health return self._get(_port.Port, port) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/openstack/proxy.py", line 48, in check 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health return method(self, expected, actual, *args, **kwargs) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/openstack/proxy.py", line 513, in _get 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health resource_type=resource_type.__name__, value=value)) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/openstack/resource.py", line 1472, in fetch 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health base_path=base_path) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/openstack/network/v2/_base.py", line 26, in _prepare_request 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health base_path=base_path, params=params) 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health File "/usr/lib/python3.6/site-packages/openstack/resource.py", line 1156, in _prepare_request 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health "Request requires an ID but none was found") 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health openstack.exceptions.InvalidRequest: Request requires an ID but none was found 2023-04-20 02:08:09.770 1 ERROR kuryr_kubernetes.controller.managers.health 2023-04-20 02:08:09.918 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' stopping 2023-04-20 02:08:09.919 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrnetworks' 2023-04-20 02:08:10.026 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/machine.openshift.io/v1beta1/machines' 2023-04-20 02:08:10.152 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/pods' 2023-04-20 02:08:10.174 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/networking.k8s.io/v1/networkpolicies' 2023-04-20 02:08:10.857 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/namespaces' 2023-04-20 02:08:10.877 1 WARNING kuryr_kubernetes.controller.drivers.utils [-] Namespace dev-health-air-ids not yet ready: kuryr_kubernetes.exceptions.K8sResourceNotFound: Resource not found: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"kuryrnetworks.openstack.org \\"dev-health-air-ids\\" not found","reason":"NotFound","details":{"name":"dev-health-air-ids","group":"openstack.org","kind":"kuryrnetworks"},"code":404}\n' 2023-04-20 02:08:11.024 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/services' 2023-04-20 02:08:11.078 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/endpoints' 
2023-04-20 02:08:11.170 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrports' 2023-04-20 02:08:11.344 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrnetworkpolicies' 2023-04-20 02:08:11.475 1 INFO kuryr_kubernetes.watcher [-] Stopped watching '/apis/openstack.org/v1/kuryrloadbalancers' 2023-04-20 02:08:11.475 1 INFO kuryr_kubernetes.watcher [-] No remaining active watchers, Exiting... 2023-04-20 02:08:11.475 1 INFO kuryr_kubernetes.controller.service [-] Service 'KuryrK8sService' stopping
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create a pod. 2. Stop kuryr-controller. 3. Delete the pod and the finalizer on it. 4. Delete pod's subport. 5. Start the controller.
Actual results:
Crash
Expected results:
Port cleaned up normally.
Additional info:
Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/75
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
In https://github.com/openshift/cluster-baremetal-operator/blob/master/provisioning/utils.go#L65 we reference the .PlatformStatus.BareMetal.APIServerInternalIP attribute from the config API. Meanwhile, a recent change https://github.com/openshift/api/commit/51f399230d604fa013c2bb341040c4ad126e7309 deprecated this field in favour of .APIServerInternalIPs (note the plural); this was done to better suit the dual-stack case. We need to update the code (trivial) along with vendor dependencies (openshift/api needs a bump to a version equal to or later than the one including the commit referenced above). Likely there will be code changes required in CBO to adapt to the newer API package. Slack threads for reference: https://app.slack.com/client/T027F3GAJ/C01RJHA6BRC/thread/C01RJHA6BRC-1661416223.353009 (vendor dependency update); openshift/api change: https://coreos.slack.com/archives/C01RJHA6BRC/p1660573560434409?thread_ts=1660229723.998839&cid=C01RJHA6BRC IMPORTANT NOTE: there is an in-flight PR which is making changes to the CBO code fetching the VIP: https://github.com/openshift/cluster-baremetal-operator/pull/285. Work done to address this bug needs to be stacked on top of it to avoid duplication of effort (the easiest way is to work on the code from the in-flight PR 285 and merge once PR 285 merges).
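A minimal sketch of the adaptation (not necessarily how CBO will implement it): prefer the new plural field and fall back to the deprecated singular one so older Infrastructure objects keep working.
```go
package provisioning

import configv1 "github.com/openshift/api/config/v1"

// apiServerInternalIPs returns the API VIPs from the BareMetal platform status,
// preferring the plural APIServerInternalIPs field introduced for dual stack
// and falling back to the deprecated singular field.
func apiServerInternalIPs(status *configv1.BareMetalPlatformStatus) []string {
	if status == nil {
		return nil
	}
	if len(status.APIServerInternalIPs) > 0 {
		return status.APIServerInternalIPs
	}
	if status.APIServerInternalIP != "" {
		return []string{status.APIServerInternalIP}
	}
	return nil
}
```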
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/openshift-state-metrics/pull/95
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Bugs are required for all 4.14 merges right now due to instability. We need to bump the version of the CVO so that it is consistent with the cluster being installed.
After running several scale tests on a large cluster (252 workers), etcd ran out of space and became unavailable.
These tests consisted of running our node-density workload (creates more than 50k pause pods) and cluster-density 4k several times (creates 4k namespaces with https://github.com/cloud-bulldozer/e2e-benchmarking/tree/master/workloads/kube-burner#cluster-density-variables).
The actions above led the etcd peers to run out of free space in their 4GiB PVCs, producing the following error trace:
{"level":"warn","ts":"2023-03-31T09:50:57.532Z","caller":"rafthttp/http.go:271","msg":"failed to save incoming database snapshot","local-member-id":"b14198cd7f0eebf1","remote-snapshot-sender-id":"a4e894c3f4af1379","incoming-snapshot-index ":19490191,"error":"write /var/lib/data/member/snap/tmp774311312: no space left on device"}
Etcd uses 4GiB PVCs to store its data, which seems to be insufficient for this scenario. In addition, unlike non-HyperShift clusters, we are not applying any periodic database defragmentation (on standalone clusters this is done by cluster-etcd-operator), which can lead to a larger database size.
The graph below represents the metrics etcd_mvcc_db_total_size_in_bytes and etcd_mvcc_db_total_size_in_use_in_bytes.
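For reference, this is roughly what a periodic defragmentation pass looks like with the etcd v3 client. It is a minimal sketch only (the endpoint is a placeholder and TLS configuration is omitted), not HyperShift's or cluster-etcd-operator's actual code.
```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://etcd-0.etcd-client:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	for _, ep := range cli.Endpoints() {
		ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
		// Defragment reclaims free pages inside the bolt database file, shrinking
		// the on-disk size reported by etcd_mvcc_db_total_size_in_bytes.
		if _, err := cli.Defragment(ctx, ep); err != nil {
			log.Printf("defragment %s failed: %v", ep, err)
		}
		cancel()
	}
}
```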
Description of problem:
In our IBM Cloud use-case of RHCOS, we are seeing 4.13 RHCOS nodes failing to properly bootstrap to a HyperShift 4.13 control plane. RHCOS worker node kubelet is failing with "failed to construct kubelet dependencies: unable to load client CA file /etc/kubernetes/kubelet-ca.crt: open /etc/kubernetes/kubelet-ca.crt: no such file or directory".
Version-Release number of selected component (if applicable):
4.13.0-rc.6
How reproducible:
100%
Steps to Reproduce:
1. Create a HyperShift 4.13 control plane 2. Boot a RHCOS host outside of cluster 3. After initial RHCOS boot, fetch ignition from control plane 4. Attempt to bootstrap to cluster via `machine-config-daemon firstboot-complete-machineconfig`
Actual results:
Kubelet service fails with "failed to construct kubelet dependencies: unable to load client CA file /etc/kubernetes/kubelet-ca.crt: open /etc/kubernetes/kubelet-ca.crt: no such file or directory".
Expected results:
RHCOS worker node to properly bootstrap to HyperShift control plane. This has been the supported bootstrapping flow for releases <4.13.
Additional info:
References: - https://redhat-internal.slack.com/archives/C01C8502FMM/p1682968210631419 - https://github.com/openshift/machine-config-operator/pull/3575 - https://github.com/openshift/machine-config-operator/pull/3654
This is a clone of issue OCPBUGS-18907. The following is the description of the original issue:
—
Description of problem:
From on to https://issues.redhat.com/browse/OCPBUGS-17827 jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters NAME VERSION KUBECONFIG PROGRESS AVAILABLE PROGRESSING MESSAGE jie-test 4.14.0-0.nightly-2023-09-12-024050 jie-test-admin-kubeconfig Completed True False The hosted control plane is available jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jie-test | grep router router-78d47f4c69-2mvbp 1/1 Running 0 68m jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc get pods router-78d47f4c69-2mvbp -n clusters-jie-test -ojsonpath='{.metadata.labels}' | jq { "app": "private-router", "hypershift.openshift.io/hosted-control-plane": "clusters-jie-test", "hypershift.openshift.io/request-serving-component": "true", "pod-template-hash": "78d47f4c69" } jiezhao-mac:hypershift jiezhao$ oc get networkpolicy management-kas -n clusters-jie-test NAME POD-SELECTOR AGE management-kas !hypershift.openshift.io/need-management-kas-access,name notin (aws-ebs-csi-driver-operator) 76m jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc get networkpolicy management-kas -n clusters-jie-test -o yaml apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: annotations: hypershift.openshift.io/cluster: clusters/jie-test creationTimestamp: "2023-09-12T14:43:13Z" generation: 1 name: management-kas namespace: clusters-jie-test resourceVersion: "54049" uid: 72288fed-a1f6-4dc9-bb63-981d7cdd479f spec: egress: - to: - podSelector: {} - to: - ipBlock: cidr: 0.0.0.0/0 except: - 10.0.46.47/32 - 10.0.7.159/32 - 10.0.77.20/32 - 10.128.0.0/14 - ports: - port: 5353 protocol: UDP - port: 5353 protocol: TCP to: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: openshift-dns podSelector: matchExpressions: - key: hypershift.openshift.io/need-management-kas-access operator: DoesNotExist - key: name operator: NotIn values: - aws-ebs-csi-driver-operator policyTypes: - Egress status: {} jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc get endpoints -n default kubernetes NAME ENDPOINTS AGE kubernetes 10.0.46.47:6443,10.0.7.159:6443,10.0.77.20:6443 150m jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc get endpoints -n default kubernetes -o yaml apiVersion: v1 kind: Endpoints metadata: creationTimestamp: "2023-09-12T13:32:47Z" labels: endpointslice.kubernetes.io/skip-mirror: "true" name: kubernetes namespace: default resourceVersion: "31961" uid: bc170a67-018f-4490-a18c-811ebd3f3676 subsets: - addresses: - ip: 10.0.46.47 - ip: 10.0.7.159 - ip: 10.0.77.20 ports: - name: https port: 6443 protocol: TCP jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc get endpoints -n default kubernetes -ojsonpath='{.subsets[].addresses[].ip}{"\n"}' 10.0.46.47 jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc get endpoints -n default kubernetes -ojsonpath='{.subsets[].ports[].port}{"\n"}' 6443 jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc project clusters-jie-test Now using project "clusters-jie-test" on server "https://api.jiezhao-091201.qe.devcluster.openshift.com:6443". jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc -n clusters-jie-test rsh pod/router-78d47f4c69-2mvbp curl --connect-timeout 2 -Iks https://10.0.46.47:6443 -v * Rebuilt URL to: https://10.0.46.47:6443/ * Trying 10.0.46.47... 
* TCP_NODELAY set * Connected to 10.0.46.47 (10.0.46.47) port 6443 (#0) * ALPN, offering h2 * ALPN, offering http/1.1 * successfully set certificate verify locations: * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none * TLSv1.3 (OUT), TLS handshake, Client hello (1): * TLSv1.3 (IN), TLS handshake, Server hello (2): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Request CERT (13): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Certificate (11): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, CERT verify (15): * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Finished (20): * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1): * TLSv1.3 (OUT), TLS handshake, [no content] (0): * TLSv1.3 (OUT), TLS handshake, Certificate (11): * TLSv1.3 (OUT), TLS handshake, [no content] (0): * TLSv1.3 (OUT), TLS handshake, Finished (20): * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 * ALPN, server accepted to use h2 * Server certificate: * subject: CN=172.30.0.1 * start date: Sep 12 13:35:51 2023 GMT * expire date: Oct 12 13:35:52 2023 GMT * issuer: OU=openshift; CN=kube-apiserver-service-network-signer * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. * Using HTTP2, server supports multi-use * Connection state changed (HTTP/2 confirmed) * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0 * TLSv1.3 (OUT), TLS app data, [no content] (0): * TLSv1.3 (OUT), TLS app data, [no content] (0): * TLSv1.3 (OUT), TLS app data, [no content] (0): * Using Stream ID: 1 (easy handle 0x55c5c46cb990) * TLSv1.3 (OUT), TLS app data, [no content] (0): > HEAD / HTTP/2 > Host: 10.0.46.47:6443 > User-Agent: curl/7.61.1 > Accept: */* > * TLSv1.3 (IN), TLS handshake, [no content] (0): * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4): * TLSv1.3 (IN), TLS app data, [no content] (0): * Connection state changed (MAX_CONCURRENT_STREAMS == 2000)! * TLSv1.3 (OUT), TLS app data, [no content] (0): * TLSv1.3 (IN), TLS app data, [no content] (0): * TLSv1.3 (IN), TLS app data, [no content] (0): * TLSv1.3 (IN), TLS app data, [no content] (0): < HTTP/2 403 HTTP/2 403 < audit-id: 82d5f3f7-6e5b-4bb5-b846-54df09aefb54 audit-id: 82d5f3f7-6e5b-4bb5-b846-54df09aefb54 < cache-control: no-cache, private cache-control: no-cache, private < content-type: application/json content-type: application/json < strict-transport-security: max-age=31536000; includeSubDomains; preload strict-transport-security: max-age=31536000; includeSubDomains; preload < x-content-type-options: nosniff x-content-type-options: nosniff < x-kubernetes-pf-flowschema-uid: 6edd6532-2d15-4d8d-9cea-4dcce99b881f x-kubernetes-pf-flowschema-uid: 6edd6532-2d15-4d8d-9cea-4dcce99b881f < x-kubernetes-pf-prioritylevel-uid: 4115bb59-a78d-42ab-9136-37529cf107e1 x-kubernetes-pf-prioritylevel-uid: 4115bb59-a78d-42ab-9136-37529cf107e1 < content-length: 218 content-length: 218 < date: Tue, 12 Sep 2023 16:05:02 GMT date: Tue, 12 Sep 2023 16:05:02 GMT < * Connection #0 to host 10.0.46.47 left intact jiezhao-mac:hypershift jiezhao$
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-19059. The following is the description of the original issue:
—
Description of problem:
baremetal 4.14.0-rc.0 IPv6 SNO cluster; after logging in to the admin console as an admin user, there is no Observe menu on the left navigation bar, see picture: https://drive.google.com/file/d/13RAXPxtKhAElN9xf8bAmLJa0GI8pP0fH/view?usp=sharing. The monitoring-plugin status is Failed, see: https://drive.google.com/file/d/1YsSaGdLT4bMn-6E-WyFWbOpwvDY4t6na/view?usp=sharing, the error is
Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/ r: Bad Gateway
Checked the console logs; port 9443 reports connect: connection refused:
$ oc -n openshift-console logs console-6869f8f4f4-56mbj ... E0915 12:50:15.498589 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused 2023/09/15 12:50:15 http: panic serving [fd01:0:0:1::2]:39156: runtime error: invalid memory address or nil pointer dereference goroutine 183760 [running]: net/http.(*conn).serve.func1() /usr/lib/golang/src/net/http/server.go:1854 +0xbf panic({0x3259140, 0x4fcc150}) /usr/lib/golang/src/runtime/panic.go:890 +0x263 github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0003b5760, 0x2?, {0xc0009bc7d1, 0x11}, {0x3a41fa0, 0xc0002f6c40}, 0xb?) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582 github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xaa00000000000010?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0002f6c40?}, 0x7?) /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33 net/http.HandlerFunc.ServeHTTP(...) /usr/lib/golang/src/net/http/server.go:2122 github.com/openshift/console/pkg/server.authMiddleware.func1(0xc0001f7500?, {0x3a41fa0?, 0xc0002f6c40?}, 0xd?) /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31 github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500) /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c net/http.HandlerFunc.ServeHTTP(0x5120938?, {0x3a41fa0?, 0xc0002f6c40?}, 0x7ffb6ea27f18?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.StripPrefix.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400) /usr/lib/golang/src/net/http/server.go:2165 +0x332 net/http.HandlerFunc.ServeHTTP(0xc001102c00?, {0x3a41fa0?, 0xc0002f6c40?}, 0xc000655a00?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400) /usr/lib/golang/src/net/http/server.go:2500 +0x149 github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0002f6c40}, 0x3305040?) /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0002f6c40?}, 0x11db52e?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.serverHandler.ServeHTTP({0xc0008201e0?}, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400) /usr/lib/golang/src/net/http/server.go:2936 +0x316 net/http.(*conn).serve(0xc0009b4120, {0x3a43e70, 0xc001223500}) /usr/lib/golang/src/net/http/server.go:1995 +0x612 created by net/http.(*Server).Serve /usr/lib/golang/src/net/http/server.go:3089 +0x5ed I0915 12:50:24.267777 1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data. I0915 12:50:24.267813 1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data. 
E0915 12:50:30.155515 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused 2023/09/15 12:50:30 http: panic serving [fd01:0:0:1::2]:42990: runtime error: invalid memory address or nil pointer dereference
Port 9443 connection is refused:
$ oc -n openshift-monitoring get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES alertmanager-main-0 6/6 Running 6 3d22h fd01:0:0:1::564 sno-2 <none> <none> cluster-monitoring-operator-6cb777d488-nnpmx 1/1 Running 4 7d16h fd01:0:0:1::12 sno-2 <none> <none> kube-state-metrics-dc5f769bc-p97m7 3/3 Running 12 7d16h fd01:0:0:1::3b sno-2 <none> <none> monitoring-plugin-85bfb98485-d4g5x 1/1 Running 4 7d16h fd01:0:0:1::55 sno-2 <none> <none> node-exporter-ndnnj 2/2 Running 8 7d16h 2620:52:0:165::41 sno-2 <none> <none> openshift-state-metrics-78df59b4d5-j6r5s 3/3 Running 12 7d16h fd01:0:0:1::3a sno-2 <none> <none> prometheus-adapter-6f86f7d8f5-ttflf 1/1 Running 0 4h23m fd01:0:0:1::b10c sno-2 <none> <none> prometheus-k8s-0 6/6 Running 6 3d22h fd01:0:0:1::566 sno-2 <none> <none> prometheus-operator-7c94855989-csts2 2/2 Running 8 7d16h fd01:0:0:1::39 sno-2 <none> <none> prometheus-operator-admission-webhook-7bb64b88cd-bvq8m 1/1 Running 4 7d16h fd01:0:0:1::37 sno-2 <none> <none> thanos-querier-5bbb764599-vlztq 6/6 Running 6 3d22h fd01:0:0:1::56a sno-2 <none> <none> $ oc -n openshift-monitoring get svc monitoring-plugin NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE monitoring-plugin ClusterIP fd02::f735 <none> 9443/TCP 7d16h $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq * Trying fd02::f735... * TCP_NODELAY set * connect to fd02::f735 port 9443 failed: Connection refused * Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused * Closing connection 0 curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused command terminated with exit code 7
No such issue on another 4.14.0-rc.0 IPv4 cluster, but the issue is reproduced on another 4.14.0-rc.0 IPv6 cluster.
4.14.0-rc.0 IPv4 cluster:
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.0-rc.0 True False 20m Cluster version is 4.14.0-rc.0 $ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin monitoring-plugin-85bfb98485-nh428 1/1 Running 0 4m 10.128.0.107 ci-ln-pby4bj2-72292-l5q8v-master-0 <none> <none> $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq ... { "name": "monitoring-plugin", "version": "1.0.0", "displayName": "OpenShift console monitoring plugin", "description": "This plugin adds the monitoring UI to the OpenShift web console", "dependencies": { "@console/pluginAPI": "*" }, "extensions": [ { "type": "console.page/route", "properties": { "exact": true, "path": "/monitoring", "component": { "$codeRef": "MonitoringUI" } } }, ...
Hit the "9443: Connection refused" issue on a 4.14.0-rc.0 IPv6 cluster (launched cluster-bot cluster: launch 4.14.0-rc.0 metal,ipv6) after logging in to the console:
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.0-rc.0 True False 44m Cluster version is 4.14.0-rc.0 $ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin monitoring-plugin-bd6ffdb5d-b5csk 1/1 Running 0 53m fd01:0:0:4::b worker-0.ostest.test.metalkube.org <none> <none> monitoring-plugin-bd6ffdb5d-vhtpf 1/1 Running 0 53m fd01:0:0:5::9 worker-2.ostest.test.metalkube.org <none> <none> $ oc -n openshift-monitoring get svc monitoring-plugin NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE monitoring-plugin ClusterIP fd02::402d <none> 9443/TCP 59m $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq * Trying fd02::402d... * TCP_NODELAY set * connect to fd02::402d port 9443 failed: Connection refused * Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused * Closing connection 0 curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused command terminated with exit code 7$ oc -n openshift-console get pod | grep console console-5cffbc7964-7ljft 1/1 Running 0 56m console-5cffbc7964-d864q 1/1 Running 0 56m$ oc -n openshift-console logs console-5cffbc7964-7ljft ... E0916 14:34:16.330117 1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::402d]:9443: connect: connection refused 2023/09/16 14:34:16 http: panic serving [fd01:0:0:4::2]:37680: runtime error: invalid memory address or nil pointer dereference goroutine 3985 [running]: net/http.(*conn).serve.func1() /usr/lib/golang/src/net/http/server.go:1854 +0xbf panic({0x3259140, 0x4fcc150}) /usr/lib/golang/src/runtime/panic.go:890 +0x263 github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0008f6780, 0x2?, {0xc000665211, 0x11}, {0x3a41fa0, 0xc0009221c0}, 0xb?) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582 github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xfe00000000000010?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d600) /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0009221c0?}, 0x7?) /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33 net/http.HandlerFunc.ServeHTTP(...) /usr/lib/golang/src/net/http/server.go:2122 github.com/openshift/console/pkg/server.authMiddleware.func1(0xc000d8d600?, {0x3a41fa0?, 0xc0009221c0?}, 0xd?) /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31 github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d600) /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c net/http.HandlerFunc.ServeHTTP(0xc000653830?, {0x3a41fa0?, 0xc0009221c0?}, 0x7f824506bf18?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.StripPrefix.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d500) /usr/lib/golang/src/net/http/server.go:2165 +0x332 net/http.HandlerFunc.ServeHTTP(0xc00007e800?, {0x3a41fa0?, 0xc0009221c0?}, 0xc000b2da00?) 
/usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500) /usr/lib/golang/src/net/http/server.go:2500 +0x149 github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0009221c0}, 0x3305040?) /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0009221c0?}, 0x11db52e?) /usr/lib/golang/src/net/http/server.go:2122 +0x2f net/http.serverHandler.ServeHTTP({0xc000db9b00?}, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500) /usr/lib/golang/src/net/http/server.go:2936 +0x316 net/http.(*conn).serve(0xc000653680, {0x3a43e70, 0xc000676f30}) /usr/lib/golang/src/net/http/server.go:1995 +0x612 created by net/http.(*Server).Serve /usr/lib/golang/src/net/http/server.go:3089 +0x5ed
Version-Release number of selected component (if applicable):
baremetal 4.14.0-rc.0 ipv6 sno cluster, $ token=`oc create token prometheus-k8s -n openshift-monitoring` $ $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=virt_platform' | jq { "status": "success", "data": { "resultType": "vector", "result": [ { "metric": { "__name__": "virt_platform", "baseboard_manufacturer": "Dell Inc.", "baseboard_product_name": "01J4WF", "bios_vendor": "Dell Inc.", "bios_version": "1.10.2", "container": "kube-rbac-proxy", "endpoint": "https", "instance": "sno-2", "job": "node-exporter", "namespace": "openshift-monitoring", "pod": "node-exporter-ndnnj", "prometheus": "openshift-monitoring/k8s", "service": "node-exporter", "system_manufacturer": "Dell Inc.", "system_product_name": "PowerEdge R750", "system_version": "Not Specified", "type": "none" }, "value": [ 1694785092.664, "1" ] } ] } }
How reproducible:
only seen on this cluster
Steps to Reproduce:
1. see the description 2. 3.
Actual results:
no Observe menu on admin console, monitoring-plugin is failed
Expected results:
no error
Description of problem:
In a 7-day reliability test, kube-apiserver's memory usage keeps increasing; the max is over 3GB. In our 4.12 test results, kube-apiserver's memory usage was stable at around 1.7GB and did not keep increasing. I'll redo the test on a 4.12.0 build to see if I can reproduce this issue, and run a test longer than 7 days to see how high the memory can grow. About the Reliability Test: https://github.com/openshift/svt/tree/master/reliability-v2
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-14-053612
How reproducible:
Always
Steps to Reproduce:
1. Install an AWS cluster with m5.xlarge type 2. Run reliability test for 7 days Reliability Test Configuration example: https://github.com/openshift/svt/tree/master/reliability-v2#groups-and-tasks-1 Config used in this test: admin: 1 user dev-test: 15 users dev-prod: 1 user 3. Use dittybopper dashboard to monitor the kube-apiserver's memory usage
Actual results:
kube-apiserver's memory usage keeps increasing; the max is over 3GB
Expected results:
kube-apiserver's memory usage should not keep increasing
Additional info:
Screenshots are uploaded to shared folder OCPBUGS-10829 - Google Drive
413-kube-apiserver-memory.png 413-api-performance-last2d.png - test was stopped on [2023-03-24 04:21:10 UTC] 412-kube-apiserver-memory.png must-gather.local.525817950490593011.tar.gz - 4.13 cluster's must gather
Console UI is broken due to the patternfly/react-core version changing from 4.276.8 to 4.276.11.
Description of problem:
The hypershift_hostedclusters_failure_conditions metric produced by the HyperShift operator does not report a value of 0 for conditions that no longer apply. The result is that if a hostedcluster had a failure condition at a given point, but that condition has gone away, the metric still reports a count for that condition.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create a HostedCluster, watch the hypershift_hostedclusters_failure_conditions metric as failure conditions occur. 2. 3.
Actual results:
A cluster count of 1 with a failure condition is reported even if the failure condition no longer applies.
Expected results:
Once failure conditions no longer apply, 0 clusters with those conditions should be reported.
Additional info:
The metric should report an accurate count for each possible failure condition of all clusters at any given time.
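A minimal sketch of the expected reporting pattern with a Prometheus GaugeVec (variable and function names are hypothetical, not the HyperShift operator's actual code): every known condition gets a sample on each pass, so a condition that clears drops back to 0 instead of keeping its last non-zero value.
```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var failureConditions = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "hypershift_hostedclusters_failure_conditions",
	Help: "Number of HostedClusters per failure condition",
}, []string{"condition"})

// reportFailureConditions writes one sample per known condition on every sync,
// explicitly setting 0 when no cluster currently has that condition.
func reportFailureConditions(counts map[string]int, knownConditions []string) {
	for _, cond := range knownConditions {
		failureConditions.WithLabelValues(cond).Set(float64(counts[cond]))
	}
}
```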
Description of problem:
When adding a repository URL that contains hyphens in the <owner> part of the URL (https://github.com/<owner>/<repo>, e.g. https://github.com/redhat-developer/s2i-dotnetcore-ex.git), the Create button stays disabled and no validation errors are presented in the UI.
Version-Release number of selected component (if applicable):
4.9
How reproducible:
Always
Steps to Reproduce:
1. Go to Developer -> Add -> Import from Git page
2. use the repo url https://github.com/redhat-developer/s2i-dotnetcore-ex.git
3. add `/app` in the context dir under advanced git options.
Actual results:
Once the builder image is detected, the Create button is disabled but no errors are shown in the form. When the user touches the Name field, a name validation error message is shown even though the suggested name is valid.
Expected results:
After detecting the builder image, the create button should be enabled.
Additional info:
Description of problem:
Authorization by OpenShift Container Platform 4 is not working as expected, when using system:serviceaccounts Group in the ClusterRoleBinding. Here, one would assume that every serviceAccount would be granted the permissions to access the defined resources but actually access is denied. $ curl -k -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <token>" --data "@/tmp/post.json" https://api.<url>:6443/apis/authorization.k8s.io/v1/subjectaccessreviews { "kind": "SubjectAccessReview", "apiVersion": "authorization.k8s.io/v1", "metadata": { "creationTimestamp": null, "managedFields": [ { "manager": "curl", "operation": "Update", "apiVersion": "authorization.k8s.io/v1", "time": "2023-03-13T09:17:45Z", "fieldsType": "FieldsV1", "fieldsV1": { "f:spec": { "f:resourceAttributes": { ".": {}, "f:group": {}, "f:name": {}, "f:namespace": {}, "f:resource": {}, "f:verb": {} }, "f:user": {} } } } ] }, "spec": { "resourceAttributes": { "namespace": "project-100", "verb": "use", "group": "sharedresource.openshift.io", "resource": "sharedsecrets", "name": "shared-subscription" }, "user": "system:serviceaccount:project-100:builder" }, "status": { "allowed": false } } When specifying the serviceAccount in the ClusterRoleBinding access is granted: $ oc get clusterrolebinding shared-secret-cluster-role-binding -o yaml apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"name":"shared-secret-cluster-role-binding"},"roleRef":{"apiGroup":"rbac.authorization.k8s.io","kind":"ClusterRole","name":"shared-secret-cluster-role"},"subjects":[{"apiGroup":"rbac.authorization.k8s.io","kind":"Group","name":"system:serviceaccounts"}]} creationTimestamp: "2023-03-13T08:59:46Z" name: shared-secret-cluster-role-binding resourceVersion: "1575464" uid: dd11825d-834a-4807-ab82-30dc0a415985 roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: shared-secret-cluster-role subjects: - apiGroup: rbac.authorization.k8s.io kind: Group name: system:serviceaccounts - kind: ServiceAccount name: builder namespace: project-101 $ curl -k -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <token>" --data "@/tmp/post.json" https://api.<url>:6443/apis/authorization.k8s.io/v1/subjectaccessreviews { "kind": "SubjectAccessReview", "apiVersion": "authorization.k8s.io/v1", "metadata": { "creationTimestamp": null, "managedFields": [ { "manager": "curl", "operation": "Update", "apiVersion": "authorization.k8s.io/v1", "time": "2023-03-13T09:16:47Z", "fieldsType": "FieldsV1", "fieldsV1": { "f:spec": { "f:resourceAttributes": { ".": {}, "f:group": {}, "f:name": {}, "f:namespace": {}, "f:resource": {}, "f:verb": {} }, "f:user": {} } } } ] }, "spec": { "resourceAttributes": { "namespace": "project-101", "verb": "use", "group": "sharedresource.openshift.io", "resource": "sharedsecrets", "name": "shared-subscription" }, "user": "system:serviceaccount:project-101:builder" }, "status": { "allowed": true, "reason": "RBAC: allowed by ClusterRoleBinding \"shared-secret-cluster-role-binding\" of ClusterRole \"shared-secret-cluster-role\" to ServiceAccount \"builder/project-101\"" } } Both namespaces exist and have the serviceAccount automatically created. 
$ oc get sa -n project-100 NAME SECRETS AGE builder 1 11m default 1 11m deployer 1 11m $ oc get sa -n project-101 NAME SECRETS AGE builder 1 4m1s default 1 4m1s deployer 1 4m The difference is only how authorization is granted. For project-101 the serviceAccount is explicitly granted while for project-100 authorization should be granted via Group called system:serviceaccounts
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.12.5
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4.12 2. Create SharedSecret CRD using oc apply -f https://raw.githubusercontent.com/openshift/api/master/sharedresource/v1alpha1/0000_10_sharedsecret.crd.yaml 3. Create SharedSecret resource: $ oc get sharedsecret shared-subscription -o yaml apiVersion: sharedresource.openshift.io/v1alpha1 kind: SharedSecret metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"sharedresource.openshift.io/v1alpha1","kind":"SharedSecret","metadata":{"annotations":{},"name":"shared-subscription"},"spec":{"secretRef":{"name":"etc-pki-entitlement","namespace":"openshift-config-managed"}}} creationTimestamp: "2023-03-13T08:54:48Z" generation: 1 name: shared-subscription resourceVersion: "1567499" uid: 15c350aa-0de1-4a02-b876-9b822ba0afe5 spec: secretRef: name: etc-pki-entitlement namespace: openshift-config-managed 4. Create ClusterRole to grant access to SharedSecret: $ oc get clusterrole shared-secret-cluster-role -o yaml apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"name":"shared-secret-cluster-role"},"rules":[{"apiGroups":["sharedresource.openshift.io"],"resourceNames":["shared-subscription"],"resources":["sharedsecrets"],"verbs":["use"]}]} creationTimestamp: "2023-03-13T08:57:24Z" name: shared-secret-cluster-role resourceVersion: "1568481" uid: 99324722-ac62-4bb8-a7fe-7ac915393e19 rules: - apiGroups: - sharedresource.openshift.io resourceNames: - shared-subscription resources: - sharedsecrets verbs: - use 5. Create ClusterRoleBinding to access SharedSecret $ oc get clusterrolebinding shared-secret-cluster-role-binding -o yaml apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"name":"shared-secret-cluster-role-binding"},"roleRef":{"apiGroup":"rbac.authorization.k8s.io","kind":"ClusterRole","name":"shared-secret-cluster-role"},"subjects":[{"apiGroup":"rbac.authorization.k8s.io","kind":"Group","name":"system:serviceaccounts"}]} creationTimestamp: "2023-03-13T08:59:46Z" name: shared-secret-cluster-role-binding resourceVersion: "1575464" uid: dd11825d-834a-4807-ab82-30dc0a415985 roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: shared-secret-cluster-role subjects: - apiGroup: rbac.authorization.k8s.io kind: Group name: system:serviceaccounts - kind: ServiceAccount name: builder namespace: project-101 6. 
Run SubjectAccessReview call to validate authorization: $ curl -k -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <token>" --data "@/tmp/post.json" https://api.<url>:6443/apis/authorization.k8s.io/v1/subjectaccessreviews { "kind": "SubjectAccessReview", "apiVersion": "authorization.k8s.io/v1", "metadata": { "creationTimestamp": null, "managedFields": [ { "manager": "curl", "operation": "Update", "apiVersion": "authorization.k8s.io/v1", "time": "2023-03-13T09:17:45Z", "fieldsType": "FieldsV1", "fieldsV1": { "f:spec": { "f:resourceAttributes": { ".": {}, "f:group": {}, "f:name": {}, "f:namespace": {}, "f:resource": {}, "f:verb": {} }, "f:user": {} } } } ] }, "spec": { "resourceAttributes": { "namespace": "project-100", "verb": "use", "group": "sharedresource.openshift.io", "resource": "sharedsecrets", "name": "shared-subscription" }, "user": "system:serviceaccount:project-100:builder" }, "status": { "allowed": false } }
Actual results:
$ curl -k -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <token>" --data "@/tmp/post.json" https://api.<url>:6443/apis/authorization.k8s.io/v1/subjectaccessreviews { "kind": "SubjectAccessReview", "apiVersion": "authorization.k8s.io/v1", "metadata": { "creationTimestamp": null, "managedFields": [ { "manager": "curl", "operation": "Update", "apiVersion": "authorization.k8s.io/v1", "time": "2023-03-13T09:17:45Z", "fieldsType": "FieldsV1", "fieldsV1": { "f:spec": { "f:resourceAttributes": { ".": {}, "f:group": {}, "f:name": {}, "f:namespace": {}, "f:resource": {}, "f:verb": {} }, "f:user": {} } } } ] }, "spec": { "resourceAttributes": { "namespace": "project-100", "verb": "use", "group": "sharedresource.openshift.io", "resource": "sharedsecrets", "name": "shared-subscription" }, "user": "system:serviceaccount:project-100:builder" }, "status": { "allowed": false } }
Expected results:
$ curl -k -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <token>" --data "@/tmp/post.json" https://api.<url>:6443/apis/authorization.k8s.io/v1/subjectaccessreviews { "kind": "SubjectAccessReview", "apiVersion": "authorization.k8s.io/v1", "metadata": { "creationTimestamp": null, "managedFields": [ { "manager": "curl", "operation": "Update", "apiVersion": "authorization.k8s.io/v1", "time": "2023-03-13T09:16:47Z", "fieldsType": "FieldsV1", "fieldsV1": { "f:spec": { "f:resourceAttributes": { ".": {}, "f:group": {}, "f:name": {}, "f:namespace": {}, "f:resource": {}, "f:verb": {} }, "f:user": {} } } } ] }, "spec": { "resourceAttributes": { "namespace": "project-101", "verb": "use", "group": "sharedresource.openshift.io", "resource": "sharedsecrets", "name": "shared-subscription" }, "user": "system:serviceaccount:project-101:builder" }, "status": { "allowed": true, "reason": "RBAC: allowed by ClusterRoleBinding \"shared-secret-cluster-role-binding\" of ClusterRole \"shared-secret-cluster-role\" to ServiceAccount \"builder/project-101\"" } }
Additional info:
The goal is to use the Group "system:serviceaccounts" to authorize all serviceAccounts to access the given resources to avoid listing all namespaces specifically and thus have the need to create a controller that needs to update a list or similar.
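For reference, the /tmp/post.json used in the commands above is not included in the report; a hypothetical reconstruction, inferred from the spec echoed back in the responses, would be:
~~~
# Hypothetical reconstruction of /tmp/post.json, inferred from the "spec"
# echoed back in the SubjectAccessReview responses above.
cat > /tmp/post.json <<'EOF'
{
  "kind": "SubjectAccessReview",
  "apiVersion": "authorization.k8s.io/v1",
  "spec": {
    "resourceAttributes": {
      "namespace": "project-100",
      "verb": "use",
      "group": "sharedresource.openshift.io",
      "resource": "sharedsecrets",
      "name": "shared-subscription"
    },
    "user": "system:serviceaccount:project-100:builder"
  }
}
EOF

# Same call as in the report; <token> and <url> are placeholders.
curl -k -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  --data "@/tmp/post.json" \
  "https://api.<url>:6443/apis/authorization.k8s.io/v1/subjectaccessreviews"
~~~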
Description of problem:
When creating an image for arm, i.e. using: architecture: arm64 and running $ ./bin/openshift-install agent create image --dir ./cluster-manifests/ --log-level debug the output indicates that the correct base ISO was extracted from the release: INFO Extracting base ISO from release payload DEBUG Using mirror configuration DEBUG Fetching image from OCP release (oc adm release info --image-for=machine-os-images --insecure=true --icsp-file=/tmp/icsp-file347546417 registry.ci.openshift.org/origin/release:4.13) DEBUG extracting /coreos/coreos-aarch64.iso to /home/bfournie/.cache/agent/image_cache, oc image extract --path /coreos/coreos-aarch64.iso:/home/bfournie/.cache/agent/image_cache --confirm --icsp-file=/tmp/icsp-file3609464443 registry.ci.openshift.org/origin/4.13-2023-03-09-142410@sha256:e3c4445cabe16ca08c5b874b7a7c9d378151eb825bacc90e240cfba9339a828c INFO Base ISO obtained from release and cached at /home/bfournie/.cache/agent/image_cache/coreos-aarch64.iso DEBUG Extracted base ISO image /home/bfournie/.cache/agent/image_cache/coreos-aarch64.iso from release payload When in fact the ISO was not extracted from the release image and the command failed: ERROR failed to write asset (Agent Installer ISO) to disk: cannot generate ISO image due to configuration errors FATAL failed to fetch Agent Installer ISO: failed to generate asset "Agent Installer ISO": provided device /home/bfournie/.cache/agent/image_cache/coreos-aarch64.iso does not exist
Version-Release number of selected component (if applicable):
4.13
How reproducible:
every time
Steps to Reproduce:
1. Set architecture: arm64 for all hosts in install-config.yaml 2. Run the openshift-install command as above 3. See the log messages and the command fails
Actual results:
Invalid messages are logged and command fails
Expected results:
Command succeeds
Additional info:
Description of problem:
During the documentation writing phase, we have received suggestions to improve texts in the vSphere Connection modal. We should address them. https://docs.google.com/document/d/1jLnHuJyOR5nyuFTpSO6LcuHDVrVGUSs2EMpLFey1qDQ/edit
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Deploy OCP cluster on the vSphere platform 2. On the homepage of the Console, see VCenter status plugin 3.
Actual results:
Expected results:
Additional info:
It's about rephrasing only.
Description of problem:
When doing an IPv6-only agent-based install on bare metal, this fails if the rendezvousIP value is not canonical.
Version-Release number of selected component (if applicable):
OCP 4.12
How reproducible:
Every time.
Steps to Reproduce:
1. Configure the agent through agent-config.yaml for an IPv6-only install. 2. Set rendezvousIP to something that is correct, but not canonical, for example: rendezvousIP: 2a00:8a00:4000:020c:0000:0000:0018:143c 3. Generate the discovery ISO and boot the nodes.
Actual results:
Installation fails because the set-node-zero.sh script fails to discover that it is running on node zero.
Expected results:
Installation completes.
Additional info:
The code that detects whether a host is node-zero uses this: is_rendezvous_host=$(ip -j address | jq "[.[].addr_info] | flatten | map(.local==\"$NODE_ZERO_IP\") | any") This fails in unexpected ways with IPv6 addresses that are not canonical, as the output of ip address is always canonical, but in this case the value for $NODE_ZERO_IP wasn't.
We did test this on the node itself: [root@slabnode2290 bin]# ip -j address | jq '[.[].addr_info] | flatten | map(.local=="2a00:8a00:4000:020c:0000:0000:0018:143c") | any' false [root@slabnode2290 bin]# ip -j address | jq '[.[].addr_info] | flatten | map(.local=="2a00:8a00:4000:20c::18:143c") | any' true A solution may be to use a tool like ipcalc, once available, to do this test and make it less strict. In the meantime a note in the docs would be a good idea.
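As a rough sketch of the workaround idea, the rendezvous IP could be normalized to its canonical (compressed) form before the jq comparison. This assumes python3 is available on the host; it is not the actual set-node-zero.sh fix.
~~~
# Sketch only: normalize NODE_ZERO_IP to its canonical (compressed) form
# before comparing against the output of `ip -j address`, which is always
# canonical. Assumes python3 is available; ipcalc (as suggested above)
# would be an alternative once available.
NODE_ZERO_IP="2a00:8a00:4000:020c:0000:0000:0018:143c"

CANONICAL_IP=$(python3 -c 'import ipaddress, sys; print(ipaddress.ip_address(sys.argv[1]).compressed)' "$NODE_ZERO_IP")

ip -j address | jq "[.[].addr_info] | flatten | map(.local==\"$CANONICAL_IP\") | any"
# prints "true" on node zero, "false" elsewhere
~~~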
This is a clone of issue OCPBUGS-18990. The following is the description of the original issue:
—
Description of problem:
The script refactoring from https://github.com/openshift/cluster-etcd-operator/pull/1057 introduced a regression. Since the static pod list variable was renamed, it is now empty and won't restore the non-etcd pod yamls anymore.
Version-Release number of selected component (if applicable):
4.14 and later
How reproducible:
always
Steps to Reproduce:
1. create a cluster 2. restore using cluster-restore.sh
Actual results:
the apiserver and other static pods are not immediately restored. The script only outputs this log: removing previous backup /var/lib/etcd-backup/member Moving etcd data-dir /var/lib/etcd/member to /var/lib/etcd-backup starting restore-etcd static pod
Expected results:
the non-etcd static pods should be immediately restored by moving them into the manifest directory again. You can see this by the log output: Moving etcd data-dir /var/lib/etcd/member to /var/lib/etcd-backup starting restore-etcd static pod starting kube-apiserver-pod.yaml static-pod-resources/kube-apiserver-pod-7/kube-apiserver-pod.yaml starting kube-controller-manager-pod.yaml static-pod-resources/kube-controller-manager-pod-7/kube-controller-manager-pod.yaml starting kube-scheduler-pod.yaml static-pod-resources/kube-scheduler-pod-8/kube-scheduler-pod.yaml
Additional info:
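As a rough illustration of what the expected behavior corresponds to (this is not the actual cluster-restore.sh, and the paths are assumptions based on the revisioned filenames in the expected log output), the non-etcd restore step amounts to copying the latest revision of each static pod manifest back into the kubelet manifest directory:
~~~
# Illustrative sketch only, not the real script. Paths assumed from the
# expected log output above.
MANIFEST_DIR=/etc/kubernetes/manifests
STATIC_POD_RESOURCES=/etc/kubernetes/static-pod-resources

for pod in kube-apiserver-pod kube-controller-manager-pod kube-scheduler-pod; do
  # pick the highest revision directory, e.g. kube-apiserver-pod-7
  latest=$(ls -d "${STATIC_POD_RESOURCES}/${pod}"-* 2>/dev/null | sort -V | tail -n1)
  if [ -n "$latest" ]; then
    echo "starting ${pod}.yaml ${latest}/${pod}.yaml"
    cp "${latest}/${pod}.yaml" "${MANIFEST_DIR}/"
  fi
done
~~~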
Description of problem:
Pods are being terminated on kubelet restart if they consume any device. In the case of CNV these Pods are carrying VMs, and the assumption is that the kubelet will not terminate the Pod in this case.
Version-Release number of selected component (if applicable):
4.14 / 4.13.z / 4.12.z
How reproducible:
This should be reproducible with any device plugin, as far as my understanding goes.
Steps to Reproduce:
1. Create Pod requesting device plugin 2. Restart Kubelet 3.
Actual results:
Admission error -> Pod terminates
Expected results:
No error -> Existing & Running Pods will continue running after Kubelet restart
Additional info:
The culprit seems to be https://github.com/kubernetes/kubernetes/pull/116376
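A minimal reproducer for step 1 might look like the sketch below; the extended resource name example.com/device and the image are placeholders for whatever device plugin is actually installed on the cluster.
~~~
# Illustrative reproducer for step 1 above; replace example.com/device with
# a resource advertised by the device plugin you are testing.
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: device-consumer
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi-minimal
    command: ["sleep", "infinity"]
    resources:
      limits:
        example.com/device: "1"   # any device-plugin-provided resource
EOF
~~~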
Description of problem:
When the oc-mirror command runs, the generated ImageContentSourcePolicy.yaml should not include mirrors for the mirrored operator catalogs, but currently it does.
This should be the case for registry-located catalogs and OCI FBC catalogs (located on disk).
Jennifer Power, Alex Flom can you help us confirm this is the expected behavior?
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1.Run the oc mirror command mirroring the catalog /bin/oc-mirror --config imageSetConfig.yaml docker://localhost:5000 --use-oci-feature --dest-use-http --dest-skip-tls with imagesetconfig: kind: ImageSetConfiguration apiVersion: mirror.openshift.io/v1alpha2 storageConfig: local: path: /tmp/storageBackend mirror: operators: - catalog: oci:///home/user/catalogs/rhop4.12 # copied from registry.redhat.io/redhat/redhat-operator-index:v4.12 targetCatalog: "mno/redhat-operator-index" targetVersion: "v4.12" packages: - name: aws-load-balancer-operator
Actual results:
Catalog is included in the imageContentSourcePolicy.yaml apiVersion: operators.coreos.com/v1alpha1 kind: CatalogSource metadata: name: redhat-operator-index namespace: openshift-marketplace spec: image: localhost:5000/mno/redhat-operator-index:v4.12 sourceType: grpc --- apiVersion: operator.openshift.io/v1alpha1 kind: ImageContentSourcePolicy metadata: labels: operators.openshift.org/catalog: "true" name: operator-0 spec: repositoryDigestMirrors: - mirrors: - localhost:5000/albo source: registry.redhat.io/albo - mirrors: - localhost:5000/mno source: mno - mirrors: - localhost:5000/openshift4 source: registry.redhat.io/openshift4
Expected results:
No catalog should be included in the imageContentSourcePolicy.yaml apiVersion: operators.coreos.com/v1alpha1 kind: CatalogSource metadata: name: redhat-operator-index namespace: openshift-marketplace spec: image: localhost:5000/mno/redhat-operator-index:v4.12 sourceType: grpc --- apiVersion: operator.openshift.io/v1alpha1 kind: ImageContentSourcePolicy metadata: labels: operators.openshift.org/catalog: "true" name: operator-0 spec: repositoryDigestMirrors: - mirrors: - localhost:5000/albo source: registry.redhat.io/albo - mirrors: - localhost:5000/openshift4 source: registry.redhat.io/openshift4
Additional info:
Description of problem:
Looking at the telemetry data for Nutanix I noticed that the “host_type” for clusters installed with platform nutanix shows as “virt-unknown”. Do you know what needs to happen in the code to tell telemetry about host type being Nutanix? The problem is that we can’t track those installations with platform none, just IPI. Refer to the slack thread https://redhat-internal.slack.com/archives/C0211848DBN/p1687864857228739.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
Create an OCP Nutanix cluster
Actual results:
The telemetry data for Nutanix shows the “host_type” for the nutanix cluster as “virt-unknown”.
Expected results:
The telemetry data for Nutanix shows the “host_type” for the nutanix cluster as "nutanix".
Additional info:
Description of problem:
The link to the OpenShift Route from the Service breaks because of a hardcoded targetPort value. If the targetPort gets changed, the Route still points to the old port value because it is hardcoded.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Install the latest available version of OpenShift Pipelines 2. Create the pipeline and triggerbinding using the attached files 3. Add trigger to the created pipeline from devconsole UI, select the above created triggerbinding while adding trigger 4. Trigger an event using the curl command curl -X POST -d '{ "url": "https://www.github.com/VeereshAradhya/cli" }' -H 'Content-Type: application/json' <route> and make sure that the pipelinerun gets started 5. Update the targetPort in the svc from 8080 to 8000 6. Again use the above curl command to trigger one more event
Actual results:
The curl command throws an error
Expected results:
The curl command should be successful and the pipelinerun should get started successfully
Additional info:
Error: curl -X POST -d '{ "url": "https://www.github.com/VeereshAradhya/cli" }' -H 'Content-Type: application/json' http://el-event-listener-3o9zcv-test-devconsole.apps.ve412psi.psi.ospqa.com <html> <head> <meta name="viewport" content="width=device-width, initial-scale=1"> <style type="text/css"> body { font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; line-height: 1.66666667; font-size: 16px; color: #333; background-color: #fff; margin: 2em 1em; } h1 { font-size: 28px; font-weight: 400; } p { margin: 0 0 10px; } .alert.alert-info { background-color: #F0F0F0; margin-top: 30px; padding: 30px; } .alert p { padding-left: 35px; } ul { padding-left: 51px; position: relative; } li { font-size: 14px; margin-bottom: 1em; } p.info { position: relative; font-size: 20px; } p.info:before, p.info:after { content: ""; left: 0; position: absolute; top: 0; } p.info:before { background: #0066CC; border-radius: 16px; color: #fff; content: "i"; font: bold 16px/24px serif; height: 24px; left: 0px; text-align: center; top: 4px; width: 24px; } @media (min-width: 768px) { body { margin: 6em; } } </style> </head> <body> <div> <h1>Application is not available</h1> <p>The application is currently not serving requests at this endpoint. It may not have been started or is still starting.</p> <div class="alert alert-info"> <p class="info"> Possible reasons you are seeing this page: </p> <ul> <li> <strong>The host doesn't exist.</strong> Make sure the hostname was typed correctly and that a route matching this hostname exists. </li> <li> <strong>The host exists, but doesn't have a matching path.</strong> Check if the URL path was typed correctly and that the route was created using the desired path. </li> <li> <strong>Route and path matches, but all pods are down.</strong> Make sure that the resources exposed by this route (pods, services, deployment configs, etc) have at least one pod running. </li> </ul> </div> </div> </body> </html>
Note:
The above scenario works fine if we create triggers using the YAML files instead of the devconsole UI.
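For illustration only (the resource names and the port name are placeholders, not taken from the attached files), a Route that references a named Service port instead of a hardcoded number keeps working when the Service's targetPort changes:
~~~
# Sketch: point the Route at a *named* service port rather than 8080/8000.
cat <<'EOF' | oc apply -f -
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: el-event-listener
  namespace: test-devconsole
spec:
  to:
    kind: Service
    name: el-event-listener
  port:
    targetPort: http-listener   # named port on the Service, not a number
EOF
~~~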
Description of the problem:
EnsureOperatorPrerequisite uses the cluster CPU architecture, while on a multi-arch cluster the CPU architecture will always be multi. On cluster update, EnsureOperatorPrerequisite will not prevent the cluster from being updated but will fail on the next update request.
Steps to reproduce:
1. Register multi arch cluster (P or Z)
2. Update cluster with ODF operator
3. Update any cluster field
Actual results:
Cluster fails to update the second time
Expected results:
The update should not fail
Description of problem:
These alerts fire without a namespace label:
* KubeStateMetricsListErrors
* KubeStateMetricsWatchErrors
* KubeletPlegDurationHigh
* KubeletTooManyPods
* KubeNodeReadinessFlapping
* KubeletPodStartUpLatencyHigh
Alerting rules without a namespace label make it harder for cluster admins to route the alerts.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Check the definitions of the said alerting rules.
Actual results:
The PromQL expressions aggregate away the namespace label and there's no static namespace label either.
Expected results:
Static namespace label in the rule definition.
Additional info:
https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide Alerts SHOULD include a namespace label indicating the source of the alert. Many alerts will include this by virtue of the fact that their PromQL expressions result in a namespace label. Others may require a static namespace label
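As an illustration of the style-guide requirement (this is not the actual rule definition shipped in the monitoring stack; the expression is an approximation of the upstream kube-state-metrics rule and the label value is an assumption), a static namespace label looks like this:
~~~
# Sketch of a rule carrying a static namespace label when its PromQL
# expression aggregates the label away.
cat <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-state-metrics-rules
  namespace: openshift-monitoring
spec:
  groups:
  - name: kube-state-metrics
    rules:
    - alert: KubeStateMetricsListErrors
      expr: |
        (sum(rate(kube_state_metrics_list_total{job="kube-state-metrics",result="error"}[5m]))
          /
         sum(rate(kube_state_metrics_list_total{job="kube-state-metrics"}[5m]))) > 0.01
      for: 15m
      labels:
        severity: warning
        namespace: openshift-monitoring   # static namespace label
EOF
~~~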
Description of problem:
4.14 cluster installation failed with TECH_PREVIEW featuregate
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-04-03-002631
How reproducible:
Always on GCP and Azure platform
Steps to Reproduce:
1. Install 4.14 cluster with TECH_PREVIEW featuregate
Actual results:
Cluster installation failed and shows the error below
oc get pod -n openshift-kube-apiserver -l apiserver --show-labels
E0404 18:13:56.266461 73688 memcache.go:238] couldn't get current server API group list: Get "https://api.maxu-az-tp1.qe.azure.devcluster.openshift.com:6443/api?timeout=32s": dial tcp 20.253.227.131:6443: i/o timeout
E0404 18:14:26.270883 73688 memcache.go:238] couldn't get current server API group list: Get "https://api.maxu-az-tp1.qe.azure.devcluster.openshift.com:6443/api?timeout=32s": dial tcp 20.253.227.131:6443: i/o timeout
E0404 18:14:56.269363 73688 memcache.go:238] couldn't get current server API group list: Get "https://api.maxu-az-tp1.qe.azure.devcluster.openshift.com:6443/api?timeout=32s": dial tcp 20.253.227.131:6443: i/o timeout
E0404 18:14:58.075111 73688 memcache.go:255] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0404 18:14:58.302392 73688 memcache.go:255] couldn't get resource list for security.openshift.io/v1: the server is currently unable to handle the request
E0404 18:14:58.309541 73688 memcache.go:255] couldn't get resource list for template.openshift.io/v1: the server is currently unable to handle the request
E0404 18:14:58.313497 73688 memcache.go:255] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request
NAME READY STATUS RESTARTS AGE LABELS
kube-apiserver-maxu-az-tp1-86n5v-master-2 4/5 CrashLoopBackOff 7 (2m41s ago) 16m apiserver=true,app=openshift-kube-apiserver,revision=16
Expected results:
Cluster installation should succeed and not show any errors
Additional info:
https://issues.redhat.com/browse/OCPQE-14686
https://drive.google.com/file/d/1EHVuPFaSJA50R2k8uVVUVDvGDCfG9ZYN/view?usp=sharing
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/?job=*4.14*-tp-*
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/?job=*4.14*-techpreview*
Description of problem:
When testing AWS on-prem BM expansion, the BMO is not able to reach the IRONIC_ENDPOINT
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-10-021647
How reproducible:
100%
Steps to Reproduce:
1. Install IPI AWS 3-node-compact cluster 2. Deploy BMO via YAML 3. Connect AWS against external on-prem env via VPN (out of scope) 4. Create BMH using "preprovisioningNetworkDataName" to push static IP and routes.
Actual results:
BMO is not able to reach the Ironic endpoint with the following error: ~~~ 2023-08-10T16:09:22.216778289Z {"level":"info","ts":"2023-08-10T16:09:22Z","logger":"provisioner.ironic","msg":"error caught while checking endpoint","host":"openshift-machine-api~openshift-qe-065","endpoint":"https://metal3-state.openshift-machine-api.svc.cluster.local:6385/v1/","error":"Get \"https://metal3-state.openshift-machine-api.svc.cluster.local:6385/v1\": dial tcp 172.30.19.119:6385: i/o timeout"} ~~~
Expected results:
Standard deploy
Additional info:
Must-gather provided separately
Description of problem:
OpenShift Console does not filter the SecretList when displaying the ServiceAccount details page When reviewing the details page of an OpenShift ServiceAccount, at the bottom of the page there is a SecretsList which is intended to display all of the relevant Secrets that are attached to the ServiceAccount. In OpenShift 4.8.X, this SecretList only displayed the relevant Secrets. In OpenShift 4.9+ the SecretList now displays all Secrets within the entire Namespace.
Version-Release number of selected component (if applicable):
4.8.57 < Most recent release without issue 4.9.0 < First release with issue 4.10.46 < Issue is still present
How reproducible:
Every time
Steps to Reproduce:
1. Deploy a cluster with OpenShift 4.8.57 (or replace the OpenShift Console image with `sha256:9dd115a91a4261311c44489011decda81584e1d32982533bf69acf3f53e17540` ) 2. Access the ServiceAccounts Page ( User Management -> ServiceAccounts) 3. Click a ServiceAccount to display the Details page 4. Scroll down and review the Secrets section 5. Repeat steps with an OpenShift 4.9 release (or check using image `sha256:fc07081f337a51f1ab957205e096f68e1ceb6a5b57536ea6fc7fbcea0aaaece0` )
Actual results:
All Secrets in the Namespace are displayed
Expected results:
Only Secrets associated with the ServiceAccount are displayed
Additional info:
Lightly reviewing the code, the following links might be a good start: - https://github.com/openshift/console/blob/master/frontend/public/components/secret.jsx#L126 - https://github.com/openshift/console/blob/master/frontend/public/components/service-account.jsx#L151:L151
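For comparison, the Secrets actually referenced by a ServiceAccount (what the details page is expected to show) can be listed from the CLI; the name and namespace below are placeholders:
~~~
# Secrets referenced by the ServiceAccount object itself (mountable secrets
# and image pull secrets), which is the set the console page should display.
oc get sa builder -n example-namespace \
  -o jsonpath='{range .secrets[*]}{.name}{"\n"}{end}'
oc get sa builder -n example-namespace \
  -o jsonpath='{range .imagePullSecrets[*]}{.name}{"\n"}{end}'
~~~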
Description of problem:
On Azure, after deleting a master, the old machine is stuck in Deleting and some pods in the cluster are in ImagePullBackOff. Checking from the Azure console, the new master was not added into the load balancer backend, which seems to leave the machine with no internet connection.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-02-12-024338
How reproducible:
Always
Steps to Reproduce:
1. Set up a cluster on Azure, networkType ovn 2. Delete a master 3. Check master and pod
Actual results:
Old machine stuck in Deleting, some pods are in ImagePullBackOff. $ oc get machine NAME PHASE TYPE REGION ZONE AGE zhsunaz2132-5ctmh-master-0 Deleting Standard_D8s_v3 westus 160m zhsunaz2132-5ctmh-master-1 Running Standard_D8s_v3 westus 160m zhsunaz2132-5ctmh-master-2 Running Standard_D8s_v3 westus 160m zhsunaz2132-5ctmh-master-flqqr-0 Running Standard_D8s_v3 westus 105m zhsunaz2132-5ctmh-worker-westus-dhwfz Running Standard_D4s_v3 westus 152m zhsunaz2132-5ctmh-worker-westus-dw895 Running Standard_D4s_v3 westus 152m zhsunaz2132-5ctmh-worker-westus-xlsgm Running Standard_D4s_v3 westus 152m $ oc describe machine zhsunaz2132-5ctmh-master-flqqr-0 -n openshift-machine-api |grep -i "Load Balancer" Internal Load Balancer: zhsunaz2132-5ctmh-internal Public Load Balancer: zhsunaz2132-5ctmh $ oc get node NAME STATUS ROLES AGE VERSION zhsunaz2132-5ctmh-master-0 Ready control-plane,master 165m v1.26.0+149fe52 zhsunaz2132-5ctmh-master-1 Ready control-plane,master 165m v1.26.0+149fe52 zhsunaz2132-5ctmh-master-2 Ready control-plane,master 165m v1.26.0+149fe52 zhsunaz2132-5ctmh-master-flqqr-0 NotReady control-plane,master 109m v1.26.0+149fe52 zhsunaz2132-5ctmh-worker-westus-dhwfz Ready worker 152m v1.26.0+149fe52 zhsunaz2132-5ctmh-worker-westus-dw895 Ready worker 152m v1.26.0+149fe52 zhsunaz2132-5ctmh-worker-westus-xlsgm Ready worker 152m v1.26.0+149fe52 $ oc describe node zhsunaz2132-5ctmh-master-flqqr-0 Warning ErrorReconcilingNode 3m5s (x181 over 108m) controlplane [k8s.ovn.org/node-chassis-id annotation not found for node zhsunaz2132-5ctmh-master-flqqr-0, macAddress annotation not found for node "zhsunaz2132-5ctmh-master-flqqr-0" , k8s.ovn.org/l3-gateway-config annotation not found for node "zhsunaz2132-5ctmh-master-flqqr-0"] $ oc get po --all-namespaces | grep ImagePullBackOf openshift-cluster-csi-drivers azure-disk-csi-driver-node-l8ng4 0/3 Init:ImagePullBackOff 0 113m openshift-cluster-csi-drivers azure-file-csi-driver-node-99k82 0/3 Init:ImagePullBackOff 0 113m openshift-cluster-node-tuning-operator tuned-bvvh7 0/1 ImagePullBackOff 0 113m openshift-dns node-resolver-2p4zq 0/1 ImagePullBackOff 0 113m openshift-image-registry node-ca-vxv87 0/1 ImagePullBackOff 0 113m openshift-machine-config-operator machine-config-daemon-crt5w 1/2 ImagePullBackOff 0 113m openshift-monitoring node-exporter-mmjsm 0/2 Init:ImagePullBackOff 0 113m openshift-multus multus-4cg87 0/1 ImagePullBackOff 0 113m openshift-multus multus-additional-cni-plugins-mc6vx 0/1 Init:ImagePullBackOff 0 113m openshift-ovn-kubernetes ovnkube-master-qjjsv 0/6 ImagePullBackOff 0 113m openshift-ovn-kubernetes ovnkube-node-k8w6j 0/6 ImagePullBackOff 0 113m
Expected results:
Master replacement is successful
Additional info:
Tested payload 4.13.0-0.nightly-2023-02-03-145213, same result. Before we have tested in 4.13.0-0.nightly-2023-01-27-165107, all works well.
Description of problem:
If the HyperShift operator is installed onto a cluster, it creates VPC Endpoint Services fronting the hosted Kubernetes API Server for downstream HyperShift clusters to connect to. These VPC Endpoint Services are tagged such that the uninstaller would attempt to action them: "kubernetes.io/cluster/${ID}: owned" However they cannot be deleted until all active VPC Endpoint Connections are rejected - the uninstaller should be able to do this.
Version-Release number of selected component (if applicable):
4.12 (but shouldn't be version-specific)
How reproducible:
100%
Steps to Reproduce:
1. Create an NLB + VPC Endpoint Service in the same VPC as a cluster 2. Tag it accordingly and create a VPC Endpoint connection to it
Actual results:
The uninstaller will not be able to delete the VPC Endpoint Service + the NLB that the VPC Endpoint Service is fronting
Expected results:
The VPC Endpoint Service can be completely cleaned up, which would allow the NLB to be cleaned up
Additional info:
Description of problem:
When clicking on "Duplicate RoleBinding" in the OpenShift Container Platform Web Console, users are taken to a form where they can review the duplicated RoleBinding. When the RoleBinding has a ServiceAccount as a subject, clicking "Create" leads to the following error: An error occurred Error "Unsupported value: "rbac.authorization.k8s.io": supported values: """ for field "subjects[0].apiGroup". The root cause seems to be that the field "subjects[0].apiGroup" is set to "rbac.authorization.k8s.io" even for "kind: ServiceAccount" subjects. For "kind: ServiceAccount" subjects, this field is not necessary but the "namespace" field should be set instead. The functionality works as expected for User and Group subjects.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.12.19
How reproducible:
Always
Steps to Reproduce:
1. In the OpenShift Container Platform Web Console, click on "User Management" => "Role Bindings" 2. Search for a RoleBinding that has a "ServiceAccount" as the subject. On the far right, click on the dots and choose "Duplicate RoleBinding" 3. Review the fields, set a new name for the duplicated RoleBinding, click "Create"
Actual results:
Duplicating fails with the following error message being shown: An error occurred Error "Unsupported value: "rbac.authorization.k8s.io": supported values: """ for field "subjects[0].apiGroup".
Expected results:
RoleBinding is duplicated without an error message
Additional info:
Reproduced with OpenShift Container Platform 4.12.18 and 4.12.19
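For reference, a RoleBinding subject of kind ServiceAccount must omit apiGroup and carry a namespace field instead, which is what the duplicated object would need to look like (the names below are placeholders):
~~~
# Sketch of a valid RoleBinding with a ServiceAccount subject: no apiGroup,
# but a namespace. User/Group subjects keep apiGroup: rbac.authorization.k8s.io.
cat <<'EOF' | oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-rolebinding-copy
  namespace: example-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: example-role
subjects:
- kind: ServiceAccount          # no apiGroup for ServiceAccount subjects
  name: example-serviceaccount
  namespace: example-namespace
EOF
~~~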
Description of problem:
The readme.md of builder is just a one-liner overview of the project. It would be helpful to add some additional details for new contributors/visitors to the project.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Install an IPI cluster where all nodes are provisioned from an Azure marketplace image with a purchase plan. install-config.yaml: --------------------------- platform: azure: region: eastus baseDomainResourceGroupName: os4-common defaultMachinePlatform: osImage: publisher: Redhat <---- contains uppercase letter offer: rh-ocp-worker sku: rh-ocp-worker version: 4.8.2021122100 plan: WithPurchasePlan As some marketplace images are free without a plan, the publisher in install-config should come from the output of `az vm image list`: # az vm image list --offer rh-ocp-worker --all -otable Architecture Offer Publisher Sku Urn Version -------------- ------------- -------------- ------------------ -------------------------------------------------------------- -------------- x64 rh-ocp-worker redhat-limited rh-ocp-worker redhat-limited:rh-ocp-worker:rh-ocp-worker:4.8.2021122100 4.8.2021122100 x64 rh-ocp-worker RedHat rh-ocp-worker RedHat:rh-ocp-worker:rh-ocp-worker:4.8.2021122100 4.8.2021122100 x64 rh-ocp-worker redhat-limited rh-ocp-worker-gen1 redhat-limited:rh-ocp-worker:rh-ocp-worker-gen1:4.8.2021122100 4.8.2021122100 x64 rh-ocp-worker RedHat rh-ocp-worker-gen1 RedHat:rh-ocp-worker:rh-ocp-worker-gen1:4.8.2021122100 4.8.2021122100 The image plan is as below; its publisher is lowercase. # az vm image show --urn RedHat:rh-ocp-worker:rh-ocp-worker:4.8.2021122100 --query plan { "name": "rh-ocp-worker", "product": "rh-ocp-worker", "publisher": "redhat" } In the installer (https://github.com/openshift/installer/blob/master/data/data/azure/bootstrap/main.tf#L243-L246), the publisher property in the image plan comes from the publisher we set in install-config.yaml; the installer should instead use the publisher property from the image plan output. But the image plan is case-sensitive, so in such a case the bootstrap instance provisioning fails with the below error. Unable to deploy from the Marketplace image or a custom image sourced from Marketplace image. The part number in the purchase information for VM '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima15image1-flg24-rg/providers/Microsoft.Compute/virtualMachines/jima15image1-flg24-bootstrap' is not as expected. Beware that the Plan object's properties are case-sensitive. Learn more about common virtual machine error codes. Similar errors occur when provisioning worker instances from this image, where the image publisher contains upper case but the publisher in its plan is all lowercase. worker machineset: ---------------------------- Spec: Lifecycle Hooks: Metadata: Provider ID: azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/ci-op-cc5g2rw8-55267-q66k7-rg/providers/Microsoft.Compute/virtualMachines/ci-op-cc5g2rw8-55267-q66k7-worker-southcentralus1-dq6sp Provider Spec: Value: Accelerated Networking: true API Version: machine.openshift.io/v1beta1 Credentials Secret: Name: azure-cloud-credentials Namespace: openshift-machine-api Diagnostics: Boot: Storage Account Type: AzureManaged Image: Offer: rh-ocp-worker Publisher: RedHat Resource ID: Sku: rh-ocp-worker Type: WithPurchasePlan Version: 4.8.2021122100 Kind: AzureMachineProviderSpec Location: southcentralus Managed Identity: ci-op-cc5g2rw8-55267-q66k7-identity Error when provisioning a worker instance: Unable to deploy from the Marketplace image or a custom image sourced from Marketplace image.
The part number in the purchase information for VM '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/ci-op-cc5g2rw8-55267-q66k7-rg/providers/Microsoft.Compute/virtualMachines/ci-op-cc5g2rw8-55267-q66k7-worker-southcentralus1-mmr2h' is not as expected. Beware that the Plan object's properties are case-sensitive. Learn more about common virtual machine error codes.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-11-055332
How reproducible:
Always on 4.14 for bootstrap/masters; always on 4.11+ for workers
Steps to Reproduce:
1. Configure osImage for all nodes in install-config, setting publisher to RedHat 2. Install the cluster. 3.
Actual results:
Bootstrap instance provisioning failed.
Expected results:
installation is successful.
Additional info:
Installation is successful when setting publisher to "redhat"
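As a workaround sketch based on the note above, the plan's publisher (the case-sensitive value) can be looked up and used verbatim in install-config.yaml. The fragment below reuses the values from this report and is illustrative only:
~~~
# Look up the purchase plan's publisher; the plan is case-sensitive.
az vm image show --urn RedHat:rh-ocp-worker:rh-ocp-worker:4.8.2021122100 --query plan
# => "publisher": "redhat"

cat <<'EOF'
# install-config.yaml fragment (illustrative)
platform:
  azure:
    defaultMachinePlatform:
      osImage:
        publisher: redhat          # matches the plan's publisher casing
        offer: rh-ocp-worker
        sku: rh-ocp-worker
        version: 4.8.2021122100
        plan: WithPurchasePlan
EOF
~~~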
Description of problem:
A build which works on 4.12 errored out on 4.13.
Version-Release number of selected component (if applicable):
oc --context build02 get clusterversion version NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.0-ec.3 True False 4d2h Cluster version is 4.13.0-ec.3
How reproducible:
Always
Steps to Reproduce:
1. oc new-project hongkliu-test 2. oc create is test-is --as system:admin 3. oc apply -f test-bc.yaml # the file is in the attachment
Actual results:
oc --context build02 logs test-bc-5-build Defaulted container "docker-build" out of: docker-build, manage-dockerfile (init) time="2023-02-20T19:13:38Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled" I0220 19:13:38.405163 1 defaults.go:112] Defaulting to storage driver "overlay" with options [mountopt=metacopy=on]. Caching blobs under "/var/cache/blobs".Pulling image image-registry.openshift-image-registry.svc:5000/ci/html-proofer@sha256:684aae4e929e596f7042c34a3604c81137860187305f775c2380774bda4b6b08 ... Trying to pull image-registry.openshift-image-registry.svc:5000/ci/html-proofer@sha256:684aae4e929e596f7042c34a3604c81137860187305f775c2380774bda4b6b08... Getting image source signatures Copying blob sha256:aa8ae8202b42d1c70c3a7f65680eabc1c562a29227549b9a1b33dc03943b20d2 Copying blob sha256:31326f32ac37d5657248df0a6aa251ec6a416dab712ca1236ea40ca14322a22c Copying blob sha256:b21786fe7c0d7561a5b89ca15d8a1c3e4ea673820cd79f1308bdfd8eb3cb7142 Copying blob sha256:68296e6645b26c3af42fa29b6eb7f5befa3d8131ef710c25ec082d6a8606080d Copying blob sha256:6b1c37303e2d886834dab68eb5a42257daeca973bbef3c5d04c4868f7613c3d3 Copying blob sha256:cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08 Copying blob sha256:46cf6a1965a3b9810a80236b62c42d8cdcd6fb75f9b58d1b438db5736bcf2669 Copying config sha256:9aefe4e59d3204741583c5b585d4d984573df8ff751c879c8a69379c168cb592 Writing manifest to image destination Storing signatures Adding transient rw bind mount for /run/secrets/rhsm STEP 1/4: FROM image-registry.openshift-image-registry.svc:5000/ci/html-proofer@sha256:684aae4e929e596f7042c34a3604c81137860187305f775c2380774bda4b6b08 STEP 2/4: RUN apk add --no-cache bash fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/main/x86_64/APKINDEX.tar.gz fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/community/x86_64/APKINDEX.tar.gz (1/1) Installing bash (5.0.11-r1) Executing bash-5.0.11-r1.post-install ERROR: bash-5.0.11-r1.post-install: script exited with error 127 Executing busybox-1.31.1-r9.trigger ERROR: busybox-1.31.1-r9.trigger: script exited with error 127 1 error; 21 MiB in 40 packages error: build error: building at STEP "RUN apk add --no-cache bash": while running runtime: exit status 1
Expected results:
Additional info:
Run the build on build01 (4.12.4) and it works fine. oc --context build01 get clusterversion version NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.12.4 True False 2d11h Cluster version is 4.12.4
Please review the following PR: https://github.com/openshift/machine-api-provider-aws/pull/64
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Following doc [1] to assign a custom role with minimum permissions for destroying a cluster to the installer Service Principal. As read permission on the public DNS zone and private DNS zone is missing from that doc for destroying an IPI cluster, the public DNS records cannot be removed. But installer destroy completes without any warning message. $ ./openshift-install destroy cluster --dir ipi --log-level debug DEBUG OpenShift Installer 4.13.0-0.nightly-2023-02-16-120330 DEBUG Built from commit c0bf49ca9e83fd00dfdfbbdddd47fbe6b5cdd510 INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json" DEBUG deleting public records DEBUG deleting resource group INFO deleted resource group=jima-ipi-role-l7qgz-rg DEBUG deleting application registrations DEBUG Purging asset "Metadata" from disk DEBUG Purging asset "Master Ignition Customization Check" from disk DEBUG Purging asset "Worker Ignition Customization Check" from disk DEBUG Purging asset "Terraform Variables" from disk DEBUG Purging asset "Kubeconfig Admin Client" from disk DEBUG Purging asset "Kubeadmin Password" from disk DEBUG Purging asset "Certificate (journal-gatewayd)" from disk DEBUG Purging asset "Cluster" from disk INFO Time elapsed: 6m16s INFO Uninstallation complete! $ az network dns record-set a list --resource-group os4-common --zone-name qe.azure.devcluster.openshift.com -o table| grep jima-ipi-role *.apps.jima-ipi-role os4-common 30 A kubernetes.io_cluster.jima-ipi-role-l7qgz="owned" $ az network dns record-set cname list --resource-group os4-common --zone-name qe.azure.devcluster.openshift.com -o table| grep jima-ipi-role api.jima-ipi-role os4-common 300 CNAME kubernetes.io_cluster.jima-ipi-role-l7qgz="owned" [1] https://docs.google.com/document/d/1iEs7T09Opj0iMXvpKeSatsAyPoda_gWQvFKQuWA3QdM/edit#
Version-Release number of selected component (if applicable):
4.13 nightly build
How reproducible:
always
Steps to Reproduce:
1. Create custom role with limited permission for destroying cluster, without read permission on public dns zone and private dns zone. 2. Assign the custom role to Service Principal 3. Use this SP to destroy cluster
Actual results:
Although some permissions are missing, the installer destroy completed without any warning.
Expected results:
The installer should show a warning message indicating the leftover resources and the specific reason, so that the user can process them further.
Additional info:
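For manual cleanup of the leftover records shown above, something like the following should work (zone and record names taken from this report; the commands prompt for confirmation):
~~~
# Remove the orphaned public DNS records left behind by the destroy.
az network dns record-set a delete \
  --resource-group os4-common \
  --zone-name qe.azure.devcluster.openshift.com \
  --name '*.apps.jima-ipi-role'

az network dns record-set cname delete \
  --resource-group os4-common \
  --zone-name qe.azure.devcluster.openshift.com \
  --name api.jima-ipi-role
~~~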
Description of problem:
When creating a hosted cluster on a management cluster that has an imagecontentsourcepolicy that does not include openshift-release-dev or ocp/release images, the control plane operator fails reconciliation with an error: {"level":"error","ts":"2023-08-22T18:26:07Z","msg":"Reconciler error","controller":"hostedcontrolplane","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedControlPlane","HostedControlPlane":{"name":"jiezhao-test","namespace":"clusters-jiezhao-test"},"namespace":"clusters-jiezhao-test","name":"jiezhao-test","reconcileID":"9b3c101b-b4d2-4d9e-b71c-ede9e0b55374","error":"failed to update control plane: failed to reconcile ignition server: failed to parse private registry hosted control plane image reference \"\": repository name must have at least one component","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
1. Create an ImageContentSourcePolicy on a management cluster: apiVersion: operator.openshift.io/v1alpha1 kind: ImageContentSourcePolicy metadata: name: brew-registry resourceVersion: "31794" uid: 7231c634-da35-4c56-b2ef-be48c2571a9c spec: repositoryDigestMirrors: - mirrors: - brew.registry.redhat.io source: registry.redhat.io - mirrors: - brew.registry.redhat.io source: registry.stage.redhat.io - mirrors: - brew.registry.redhat.io source: registry-proxy.engineering.redhat.com 2. Install the latest hypershift operator and create a hosted cluster with the latest 4.14 ci build
Actual results:
The hostedcluster never creates machines and never gets to a Complete state
Expected results:
The hostedcluster comes up and gets to a Complete state
Additional info:
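For illustration, an ICSP entry that does cover the release payload repositories mentioned in the description would look roughly like the sketch below; the mirror host is a placeholder and this is not presented as the official fix for the controller behavior:
~~~
# Sketch of an additional ICSP covering the release payload repositories.
cat <<'EOF' | oc apply -f -
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: release-images
spec:
  repositoryDigestMirrors:
  - mirrors:
    - mirror.example.com/openshift-release-dev/ocp-release
    source: quay.io/openshift-release-dev/ocp-release
  - mirrors:
    - mirror.example.com/openshift-release-dev/ocp-v4.0-art-dev
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
EOF
~~~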
Description of problem:
When trying to delete a BMH object that is unmanaged, Metal3 cannot delete it. The BMH object is unmanaged because it does not provide information about the BMC (neither address nor credentials).
In this case Metal3 tries to delete but fails and never finalizes. The BMH deletion gets stuck.
This is the log from Metal3:
{"level":"info","ts":1676531586.4898946,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/worker-0.el8k-ztp-1.hpecloud.org"} {"level":"info","ts":1676531586.4980938,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-1.el8k-ztp-1.hpecloud.org"} {"level":"info","ts":1676531586.5050912,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-2.el8k-ztp-1.hpecloud.org"} {"level":"info","ts":1676531586.5105371,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/worker-0.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600} {"level":"info","ts":1676531586.51569,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-0.el8k-ztp-1.hpecloud.org"} {"level":"info","ts":1676531586.5191178,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-1.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600} {"level":"info","ts":1676531586.525755,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-2.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600} {"level":"info","ts":1676531586.5356712,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-0.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600} {"level":"info","ts":1676532186.5117555,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/worker-0.el8k-ztp-1.hpecloud.org"} {"level":"info","ts":1676532186.5195107,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-1.el8k-ztp-1.hpecloud.org"} {"level":"info","ts":1676532186.526355,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-2.el8k-ztp-1.hpecloud.org"} {"level":"info","ts":1676532186.5317476,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/worker-0.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600} {"level":"info","ts":1676532186.5361836,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-0.el8k-ztp-1.hpecloud.org"} {"level":"info","ts":1676532186.5404322,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-1.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600} {"level":"info","ts":1676532186.5482726,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-2.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600} {"level":"info","ts":1676532186.555394,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-0.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600} {"level":"info","ts":1676532532.3448665,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/worker-1.el8k-ztp-1.hpecloud.org"} {"level":"info","ts":1676532532.344922,"logger":"controllers.BareMetalHost","msg":"hardwareData is ready to be deleted","baremetalhost":"openshift-machine-api/worker-1.el8k-ztp-1.hpecloud.org"} {"level":"info","ts":1676532532.3656478,"logger":"controllers.BareMetalHost","msg":"Initiating 
host deletion","baremetalhost":"openshift-machine-api/worker-1.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged"} {"level":"error","ts":1676532532.3656952,"msg":"Reconciler error","controller":"baremetalhost","controllerGroup":"metal3.io","controllerKind":"BareMetalHost","bareMetalHost":{"name":"worker-1.el8k-ztp-1.hpecloud.org","namespace":"openshift-machine-api"}, "namespace":"openshift-machine-api","name":"worker-1.el8k-ztp-1.hpecloud.org","reconcileID":"525a5b7d-077d-4d1e-a618-33d6041feb33","error":"action \"unmanaged\" failed: failed to determine current provisioner capacity: failed to parse BMC address informa tion: missing BMC address","errorVerbose":"missing BMC address\ngithub.com/metal3-io/baremetal-operator/pkg/hardwareutils/bmc.NewAccessDetails\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/github.com/metal3-io/baremetal-operator/pkg/hardwareu tils/bmc/access.go:145\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).bmcAccess\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:112\ngithub.com/metal3-io/baremetal-operator/pkg/pro visioner/ironic.(*ironicProvisioner).HasCapacity\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:1922\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ensureCapacity\n\t/go/src/githu b.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:83\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).updateHostStateFrom\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/meta l3.io/host_state_machine.go:106\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:175\ngithub.com/metal 3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:186\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareM etalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:226\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremet al-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/contr oller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/contro ller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234\nruntime.goexit\ n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594\nfailed to parse BMC address information\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).bmcAccess\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/iro 
nic/ironic.go:114\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).HasCapacity\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:1922\ngithub.com/metal3-io/baremetal-operator/controlle rs/metal3%2eio.(*hostStateMachine).ensureCapacity\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:83\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).updateHostStateFrom\n \t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:106\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState.func1\n\t/go/src/github.com/metal3-io/baremetal-operator /controllers/metal3.io/host_state_machine.go:175\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:186\ngithu b.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:226\nsigs.k8s.io/controller-runtime/pkg/internal/controll er.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/sr c/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal- operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller- runtime/pkg/internal/controller/controller.go:234\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594\nfailed to determine current provisioner capacity\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ensur eCapacity\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:85\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).updateHostStateFrom\n\t/go/src/github.com/metal3-io/baremetal -operator/controllers/metal3.io/host_state_machine.go:106\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machin e.go:175\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:186\ngithub.com/metal3-io/baremetal-operator/contr ollers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:226\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/gi 
thub.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operato r/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-r untime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controll er.go:234\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594\naction \"unmanaged\" failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operato r/controllers/metal3.io/baremetalhost_controller.go:230\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/contr oller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller -runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller. (*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594","stacktrace":"sigs.k8s.io/cont roller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/contr oller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Provide a BMH object with no BMC credentials. The BMH is set unmanaged.
Steps to Reproduce:
1. Delete the object 2. It gets stuck 3.
Actual results:
The deletion gets stuck
Expected results:
Metal3 detects that the BMH is unmanaged and does not try to do deprovisioning.
Additional info:
Description of problem:
APIServer service not selected correctly for PublicAndPrivate when external-dns isn't configured. Image: 4.14 Hypershift operator + OCP 4.14.0-0.nightly-2023-03-23-050449 jiezhao-mac:hypershift jiezhao$ oc get hostedcluster/jz-test -n clusters -ojsonpath='{.spec.platform.aws.endpointAccess}{"\n"}' PublicAndPrivate - lastTransitionTime: "2023-03-24T15:13:15Z" message: Cluster operators console, dns, image-registry, ingress, insights, kube-storage-version-migrator, monitoring, openshift-samples, service-ca are not available observedGeneration: 3 reason: ClusterOperatorsNotAvailable status: "False" type: ClusterVersionSucceeding services: - service: APIServer servicePublishingStrategy: type: LoadBalancer - service: OAuthServer servicePublishingStrategy: type: Route - service: Konnectivity servicePublishingStrategy: type: Route - service: Ignition servicePublishingStrategy: type: Route - service: OVNSbDb servicePublishingStrategy: type: Route jiezhao-mac:hypershift jiezhao$ oc get service -n clusters-jz-test | grep kube-apiserver kube-apiserver LoadBalancer 172.30.211.131 aa029c422933444139fb738257aedb86-9e9709e3fa1b594e.elb.us-east-2.amazonaws.com 6443:32562/TCP 34m kube-apiserver-private LoadBalancer 172.30.161.79 ab8434aa316e845c59690ca0035332f0-d818b9434f506178.elb.us-east-2.amazonaws.com 6443:32100/TCP 34m jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ cat hostedcluster.kubeconfig | grep server server: https://ab8434aa316e845c59690ca0035332f0-d818b9434f506178.elb.us-east-2.amazonaws.com:6443 jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc get node --kubeconfig=hostedcluster.kubeconfig E0324 11:17:44.003589 95300 memcache.go:238] couldn't get current server API group list: Get "https://ab8434aa316e845c59690ca0035332f0-d818b9434f506178.elb.us-east-2.amazonaws.com:6443/api?timeout=32s": dial tcp 10.0.129.24:6443: i/o timeout
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create a PublicAndPrivate cluster without external-dns 2. Access the guest cluster (it should fail) 3.
Actual results:
unable to access the guest cluster via 'oc get node --kubeconfig=<guest cluster kubeconfig>', some guest cluster co are not available
Expected results:
The cluster is up and running, the guest cluster can be accessed via 'oc get node --kubeconfig=<guest cluster kubeconfig>'
Additional info:
Dummy bug to track adding the test to openshift/origin.
Description of problem:
Reported upstream in https://github.com/kubernetes/cloud-provider-openstack/issues/2217 Not specifically reproduced in OpenShift, but I have no reason to think we would not be affected, and I know we have users with strict proxy requirements. The user's configuration requires all OpenStack API requests from the tenant network to go through a proxy. They have configured a proxy 'globally' in their cluster in a manner which also affects the CSI driver. Attempting to attach a volume to a pod fails. Inspecting the logs we see that cinder attempted to attach the volume to the proxy server, not the node hosting the pod. The reason for this is that the metadata request was also proxied, meaning the returned values relate to the proxy server, not the local server.
Version-Release number of selected component (if applicable):
4.13, but likely all versions since we enabled CSI
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
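The report above points at the cluster-wide proxy also capturing the link-local metadata request, so the returned metadata describes the proxy host rather than the local node. The following is a minimal, hedged sketch (not the cloud-provider-openstack implementation) of how an HTTP client can keep a global proxy for OpenStack API calls while exempting the metadata endpoint; the proxy URL is hypothetical.

package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Keep a cluster-wide proxy for API traffic (hypothetical proxy address)...
	os.Setenv("HTTPS_PROXY", "http://proxy.example.com:3128")
	// ...but exempt the link-local metadata endpoint so its answers describe the
	// local server, not the proxy server.
	os.Setenv("NO_PROXY", "169.254.169.254")

	client := &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyFromEnvironment},
	}

	resp, err := client.Get("http://169.254.169.254/openstack/latest/meta_data.json")
	if err != nil {
		fmt.Println("metadata request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("metadata status:", resp.Status)
}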
Description of problem:
Ever since the introduction of the latest invariants feature in origin, MicroShift is unable to run the conformance tests. Failing invariants include load balancer, image registry and kube-apiserver (https://github.com/openshift/origin/blob/master/pkg/defaultinvariants/types.go#L48-L52) and they are tested for disruptions. These tests don't apply in MicroShift because some of those components don't exist, and none of them are HA. Requiring the invariants without checking the platform breaks conformance testing in MicroShift.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Run `openshift-tests run openshift/conformance --provider none` with MicroShift kubeconfig.
Steps to Reproduce:
1. 2. 3.
Actual results:
KUBECONFIG=~/.kube/config ./openshift-tests run openshift/conformance -v 2 --provider none Aug 3 11:37:39.859: INFO: MicroShift cluster with version: 4.14.0_0.nightly_2023_06_30_131338_20230703175041_1b2a630fc I0803 11:37:39.859929 9250 test_setup.go:94] Extended test version v4.1.0-6883-g6ee9dc5 openshift-tests version: v4.1.0-6883-g6ee9dc5 Aug 3 11:37:39.898: INFO: Enabling in-tree volume drivers Attempting to pull tests from external binary... Falling back to built-in suite, failed reading external test suites: unable to extract k8s-tests binary: failed reading ClusterVersion/version: the server could not find the requested resource (get clusterversions.config.openshift.io version) W0803 11:37:40.849399 9250 warnings.go:70] unknown field "spec.tls.externalCertificate" Suite run returned error: [namespaces "openshift-image-registry" not found, the server could not find the requested resource (get infrastructures.config.openshift.io cluster)] No manifest filename passed error running options: [namespaces "openshift-image-registry" not found, the server could not find the requested resource (get infrastructures.config.openshift.io cluster)]error: [namespaces "openshift-image-registry" not found, the server could not find the requested resource (get infrastructures.config.openshift.io cluster)]
Expected results:
Tests running to completion.
Additional info:
A nice addition would be additional presubmits in origin that run MicroShift conformance, to catch these issues earlier.
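A hedged sketch (not the openshift/origin code) of the kind of platform guard the report asks for: before wiring up the load balancer, image registry, and kube-apiserver disruption invariants, check via discovery whether the config.openshift.io resources they depend on are served, and skip them on topologies such as MicroShift where they are not. The function names and kubeconfig path below are illustrative.

package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func clusterHasOpenShiftConfigAPI(kubeconfig string) (bool, error) {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return false, err
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	// On MicroShift the OpenShift config API group is not served, so this lookup fails.
	// A real implementation would distinguish "not found" from transport errors.
	if _, err := dc.ServerResourcesForGroupVersion("config.openshift.io/v1"); err != nil {
		return false, nil
	}
	return true, nil
}

func main() {
	ok, err := clusterHasOpenShiftConfigAPI("/home/user/.kube/config") // hypothetical path
	if err != nil {
		panic(err)
	}
	if !ok {
		fmt.Println("skipping load balancer / image registry / kube-apiserver invariants")
		return
	}
	fmt.Println("registering full invariant set")
}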
Adding Dependabot to manage the Go module dependencies of the HyperShift repository.
Description of the problem:
Day-2 host stuck in insufficient
How reproducible:
100%
Steps to reproduce:
1. See CI job
Actual results:
Day-2 host stuck in insufficient
Expected results:
Day-2 host becomes known
We should check whether CBT (Changed Block Tracking) is enabled on the cluster's nodes on the vSphere platform.
1. Perform a full sweep and log each node which has CBT enabled.
2. Create an alert if some VMs have CBT enabled and others don't.
3. The alert should not be emitted if all VMs in the cluster uniformly have CBT enabled.
This will avoid issues like - https://issues.redhat.com/browse/OCPBUGS-12249?filter=12399251
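A minimal sketch of the sweep using govmomi, assuming vCenter credentials are available; the vCenter URL is hypothetical and the alert here is just a printed message, since the real alerting wiring (operator, Prometheus rule, etc.) is not decided above. The CBT flag is read from each VM's config.changeTrackingEnabled property.

package main

import (
	"context"
	"fmt"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/view"
	"github.com/vmware/govmomi/vim25/mo"
)

func main() {
	ctx := context.Background()

	// Hypothetical vCenter endpoint and credentials.
	u, err := url.Parse("https://user:password@vcenter.example.com/sdk")
	if err != nil {
		panic(err)
	}
	c, err := govmomi.NewClient(ctx, u, true) // insecure, for the sketch only
	if err != nil {
		panic(err)
	}

	m := view.NewManager(c.Client)
	v, err := m.CreateContainerView(ctx, c.Client.ServiceContent.RootFolder, []string{"VirtualMachine"}, true)
	if err != nil {
		panic(err)
	}
	defer v.Destroy(ctx)

	// Full sweep: retrieve the name and CBT flag for every VM.
	var vms []mo.VirtualMachine
	if err := v.Retrieve(ctx, []string{"VirtualMachine"}, []string{"name", "config.changeTrackingEnabled"}, &vms); err != nil {
		panic(err)
	}

	enabled, disabled := 0, 0
	for _, vm := range vms {
		if vm.Config != nil && vm.Config.ChangeTrackingEnabled != nil && *vm.Config.ChangeTrackingEnabled {
			enabled++
			fmt.Println("CBT enabled on:", vm.Name)
		} else {
			disabled++
		}
	}
	// Alert only on a mixed state, per requirements 2 and 3 above.
	if enabled > 0 && disabled > 0 {
		fmt.Printf("ALERT: mixed CBT configuration (%d enabled, %d disabled)\n", enabled, disabled)
	}
}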
The dependencies for the Ironic containers are quite old; we need to upgrade them to the latest available versions to keep up with upstream requirements.
Description of problem:
Please check https://issues.redhat.com/browse/OCPBUGS-18702?focusedId=23021716&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-23021716 for more details, and https://drive.google.com/drive/folders/14aSJs-lO6HC-2xYFlOTJtCZIQg3ekE85?usp=sharing (please check the recording "sc_form_typeerror.mp4").
Issues: 1. The TypeError mentioned above. 2. Default params added by an extension are not getting added to the created StorageClass. 3. Validation for parameters added by an extension is not working correctly either. 4. The Provisioner child details get stuck once the user selects 'openshift-storage.cephfs.csi.ceph.com'.
Version-Release number of selected component (if applicable):
4.14 (OCP)
How reproducible:
Steps to Reproduce:
1. Install the ODF operator. 2. Create a StorageSystem (once the dynamic plugin is loaded). 3. Wait a while for the ODF-related StorageClasses to get created. 4. Once they are created, go to the "Create StorageSystem" form. 5. Switch to the provisioners (rbd.csi.ceph) added by the ODF dynamic plugin.
Actual results:
Page breaks with an error.
Expected results:
The page should not break, and the functionality should behave as it did before the refactoring introduced by PR: https://github.com/openshift/console/pull/13036
Additional info:
Stack trace: Caught error in a child component: TypeError: Cannot read properties of undefined (reading 'parameters') at allRequiredFieldsFilled (storage-class-form.tsx:204:1) at validateForm (storage-class-form.tsx:235:1) at storage-class-form.tsx:262:1 at invokePassiveEffectCreate (react-dom.development.js:23487:1) at HTMLUnknownElement.callCallback (react-dom.development.js:3945:1) at Object.invokeGuardedCallbackDev (react-dom.development.js:3994:1) at invokeGuardedCallback (react-dom.development.js:4056:1) at flushPassiveEffectsImpl (react-dom.development.js:23574:1) at unstable_runWithPriority (scheduler.development.js:646:1) at runWithPriority$1 (react-dom.development.js:11276:1) {componentStack: '\n at StorageClassFormInner (http://localhost:90...c03030668ef271da51f.js:491534:20)\n at Suspense'}
Description of problem:
Incorrect AWS ARN [1] is used for GovCloud and AWS China regions, which will cause the command `ccoctl aws create-all` to fail: Failed to create Identity provider: failed to apply public access policy to the bucket ci-op-bb5dgq54-77753-oidc: MalformedPolicy: Policy has invalid resource status code: 400, request id: VNBZ3NYDH6YXWFZ3, host id: pHF8v7C3vr9YJdD9HWamFmRbMaOPRbHSNIDaXUuUyrgy0gKCO9DDFU/Xy8ZPmY2LCjfLQnUDmtQ= Correct AWS ARN prefix: GovCloud (us-gov-east-1 and us-gov-west-1): arn:aws-us-gov AWS China (cn-north-1 and cn-northwest-1): arn:aws-cn [1] https://github.com/openshift/cloud-credential-operator/pull/526/files#diff-1909afc64595b92551779d9be99de733f8b694cfb6e599e49454b380afc58876R211
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2023-05-11-024616
How reproducible:
Always
Steps to Reproduce:
1. Run command: `aws create-all --name="${infra_name}" --region="${REGION}" --credentials-requests-dir="/tmp/credrequests" --output-dir="/tmp"` on GovCloud regions 2. 3.
Actual results:
Failed to create Identity provider
Expected results:
Create resources successfully.
Additional info:
Related PRs: 4.10: https://github.com/openshift/cloud-credential-operator/pull/531 4.11: https://github.com/openshift/cloud-credential-operator/pull/530 4.12: https://github.com/openshift/cloud-credential-operator/pull/529 4.13: https://github.com/openshift/cloud-credential-operator/pull/528 4.14: https://github.com/openshift/cloud-credential-operator/pull/526
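The root cause is a hard-coded "arn:aws" partition prefix. The following is a hedged, illustrative sketch of partition-aware ARN construction; the real fix lives in the PRs listed above, and the helper names below are not ccoctl's.

package main

import (
	"fmt"
	"strings"
)

// arnPartition maps a region to the ARN partition prefix it belongs to.
func arnPartition(region string) string {
	switch {
	case strings.HasPrefix(region, "us-gov-"):
		return "aws-us-gov"
	case strings.HasPrefix(region, "cn-"):
		return "aws-cn"
	default:
		return "aws"
	}
}

// bucketARN builds an S3 bucket ARN with the correct partition for the region.
func bucketARN(region, bucket string) string {
	return fmt.Sprintf("arn:%s:s3:::%s", arnPartition(region), bucket)
}

func main() {
	fmt.Println(bucketARN("us-gov-west-1", "ci-op-bb5dgq54-77753-oidc")) // arn:aws-us-gov:s3:::...
	fmt.Println(bucketARN("cn-north-1", "example-oidc"))                 // arn:aws-cn:s3:::example-oidc
	fmt.Println(bucketARN("us-east-1", "example-oidc"))                  // arn:aws:s3:::example-oidc
}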
DoD:
Go through all conditions
https://github.com/openshift/hypershift/blob/main/api/v1beta1/nodepool_conditions.go
https://github.com/openshift/hypershift/blob/main/api/v1beta1/hostedcluster_conditions.go
Add an e2e test that validates that all of them match the expected state on cluster creation.
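A minimal sketch of the kind of assertion such an e2e test could make, assuming the conditions are exposed as metav1.Condition values (the helper and the expected-status map below are illustrative, not HyperShift's actual test code).

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// checkExpectedConditions compares observed conditions against the expected status per type.
func checkExpectedConditions(conds []metav1.Condition, expected map[string]metav1.ConditionStatus) []string {
	var failures []string
	for condType, want := range expected {
		got := meta.FindStatusCondition(conds, condType)
		if got == nil {
			failures = append(failures, fmt.Sprintf("condition %q not found", condType))
			continue
		}
		if got.Status != want {
			failures = append(failures, fmt.Sprintf("condition %q is %s, want %s (reason=%s)", condType, got.Status, want, got.Reason))
		}
	}
	return failures
}

func main() {
	observed := []metav1.Condition{
		{Type: "Available", Status: metav1.ConditionTrue, Reason: "AsExpected"},
		{Type: "Degraded", Status: metav1.ConditionTrue, Reason: "SomethingBroke"},
	}
	expected := map[string]metav1.ConditionStatus{
		"Available": metav1.ConditionTrue,
		"Degraded":  metav1.ConditionFalse,
	}
	for _, f := range checkExpectedConditions(observed, expected) {
		fmt.Println("FAIL:", f)
	}
}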
Description of problem:
The size of PVC/datadir-ibm-spectrum-scale-pmcollector-0 is displayed incorrectly in the OpenShift web console. The PVC size is shown as a negative value, -17.6GiB.
Below are the SC, PV, and PVC details.
$ oc get storageclass NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE ibm-spectrum-fusion-mgmt-sc spectrumscale.csi.ibm.com Delete Immediate true 2d ibm-spectrum-fusion (default) spectrumscale.csi.ibm.com Delete Immediate true 2d ibm-spectrum-scale-internal kubernetes.io/no-provisioner Delete WaitForFirstConsumer false 2d ibm-spectrum-scale-sample spectrumscale.csi.ibm.com Delete Immediate false 2d $ oc get pv control-1.ncw-az1-005.caas.bbtnet.com-pmcollector 25Gi RWO Retain Bound ibm-spectrum-scale/datadir-ibm-spectrum-scale-pmcollector-0 ibm-spectrum-scale-internal $ oc get pvc -A NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE ibm-spectrum-scale datadir-ibm-spectrum-scale-pmcollector-0 Bound control-1.ncw-az1-005.caas.bbtnet.com-pmcollector 25Gi RWO ibm-spectrum-scale-internal 3d $ oc get pvc datadir-ibm-spectrum-scale-pmcollector-0 -n ibm-spectrum-scale kind: PersistentVolumeClaim apiVersion: v1 metadata: annotations: pv.kubernetes.io/bind-completed: 'yes' pv.kubernetes.io/bound-by-controller: 'yes' resourceVersion: '5360546' name: datadir-ibm-spectrum-scale-pmcollector-0 uid: 7a7d0609-0608-409f-91e1-209bb0b3c8d1 creationTimestamp: '2023-05-01T14:13:40Z' managedFields: - manager: kube-controller-manager operation: Update apiVersion: v1 time: '2023-05-01T14:13:40Z' fieldsType: FieldsV1 fieldsV1: 'f:metadata': 'f:annotations': .: {} 'f:pv.kubernetes.io/bind-completed': {} 'f:pv.kubernetes.io/bound-by-controller': {} 'f:labels': .: {} 'f:app.kubernetes.io/instance': {} 'f:app.kubernetes.io/name': {} 'f:spec': 'f:accessModes': {} 'f:resources': 'f:requests': .: {} 'f:storage': {} 'f:storageClassName': {} 'f:volumeMode': {} 'f:volumeName': {} - manager: kube-controller-manager operation: Update apiVersion: v1 time: '2023-05-01T14:13:40Z' fieldsType: FieldsV1 fieldsV1: 'f:status': 'f:accessModes': {} 'f:capacity': .: {} 'f:storage': {} 'f:phase': {} subresource: status namespace: ibm-spectrum-scale finalizers: - kubernetes.io/pvc-protection labels: app.kubernetes.io/instance: ibm-spectrum-scale app.kubernetes.io/name: pmcollector spec: accessModes: - ReadWriteOnce resources: requests: storage: 25Gi volumeName: control-1.ncw-az1-005.caas.bbtnet.com-pmcollector storageClassName: ibm-spectrum-scale-internal volumeMode: Filesystem status: phase: Bound accessModes: - ReadWriteOnce capacity: storage: 25Gi
==> However, when executing from pod ibm-spectrum-scale-pmcollector-0, the mountPath `/opt/IBM/zimon/data` where PVC/datadir-ibm-spectrum-scale-pmcollector-0 is mounted still shows that only 12K is used so far and 11G is the currently available space. [C49904@openshift-eng-bastion-vm ~]$ oc rsh ibm-spectrum-scale-pmcollector-0 Defaulted container "pmcollector" out of: pmcollector, sysmon sh-4.4$ df -Th | grep -iE 'size|zimon' Filesystem Type Size Used Avail Use% Mounted on tmpfs tmpfs 11G 12K 11G 1% /opt/IBM/zimon/config
Version-Release number of selected component (if applicable):
OCP 4.10.21 isf-operator.v2.4.0
How reproducible:
Steps to Reproduce:
1. Install IBM Spectrum Scale 2. 3.
Actual results:
The PVC size displayed in the OpenShift web console shows a negative size value.
Expected results:
The PVC size displayed in the OpenShift web console should not show a negative size value.
Additional info:
Description of problem:
Application groups can not be deleted in topology
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create an application with an application group 2. Go to topology 3. Delete the application group containing the application
Actual results:
Application group persists in topology
Expected results:
The application group should be deleted
Additional info:
Pipeline API is giving 404 even if the pipelines operator is not installed
On https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.ci/release/4.14.0-0.ci-2023-06-30-020413, hypershift started permafailing
Description of problem:
CCO's ServiceAccount cannot list ConfigMaps at the cluster scope.
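Whatever the eventual fix (granting the permission or scoping the operator's informer cache to its own namespace), the forbidden-list errors in the logs below map to an RBAC rule like the following hedged sketch; the ClusterRole name is hypothetical and this is not the shipped manifest.

package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Illustrative rule allowing the cloud-credential-operator ServiceAccount to
	// list/watch ConfigMaps at cluster scope.
	role := rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{Name: "cloud-credential-operator-configmaps"}, // hypothetical name
		Rules: []rbacv1.PolicyRule{
			{
				APIGroups: []string{""},
				Resources: []string{"configmaps"},
				Verbs:     []string{"get", "list", "watch"},
			},
		},
	}
	fmt.Printf("%+v\n", role.Rules)
}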
Steps to Reproduce:
1. Install an OCP cluster (4.14.0-0.nightly-2023-07-17-215017, CCO commit id = 0c80cc35f6ee4b45016050b3e5a8710a8ed4dd81) with default configuration (CCO in default mode) 2. Create a dummy CredentialsRequest as follows: apiVersion: cloudcredential.openshift.io/v1 kind: CredentialsRequest metadata: name: test-cr namespace: openshift-cloud-credential-operator spec: providerSpec: apiVersion: cloudcredential.openshift.io/v1 kind: AWSProviderSpec statementEntries: - action: - ec2:CreateTags effect: Allow resource: '*' stsIAMRoleARN: whatever secretRef: name: test-secret namespace: default serviceAccountNames: - default 3. Check CCO Pod logs: time="2023-07-18T10:02:45Z" level=info msg="reconciling clusteroperator status" time="2023-07-18T10:02:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-cr time="2023-07-18T10:02:45Z" level=info msg="adding finalizer: cloudcredential.openshift.io/deprovision" controller=credreq cr=openshift-cloud-credential-operator/test-cr secret=default/test-secret time="2023-07-18T10:02:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-cr time="2023-07-18T10:02:45Z" level=info msg="stsFeatureGateEnabled: false" actuator=aws cr=openshift-cloud-credential-operator/test-cr time="2023-07-18T10:02:45Z" level=info msg="stsDetected: false" actuator=aws cr=openshift-cloud-credential-operator/test-cr time="2023-07-18T10:02:45Z" level=info msg="clusteroperator status updated" controller=status time="2023-07-18T10:02:45Z" level=info msg="reconciling clusteroperator status" time="2023-07-18T10:02:45Z" level=info msg="reconciling clusteroperator status" time="2023-07-18T10:02:45Z" level=info msg="reconciling clusteroperator status" time="2023-07-18T10:02:45Z" level=info msg="reconciling clusteroperator status" W0718 10:02:45.352434 1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope E0718 10:02:45.352460 1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope W0718 10:02:46.512738 1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope E0718 10:02:46.512763 1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope W0718 10:02:48.859931 1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the 
cluster scope E0718 10:02:48.859957 1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope W0718 10:02:53.514713 1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope E0718 10:02:53.514798 1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope W0718 10:03:03.042040 1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope E0718 10:03:03.042068 1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope W0718 10:03:25.023729 1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope E0718 10:03:25.023758 1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope time="2023-07-18T10:04:10Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics time="2023-07-18T10:04:10Z" level=info msg="reconcile complete" controller=metrics elapsed=4.470475ms W0718 10:04:11.033286 1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope E0718 10:04:11.033311 1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope W0718 10:04:42.316200 1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User 
"system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope E0718 10:04:42.316223 1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope W0718 10:05:40.852983 1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope E0718 10:05:40.853008 1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status" time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status" time="2023-07-18T10:06:10Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics time="2023-07-18T10:06:10Z" level=info msg="reconcile complete" controller=metrics elapsed=3.531182ms time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status" time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status" time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status" time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status" time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status" ...
Description of problem:
Starting with OpenShift 4.13 we show a copy button next to the OpenShift Route URL in the topology, the route list, and the detail page. But the Knative Route URL doesn't show this copy button, as Vikram mentioned in this code review https://github.com/openshift/console/pull/12853#issuecomment-1594829827
Version-Release number of selected component (if applicable):
4.13+
How reproducible:
Always
Steps to Reproduce:
Actual results:
Copy button is not shown
Expected results:
Copy button should be displayed
Additional info:
Description of problem:
cluster-ingress-operator E2E has an error message: "[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed". It looks like newClient is called from two places, TestMain and TestIngressStatus.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Run E2E tests that call newClient, such as TestIngressStatus 2. Examine logs
Actual results:
[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed: goroutine 9120 [running]: runtime/debug.Stack() /usr/lib/golang/src/runtime/debug/stack.go:24 +0x65 sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot() /go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:59 +0xbd sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0xc000113000, {0x1dd106b, 0x14}) /go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:147 +0x4c github.com/go-logr/logr.Logger.WithName({{0x21435e0, 0xc000113000}, 0x0}, {0x1dd106b?, 0xe?}) /go/src/github.com/openshift/cluster-ingress-operator/vendor/github.com/go-logr/logr/logr.go:336 +0x46 sigs.k8s.io/controller-runtime/pkg/client.newClient(0xc00086afc0, {0x0, 0xc0001a0fc0, {0x2144930, 0xc00033ac00}, 0x0, {0x0, 0x0}, 0x0}) /go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:115 +0xb4 sigs.k8s.io/controller-runtime/pkg/client.New(0xc00086afc0?, {0x0, 0xc0001a0fc0, {0x2144930, 0xc00033ac00}, 0x0, {0x0, 0x0}, 0x0}) /go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:101 +0x85 github.com/openshift/cluster-ingress-operator/pkg/operator/client.NewClient(0x0?) /go/src/github.com/openshift/cluster-ingress-operator/pkg/operator/client/client.go:83 +0x145 github.com/openshift/cluster-ingress-operator/test/e2e.TestIngressStatus(0xc000503520) /go/src/github.com/openshift/cluster-ingress-operator/test/e2e/dns_ingressdegrade_test.go:33 +0x95 testing.tRunner(0xc000503520, 0x1f015a0) /usr/lib/golang/src/testing/testing.go:1576 +0x10b created by testing.(*T).Run /usr/lib/golang/src/testing/testing.go:1629 +0x3ea
Expected results:
No error message
Additional info:
This is due to the 1.27 rebase.
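The usual remedy, sketched here under the assumption that controller-runtime's zap helper is vendored (as the stack trace suggests), is to register a logger once, for example in TestMain, before any client or manager construction, so the delegating log sink never emits this warning. The package name is illustrative.

package e2e

import (
	"os"
	"testing"

	"sigs.k8s.io/controller-runtime/pkg/log"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func TestMain(m *testing.M) {
	// Register a real logger before newClient / controller-runtime client construction.
	log.SetLogger(zap.New(zap.UseDevMode(true)))
	os.Exit(m.Run())
}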
Essentially unmerge Christian's previous merge in the MCO that disabled the extension container.
Description of problem:
According to the slack thread attached: Cluster uninstallation is stuck when load balancers are removed before ingress controllers. This can happen when the ingress controller removal fails and the control plane operator moves on to deleting load balancers without waiting.
Version-Release number of selected component (if applicable):
4.12.z 4.13.z
How reproducible:
Whenever the load balancer is deleted before the ingress controller
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Load balancer deletion waits for the ingress controller deletion
Additional info:
Description of problem:
Image registry pruner job fails when cluster was installed without DeploymentConfig capability. Cluster was installed only with the following capapbilities: {\"capabilities\":{\"baselineCapabilitySet\": \"None\", \"additionalEnabledCapabilities\": [ \"marketplace\", \"NodeTuning\" ] }}" image-pruner pods are failing with the following error: state: terminated: containerID: cri-o://69562d80cafb23a07b9f1d020e1943448916558986092d8540b9a0e1fc3731a1 exitCode: 1 finishedAt: "2023-08-21T00:07:37Z" message: | Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io) attempt #1 has failed (exit code 1), going to make another attempt... Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io) attempt #2 has failed (exit code 1), going to make another attempt... Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io) attempt #3 has failed (exit code 1), going to make another attempt... Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io) attempt #4 has failed (exit code 1), going to make another attempt... Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io) attempt #5 has failed (exit code 1), going to make another attempt... Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io) reason: Error startedAt: "2023-08-21T00:00:05Z"
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-16-114741
How reproducible:
100%
Steps to Reproduce:
1. Install an SNO cluster without the DeploymentConfig capability 2. Check the image pruner job status
Actual results:
Image pruner jobs do not complete because the deploymentconfigs.apps.openshift.io API is not available.
Expected results:
Image pruner jobs can run without the deploymentconfigs API
Additional info:
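A hedged sketch of a capability-aware guard (illustrative, not the actual pruner code): ask the discovery API whether deploymentconfigs.apps.openshift.io is served, and skip that part of the prune when the DeploymentConfig capability is disabled.

package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

func deploymentConfigsAvailable(cfg *rest.Config) (bool, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	resources, err := dc.ServerResourcesForGroupVersion("apps.openshift.io/v1")
	if err != nil {
		// Group/version not served at all, e.g. the capability is disabled.
		return false, nil
	}
	for _, r := range resources.APIResources {
		if r.Name == "deploymentconfigs" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	ok, err := deploymentConfigsAvailable(cfg)
	if err != nil {
		panic(err)
	}
	if !ok {
		fmt.Println("DeploymentConfig API absent, skipping deploymentconfig image references")
		return
	}
	fmt.Println("pruning images referenced by deploymentconfigs")
}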
Description of problem:
OCP deployments are failing with machine-api-controller pod crashing.
Version-Release number of selected component (if applicable):
OCP 4.14.0-ec.3
How reproducible:
Always
Steps to Reproduce:
1. Deploy a Baremetal cluster 2. After bootstrap is completed, check the pods running in the openshift-machine-api namespace 3. Check machine-api-controllers-* pod status (it goes from Running to Crashing all the time) 4. Deployment eventually times out and stops with only the master nodes getting deployed.
Actual results:
machine-api-controllers-* pod remains in a crashing loop and OCP 4.14.0-ec.3 deployments fail.
Expected results:
machine-api-controllers-* pod remains running and OCP 4.14.0-ec.3 deployments are completed
Additional info:
Jobs with older nightly releases in 4.14 are passing, but since Saturday Jul 10th, our CI jobs are failing
$ oc version Client Version: 4.14.0-ec.3 Kustomize Version: v5.0.1 Kubernetes Version: v1.27.3+e8b13aa $ oc get nodes NAME STATUS ROLES AGE VERSION master-0 Ready control-plane,master 37m v1.27.3+e8b13aa master-1 Ready control-plane,master 37m v1.27.3+e8b13aa master-2 Ready control-plane,master 38m v1.27.3+e8b13aa $ oc -n openshift-machine-api get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES cluster-autoscaler-operator-75b96869d8-gzthq 2/2 Running 0 48m 10.129.0.6 master-0 <none> <none> cluster-baremetal-operator-7c9cb8cd69-6bqcg 2/2 Running 0 48m 10.129.0.7 master-0 <none> <none> control-plane-machine-set-operator-6b65b5b865-w996m 1/1 Running 0 48m 10.129.0.22 master-0 <none> <none> machine-api-controllers-59694ff965-v4kxb 6/7 CrashLoopBackOff 7 (2m31s ago) 46m 10.130.0.12 master-2 <none> <none> machine-api-operator-58b54d7c86-cnx4w 2/2 Running 0 48m 10.129.0.8 master-0 <none> <none> metal3-6ffbb8dcd4-drlq5 6/6 Running 0 45m 192.168.62.22 master-1 <none> <none> metal3-baremetal-operator-bd95b6695-q6k7c 1/1 Running 0 45m 10.130.0.16 master-2 <none> <none> metal3-image-cache-4p7ln 1/1 Running 0 45m 192.168.62.22 master-1 <none> <none> metal3-image-cache-lfmb4 1/1 Running 0 45m 192.168.62.23 master-2 <none> <none> metal3-image-cache-txjg5 1/1 Running 0 45m 192.168.62.21 master-0 <none> <none> metal3-image-customization-65cf987f5c-wgqs7 1/1 Running 0 45m 10.128.0.17 master-1 <none> <none>
$ oc -n openshift-machine-api logs machine-api-controllers-59694ff965-v4kxb -c machine-controller | less ... E0710 15:55:08.230413 1 logr.go:270] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"Metal3Remediation\" in version \"infrastructure.cluster.x-k8s.io/v1beta1\"" "kind"={"Group":"infrastructure.cluster.x-k8s.io","Kind":"Metal3Remediation"} E0710 15:55:14.019930 1 controller.go:210] "msg"="Could not wait for Cache to sync" "error"="failed to wait for metal3remediation caches to sync: timed out waiting for cache to be synced" "controller"="metal3remediation" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="Metal3Remediation" I0710 15:55:14.020025 1 logr.go:252] "msg"="Stopping and waiting for non leader election runnables" I0710 15:55:14.020054 1 logr.go:252] "msg"="Stopping and waiting for leader election runnables" I0710 15:55:14.020095 1 controller.go:247] "msg"="Shutdown signal received, waiting for all workers to finish" "controller"="machine-drain-controller" I0710 15:55:14.020147 1 controller.go:247] "msg"="Shutdown signal received, waiting for all workers to finish" "controller"="machineset-controller" I0710 15:55:14.020169 1 controller.go:247] "msg"="Shutdown signal received, waiting for all workers to finish" "controller"="machine-controller" I0710 15:55:14.020184 1 controller.go:249] "msg"="All workers finished" "controller"="machineset-controller" I0710 15:55:14.020181 1 controller.go:249] "msg"="All workers finished" "controller"="machine-drain-controller" I0710 15:55:14.020190 1 controller.go:249] "msg"="All workers finished" "controller"="machine-controller" I0710 15:55:14.020209 1 logr.go:252] "msg"="Stopping and waiting for caches" I0710 15:55:14.020323 1 logr.go:252] "msg"="Stopping and waiting for webhooks" I0710 15:55:14.020327 1 reflector.go:225] Stopping reflector *v1alpha1.BareMetalHost (10h53m58.149951981s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262 I0710 15:55:14.020393 1 reflector.go:225] Stopping reflector *v1beta1.Machine (9h40m22.116205595s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262 I0710 15:55:14.020399 1 logr.go:252] controller-runtime/webhook "msg"="shutting down webhook server" I0710 15:55:14.020437 1 reflector.go:225] Stopping reflector *v1.Node (10h3m14.461941979s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262 I0710 15:55:14.020466 1 logr.go:252] "msg"="Wait completed, proceeding to shutdown the manager" I0710 15:55:14.020485 1 reflector.go:225] Stopping reflector *v1beta1.MachineSet (10h7m28.391827596s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262 E0710 15:55:14.020500 1 main.go:218] baremetal-controller-manager/entrypoint "msg"="unable to run manager" "error"="failed to wait for metal3remediation caches to sync: timed out waiting for cache to be synced" E0710 15:55:14.020504 1 logr.go:270] "msg"="error received after stop sequence was engaged" "error"="leader election lost"
Our CI job logs can be seen here (RedHat SSO): https://www.distributed-ci.io/jobs/7da8ee48-8918-4a97-8e3c-f525d19583b8/files
Description of problem:
The AdditionalTrustBundle field in install-config.yaml can be used to add additional certs; however, these certs are only propagated to the final image when the ImageContentSources field is also set for mirroring. If mirroring is not set, the additional certs will be present on the bootstrap node but not in the final image. This can cause a problem when a user has set up a proxy and wants to add additional certs, as described here https://docs.openshift.com/container-platform/4.12/networking/configuring-a-custom-pki.html#installation-configure-proxy_configuring-a-custom-pki
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. In install-config.yaml set additionalTrustBundle and don't set imageContentSources. 2. Do an installation using the install-config.yaml. 3. After the final image is installed and rebooted view the certs in /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt.
Actual results:
The certs defined in additionalTrustBundle are not in /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt.
Expected results:
The certs defined in additionalTrustBundle will be in /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt even when imageContentSources is not defined.
Additional info:
Adding two minor flags to improve our CI tests:
https://github.com/openshift/cluster-etcd-operator/pull/1057
Description of problem:
Pull-through only checks for ICSP, ignoring IDMS/ITMS.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create an IDMS/ITMS rule (TODO: add specifics) example IDMS/ITMS specifics: apiVersion: config.openshift.io/v1 kind: ImageDigestMirrorSet metadata: name: digest-mirror spec: imageDigestMirrors: - mirrors: - registry.access.redhat.com/ubi8/ubi-minimal source: quay.io/podman/hello mirrorSourcePolicy: NeverContactSource apiVersion: config.openshift.io/v1 kind: ImageTagMirrorSet metadata: name: tag-mirror spec: imageTagMirrors: - mirrors: - registry.access.redhat.com/ubi8/ubi-minimal source: quay.io/podman/hello mirrorSourcePolicy: NeverContactSource 2. Create an image stream with `referencePolicy: local`. Example: https://gist.github.com/flavianmissi/0518239edd6f51d54b5633212f2b2ac9 3. Pull the image from the image stream created above. Example `oc new-app test-1:latest`
Actual results:
Expected results:
Additional info:
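A hedged sketch of consulting both the legacy ImageContentSourcePolicy objects and the newer ImageDigestMirrorSet/ImageTagMirrorSet objects with a dynamic client, which is what a pull-through path would need to do; the group/version/resource names match the manifests above and the ICSP API, while the surrounding consumption logic is illustrative.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Legacy and current mirror configuration resources.
	gvrs := []schema.GroupVersionResource{
		{Group: "operator.openshift.io", Version: "v1alpha1", Resource: "imagecontentsourcepolicies"},
		{Group: "config.openshift.io", Version: "v1", Resource: "imagedigestmirrorsets"},
		{Group: "config.openshift.io", Version: "v1", Resource: "imagetagmirrorsets"},
	}
	for _, gvr := range gvrs {
		list, err := client.Resource(gvr).List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			fmt.Printf("%s: %v\n", gvr.Resource, err)
			continue
		}
		fmt.Printf("%s: %d objects\n", gvr.Resource, len(list.Items))
	}
}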
Description of problem:
As a part of Chaos Monkey testing we tried to delete the machine-config-controller pod in SNO+1. The machine-config-controller pod restart results in a restart of the daemonset/sriov-network-config-daemon and linuxptp-daemon pods as well.
1m47s Normal Killing pod/machine-config-controller-7f46c5d49b-w4p9s Stopping container machine-config-controller 1m47s Normal Killing pod/machine-config-controller-7f46c5d49b-w4p9s Stopping container oauth-proxy
openshift-sriov-network-operator 23m Normal Killing pod/sriov-network-config-daemon-pv4tr Stopping container sriov-infiniband-cni openshift-sriov-network-operator 23m Normal SuccessfulDelete daemonset/sriov-network-config-daemon Deleted pod: sriov-network-config-daemon-pv4tr
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Steps to Reproduce:
Restart the machine-config-controller pod in openshift-machine-config-operator namespace. 1. oc get pod -n openshift-machine-config-operator 2. oc delete pod/machine-config-controller-xxx -n openshift-machine-config-operator
Actual results:
It restarts the daemonset/sriov-network-config-daemon and linuxptp-daemon pods
Expected results:
It should not restart these pods
Additional info:
logs : https://drive.google.com/drive/folders/1XxYen8tzENrcIJdde8sortpyY5ZFZCPW?usp=share_link
Description of problem:
CNCC failed to assign an egressIP to the NIC on an Azure Workload Identity cluster. Refer to https://issues.redhat.com/browse/CCO-294
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-11-055332
How reproducible:
Always
Steps to Reproduce:
1. Create an Azure Workload Identity cluster via "workflow-launch cucushift-installer-rehearse-azure-ipi-cco-manual-workload-identity-tp 4.14" from cluster-bot 2. Configure egressIP 3.
Actual results:
% oc get egressip NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip-3 10.0.128.100 % oc get cloudprivateipconfig -o yaml apiVersion: v1 items: - apiVersion: cloud.network.openshift.io/v1 kind: CloudPrivateIPConfig metadata: annotations: k8s.ovn.org/egressip-owner-ref: egressip-3 creationTimestamp: "2023-08-14T04:41:05Z" finalizers: - cloudprivateipconfig.cloud.network.openshift.io/finalizer generation: 1 name: 10.0.128.100 resourceVersion: "65159" uid: 2b7b1137-0e2e-46e8-9bca-1176330322a9 spec: node: ci-ln-b4tlp9t-1d09d-2chnb-worker-centralus1-jgqp2 status: conditions: - lastTransitionTime: "2023-08-14T04:41:17Z" message: 'Error processing cloud assignment request, err: network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="LinkedAuthorizationFailed" Message="The client ''d367c1b8-9f5d-4257-b5c8-363f61af32c2'' with object id ''d367c1b8-9f5d-4257-b5c8-363f61af32c2'' has permission to perform action ''Microsoft.Network/networkInterfaces/write'' on scope ''/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-ln-b4tlp9t-1d09d/providers/Microsoft.Network/networkInterfaces/ci-ln-b4tlp9t-1d09d-2chnb-worker-centralus1-jgqp2-nic''; however, it does not have permission to perform action ''Microsoft.Network/virtualNetworks/subnets/join/action'' on the linked scope(s) ''/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-ln-b4tlp9t-1d09d/providers/Microsoft.Network/virtualNetworks/ci-ln-b4tlp9t-1d09d-2chnb-vnet/subnets/ci-ln-b4tlp9t-1d09d-2chnb-worker-subnet'' or the linked scope(s) are invalid."' observedGeneration: 1 reason: CloudResponseError status: "False" type: Assigned node: ci-ln-b4tlp9t-1d09d-2chnb-worker-centralus1-jgqp2 kind: List metadata: resourceVersion: ""
Expected results:
EgressIP can be assigned to egress node
Additional info:
Description of problem:
Upgraded from 4.11.17 -> 4.12.0 rc3 and found (after successful upgrade) this repeating in Machine Config Operator logs: 2022-12-13T23:11:51.511167249Z W1213 23:11:51.511120 1 warnings.go:70] unknown field "spec.dns.metadata.creationTimestamp" 2022-12-13T23:11:51.511167249Z W1213 23:11:51.511140 1 warnings.go:70] unknown field "spec.dns.metadata.generation" 2022-12-13T23:11:51.511167249Z W1213 23:11:51.511143 1 warnings.go:70] unknown field "spec.dns.metadata.managedFields" 2022-12-13T23:11:51.511167249Z W1213 23:11:51.511146 1 warnings.go:70] unknown field "spec.dns.metadata.name" 2022-12-13T23:11:51.511167249Z W1213 23:11:51.511148 1 warnings.go:70] unknown field "spec.dns.metadata.resourceVersion" 2022-12-13T23:11:51.511167249Z W1213 23:11:51.511151 1 warnings.go:70] unknown field "spec.dns.metadata.uid" 2022-12-13T23:11:51.511167249Z W1213 23:11:51.511153 1 warnings.go:70] unknown field "spec.infra.metadata.creationTimestamp" 2022-12-13T23:11:51.511167249Z W1213 23:11:51.511155 1 warnings.go:70] unknown field "spec.infra.metadata.generation" 2022-12-13T23:11:51.511167249Z W1213 23:11:51.511157 1 warnings.go:70] unknown field "spec.infra.metadata.managedFields" 2022-12-13T23:11:51.511167249Z W1213 23:11:51.511159 1 warnings.go:70] unknown field "spec.infra.metadata.name" 2022-12-13T23:11:51.511167249Z W1213 23:11:51.511161 1 warnings.go:70] unknown field "spec.infra.metadata.resourceVersion" 2022-12-13T23:11:51.511211644Z W1213 23:11:51.511163 1 warnings.go:70] unknown field "spec.infra.metadata.uid"
Version-Release number of selected component (if applicable):
4.12.0-rc3 Platform agnostic installation
How reproducible:
Just once (working with user outside RH)
Steps to Reproduce:
1. Install 4.11.17 2. Set candidate-4.12 upgrade channel 3. Initiate upgrade (apply admin ack as needed) 4. After upgrade, check Machine Config Operator logs
Actual results:
The upgrade went fine and I don't see any symptoms outside of warnings repeating in MCO log
Expected results:
I don't expect the warnings to be logged repeatedly
Additional info:
Description of problem:
IPI installation to a shared VPC with 'credentialsMode: Manual' failed because no IAM service accounts were created for the control-plane and compute machines.
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-04-18-005127
How reproducible:
Always
Steps to Reproduce:
1. "create install-config", and then insert interested settings in install-config.yaml 2. "create manifests" 3. run "ccoctl" to create the required credentials 4. grant the above IAM service accounts the required permissions in the host project (see https://github.com/openshift/openshift-docs/pull/58474) 5. "create cluster"
Actual results:
The installer doesn't create the two IAM service accounts, one for the control-plane machines and another for the compute machines, so no compute machines get created, which leads to installation failure.
Expected results:
The installation should succeed.
Additional info:
FYI https://issues.redhat.com/browse/OCPBUGS-11605 $ gcloud compute instances list --filter='name~jiwei-0418' NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS jiwei-0418a-9kvlr-master-0 us-central1-a n2-standard-4 10.0.0.62 RUNNING jiwei-0418a-9kvlr-master-1 us-central1-b n2-standard-4 10.0.0.58 RUNNING jiwei-0418a-9kvlr-master-2 us-central1-c n2-standard-4 10.0.0.29 RUNNING $ gcloud iam service-accounts list --filter='email~jiwei-0418' DISPLAY NAME EMAIL DISABLED jiwei-0418a-14589-openshift-image-registry-gcs jiwei-0418a--openshift-i-zmwwh@openshift-qe.iam.gserviceaccount.com False jiwei-0418a-14589-openshift-machine-api-gcp jiwei-0418a--openshift-m-5cc5l@openshift-qe.iam.gserviceaccount.com False jiwei-0418a-14589-cloud-credential-operator-gcp-ro-creds jiwei-0418a--cloud-crede-p8lpc@openshift-qe.iam.gserviceaccount.com False jiwei-0418a-14589-openshift-gcp-ccm jiwei-0418a--openshift-g-bljz6@openshift-qe.iam.gserviceaccount.com False jiwei-0418a-14589-openshift-ingress-gcp jiwei-0418a--openshift-i-rm4vz@openshift-qe.iam.gserviceaccount.com False jiwei-0418a-14589-openshift-cloud-network-config-controller-gcp jiwei-0418a--openshift-c-6dk7g@openshift-qe.iam.gserviceaccount.com False jiwei-0418a-14589-openshift-gcp-pd-csi-driver-operator jiwei-0418a--openshift-g-pjn24@openshift-qe.iam.gserviceaccount.com False $
Description of problem:
When the user selects the "Use Pipeline from this cluster" option in the Add Pipeline section, the Create button should be enabled, but due to PAC validation the Create button is disabled.
Version-Release number of selected component (if applicable):
4.14.0
How reproducible:
Always
Steps to Reproduce:
1. Go to Import from Git page 2. Add repository https://bitbucket.org/lokanandap/hello-func 3. Select Use Pipeline from this cluster in Add Pipeline section
Actual results:
Create button is disabled
Expected results:
Create button should be enabled to create the workload
Additional info:
Description of problem:
IPV6 interface and IP is missing in all pods created in OCP 4.12 EC-2.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Every time
Steps to Reproduce:
We create network-attachment-definitions.k8s.cni.cncf.io in the OCP cluster at namespace scope so that our software pods get IPv6 IPs.
Actual results:
Pods do not receive IPv6 addresses
Expected results:
Pods receive IPv6 addresses
Additional info:
This has been working flawlessly up to OCP 4.10.21; however, we are trying the same code in OCP 4.12-ec2 and notice all our pods are missing an IPv6 address, and we have to restart the pods a couple of times for them to get an IPv6 address.
This is a clone of issue OCPBUGS-19418. The following is the description of the original issue:
—
Description of problem:
OCP Upgrades fail with message "Upgrade error from 4.13.X: Unable to apply 4.14.0-X: an unknown error has occurred: MultipleErrors"
Version-Release number of selected component (if applicable):
Currently 4.14.0-rc.1, but we observed the same issue with previous 4.14 nightlies too: 4.14.0-0.nightly-2023-09-12-195514 4.14.0-0.nightly-2023-09-02-132842 4.14.0-0.nightly-2023-08-28-154013
How reproducible:
1 out of 2 upgrades
Steps to Reproduce:
1. Deploy OCP 4.13 with the latest GA on a baremetal cluster with IPI and OVN-K 2. Upgrade to the latest 4.14 available 3. Check the cluster version status during the upgrade; at some point the upgrade stops with the message: "Upgrade error from 4.13.X Unable to apply 4.14.0-X: an unknown error has occurred: MultipleErrors" 4. Check the OVN pods with "oc get pods -n openshift-ovn-kubernetes"; there are pods running 7 out of 8 containers (missing ovnkube-node) that constantly restart, and pods running only 5 containers that show errors connecting to the OVN DBs. 5. Check the cluster operators with "oc get co"; mainly dns, network, and machine-config remained in 4.13 and degraded.
Actual results:
Upgrade not completed, and OVN pods remain in a restarting loop with failures.
Expected results:
Upgrade should be completed without issues, and OVN pods should remain in a Running status without restarts.
Additional info:
These are the results from our latest test from 4.13.13 to 4.14.0-rc1
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version True True 2h8m Unable to apply 4.14.0-rc.1: an unknown error has occurred: MultipleErrors $ oc get mcp NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE master rendered-master-ebb1da47ad5cb76c396983decb7df1ea True False False 3 3 3 0 3h41m worker rendered-worker-26ccb35941236935a570dddaa0b699db False True True 3 2 2 1 3h41m $ oc get co NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE authentication 4.14.0-rc.1 True False False 2h21m baremetal 4.14.0-rc.1 True False False 3h38m cloud-controller-manager 4.14.0-rc.1 True False False 3h41m cloud-credential 4.14.0-rc.1 True False False 2h23m cluster-autoscaler 4.14.0-rc.1 True False False 2h21m config-operator 4.14.0-rc.1 True False False 3h40m console 4.14.0-rc.1 True False False 2h20m control-plane-machine-set 4.14.0-rc.1 True False False 3h40m csi-snapshot-controller 4.14.0-rc.1 True False False 2h21m dns 4.13.13 True True True 2h9m etcd 4.14.0-rc.1 True False False 2h40m image-registry 4.14.0-rc.1 True False False 2h9m ingress 4.14.0-rc.1 True True True 1h14m insights 4.14.0-rc.1 True False False 3h34m kube-apiserver 4.14.0-rc.1 True False False 2h35m kube-controller-manager 4.14.0-rc.1 True False False 2h30m kube-scheduler 4.14.0-rc.1 True False False 2h29m kube-storage-version-migrator 4.14.0-rc.1 False True False 2h9m machine-api 4.14.0-rc.1 True False False 2h24m machine-approver 4.14.0-rc.1 True False False 3h40m machine-config 4.13.13 True False True 59m marketplace 4.14.0-rc.1 True False False 3h40m monitoring 4.14.0-rc.1 False True True 2h3m network 4.13.13 True True True 2h4m node-tuning 4.14.0-rc.1 True False False 2h9m openshift-apiserver 4.14.0-rc.1 True False False 2h20m openshift-controller-manager 4.14.0-rc.1 True False False 2h20m openshift-samples 4.14.0-rc.1 True False False 2h23m operator-lifecycle-manager 4.14.0-rc.1 True False False 2h23m operator-lifecycle-manager-catalog 4.14.0-rc.1 True False False 2h18m operator-lifecycle-manager-packageserver 4.14.0-rc.1 True False False 2h20m service-ca 4.14.0-rc.1 True False False 2h23m storage 4.14.0-rc.1 True False False 3h40m
Some OVN pods are running 7 out of 8 containers (missing ovnkube-node) and constantly restarting, and pods running only 5 containers show errors connecting to the OVN DBs.
$ oc get pods -n openshift-ovn-kubernetes -o wide NAME READY STATUS RESTARTS AGE IP NODE ovnkube-control-plane-5f5c598768-czkjv 2/2 Running 0 2h16m 192.168.16.32 dciokd-master-1 ovnkube-control-plane-5f5c598768-kg69r 2/2 Running 0 2h16m 192.168.16.31 dciokd-master-0 ovnkube-control-plane-5f5c598768-prfb5 2/2 Running 0 2h16m 192.168.16.33 dciokd-master-2 ovnkube-node-9hjv9 5/5 Running 1 3h43m 192.168.16.32 dciokd-master-1 ovnkube-node-fmswc 7/8 Running 19 2h10m 192.168.16.36 dciokd-worker-2 ovnkube-node-pcjhp 7/8 Running 20 2h15m 192.168.16.35 dciokd-worker-1 ovnkube-node-q7kcj 5/5 Running 1 3h43m 192.168.16.33 dciokd-master-2 ovnkube-node-qsngm 5/5 Running 3 3h27m 192.168.16.34 dciokd-worker-0 ovnkube-node-v2d4h 7/8 Running 20 2h15m 192.168.16.31 dciokd-master-0 $ oc logs ovnkube-node-9hjv9 -c ovnkube-node -n openshift-ovn-kubernetes | less ... 2023-09-19T03:40:23.112699529Z E0919 03:40:23.112660 5883 ovn_db.go:511] Failed to retrieve cluster/status info for database "OVN_Northbound", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnnb_db.ctl 2023-09-19T03:40:23.112699529Z ovn-appctl: cannot connect to "/var/run/ovn/ovnnb_db.ctl" (No such file or directory) 2023-09-19T03:40:23.112699529Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 cluster/status OVN_Northbound' failed: exit status 1) 2023-09-19T03:40:23.112699529Z E0919 03:40:23.112677 5883 ovn_db.go:590] OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 cluster/status OVN_Northbound' failed: exit status 1 2023-09-19T03:40:23.114791313Z E0919 03:40:23.114777 5883 ovn_db.go:283] Failed retrieving memory/show output for "OVN_NORTHBOUND", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnnb_db.ctl 2023-09-19T03:40:23.114791313Z ovn-appctl: cannot connect to "/var/run/ovn/ovnnb_db.ctl" (No such file or directory) 2023-09-19T03:40:23.114791313Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 memory/show' failed: exit status 1) 2023-09-19T03:40:23.116492808Z E0919 03:40:23.116478 5883 ovn_db.go:511] Failed to retrieve cluster/status info for database "OVN_Southbound", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnsb_db.ctl 2023-09-19T03:40:23.116492808Z ovn-appctl: cannot connect to "/var/run/ovn/ovnsb_db.ctl" (No such file or directory) 2023-09-19T03:40:23.116492808Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 cluster/status OVN_Southbound' failed: exit status 1) 2023-09-19T03:40:23.116492808Z E0919 03:40:23.116488 5883 ovn_db.go:590] OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 cluster/status OVN_Southbound' failed: exit status 1 2023-09-19T03:40:23.118468064Z E0919 03:40:23.118450 5883 ovn_db.go:283] Failed retrieving memory/show output for "OVN_SOUTHBOUND", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnsb_db.ctl 2023-09-19T03:40:23.118468064Z ovn-appctl: cannot connect to "/var/run/ovn/ovnsb_db.ctl" (No such file or directory) 2023-09-19T03:40:23.118468064Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 memory/show' failed: exit status 1) 2023-09-19T03:40:25.118085671Z E0919 03:40:25.118056 5883 ovn_northd.go:128] Failed to get ovn-northd status stderr() :(failed to run the command since failed to get ovn-northd's pid: open /var/run/ovn/ovn-northd.pid: no such file or directory)
Description: During an upgrade from non-IC to IC, the CNO status logic looks up a well-known configmap that indicates whether an upgrade to IC is ongoing, in order not to report the new operator version (4.14) until the second and final phase of the IC upgrade is done.
The following corrections are needed:
Remove Hemant and Sparsh from integration-tests reviewers
Please review the following PR: https://github.com/openshift/images/pull/133
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-18003. The following is the description of the original issue:
—
Description of problem:
Found auto case OCP-42340 failing in a CI job whose version is 4.14.0-ec.4, and then reproduced the issue in 4.14.0-0.nightly-2023-08-22-221456.
Version-Release number of selected component (if applicable):
4.14.0-ec.4 4.14.0-0.nightly-2023-08-22-221456
How reproducible:
Always
Steps to Reproduce:
1. Deploy egressrouter on baremetal with { "kind": "List", "apiVersion": "v1", "metadata": {}, "items": [ { "apiVersion": "network.operator.openshift.io/v1", "kind": "EgressRouter", "metadata": { "name": "egressrouter-42430", "namespace": "e2e-test-networking-egressrouter-l4xgx" }, "spec": { "addresses": [ { "gateway": "192.168.111.1", "ip": "192.168.111.55/24" } ], "mode": "Redirect", "networkInterface": { "macvlan": { "mode": "Bridge" } }, "redirect": { "redirectRules": [ { "destinationIP": "142.250.188.206", "port": 80, "protocol": "TCP" }, { "destinationIP": "142.250.188.206", "port": 8080, "protocol": "TCP", "targetPort": 80 }, { "destinationIP": "142.250.188.206", "port": 8888, "protocol": "TCP", "targetPort": 80 } ] } } } ] } % oc get pods -n e2e-test-networking-egressrouter-l4xgx -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES egress-router-cni-deployment-c4bff88cf-skv9j 1/1 Running 0 69m 10.131.0.26 worker-0 <none> <none> 2. Create service which point to egressrouter % oc get svc -n e2e-test-networking-egressrouter-l4xgx -o yaml apiVersion: v1 items: - apiVersion: v1 kind: Service metadata: creationTimestamp: "2023-08-23T05:58:30Z" name: ovn-egressrouter-multidst-svc namespace: e2e-test-networking-egressrouter-l4xgx resourceVersion: "50383" uid: 07341ff1-6df3-40a6-b27e-59102d56e9c1 spec: clusterIP: 172.30.10.103 clusterIPs: - 172.30.10.103 internalTrafficPolicy: Cluster ipFamilies: - IPv4 ipFamilyPolicy: SingleStack ports: - name: con1 port: 80 protocol: TCP targetPort: 80 - name: con2 port: 5000 protocol: TCP targetPort: 8080 - name: con3 port: 6000 protocol: TCP targetPort: 8888 selector: app: egress-router-cni sessionAffinity: None type: ClusterIP status: loadBalancer: {} kind: List metadata: resourceVersion: "" 3. create a test pod to access the service or curl the egressrouter IP:port directly oc rsh -n e2e-test-networking-egressrouter-l4xgx hello-pod1 ~ $ curl 172.30.10.103:80 --connect-timeout 5 curl: (28) Connection timeout after 5001 ms ~ $ curl 10.131.0.26:80 --connect-timeout 5 curl: (28) Connection timeout after 5001 ms $ curl 10.131.0.26:8080 --connect-timeout 5 curl: (28) Connection timeout after 5001 ms
Actual results:
connection failed
Expected results:
connection succeed
Additional info:
Note: the issue didn't exist in 4.13; it passed with the latest 4.13 nightly build, 4.13.0-0.nightly-2023-08-11-101506
08-23 15:26:16.955 passed: (1m3s) 2023-08-23T07:26:07 "[sig-networking] SDN ConnectedOnly-Author:huirwang-High-42340-Egress router redirect mode with multiple destinations."
Description of problem:
node-exporter profiling shows that ~16% of CPU time is spent fetching details about btrfs mounts. The RHEL kernel doesn't have btrfs, so it's safe to disable this collector.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
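A hedged sketch (not the cluster-monitoring-operator code) of what "disable the collector" amounts to in practice: passing node_exporter's --no-collector.btrfs flag on the container that runs it. The image reference and the other flag are illustrative.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	c := corev1.Container{
		Name:  "node-exporter",
		Image: "quay.io/prometheus/node-exporter:latest", // illustrative image reference
		Args: []string{
			"--web.listen-address=127.0.0.1:9100",
			"--no-collector.btrfs", // RHEL/RHCOS kernels ship without btrfs, so skip the probe
		},
	}
	fmt.Println(c.Args)
}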
Please review the following PR: https://github.com/openshift/telemeter/pull/460
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
As endorsed at DNS Flag Day, the DNS Community recommends a bufsize setting of 1232 as a safe default that supports larger payloads, while generally avoiding IP fragmentation on most networks. This is particularly relevant for payloads like those generated by DNSSEC, which tend to be larger.
Previously, CoreDNS always used the EDNS0 extension, which enables UDP-based DNS queries to exceed 512 bytes, when CoreDNS forwarded DNS queries to an upstream name server, and so OpenShift specified a bufsize setting of 512 to maintain compatibility with applications and name servers that did not support the EDNS0 extension.
For clients and name servers that do support EDNS0, a bufsize setting of 512 can result in more DNS truncation and unnecessary TCP retransmissions, resulting in worse DNS performance for most users. This is due to the fact that if a response is larger than the bufsize setting, it gets truncated, prompting clients to initiate a TCP retry. In this situation, two DNS requests are made for a single DNS answer, leading to higher bandwidth usage and longer response times.
Currently, CoreDNS no longer uses EDNS0 when forwarding requests if the original client request did not use EDNS0 (ref: coredns/coredns@a5b9749), and so the reasoning for using a bufsize setting of 512 no longer applies. By increasing the bufsize setting to the recommended value of 1232 bytes, we can enhance DNS performance by decreasing the probability of DNS truncations.
Using a larger bufsize setting of 1232 bytes also would potentially help alleviate bugs like https://issues.redhat.com/browse/OCPBUGS-6829 in which a non-compliant upstream DNS is not respecting a bufsize of 512 bytes and sending larger-than-512-bytes responses. A bufsize setting of 1232 bytes doesn't fix the root cause of this issue; rather, it decreases the likelihood of its occurrence by increasing the acceptable size range for UDP responses.
Note that clients that don’t support EDNS0 or TCP, such as applications built using older versions of Alpine Linux, are still subject to the aforementioned truncation issue. To avoid these issues, ensure that your application is built using a DNS resolver library that supports EDNS0 or TCP-based DNS queries.
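For illustration, one way to check the EDNS buffer size that CoreDNS advertises to clients (a sketch; the cluster DNS service IP 172.30.0.10 and the dnsutils image are assumptions, adjust them to your environment):

# Run a throwaway pod with dig and query the cluster DNS service
oc run dns-check --rm -it --restart=Never \
  --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.7 -- \
  dig @172.30.0.10 kubernetes.default.svc.cluster.local +noall +comments
# Look for a line such as:  ; EDNS: version: 0, flags:; udp: 1232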
Brief history of OpenShift's Bufsize changes:
Version-Release number of selected component (if applicable):
4.14, 4.13, 4.12, 4.11
How reproducible:
100%
Steps to Reproduce:
1. oc -n openshift-dns get configmaps/dns-default -o yaml | grep -i bufsize
Actual results:
Bufsize = 512
Expected results:
Bufsize = 1232
Additional info:
This is a clone of issue OCPBUGS-20104. The following is the description of the original issue:
—
Description of problem:
The recently introduced node identity feature introduces pods that run as root. While it is understood there may be situations where that is absolutely required, the goal should be to always run with least privilege / non-root.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Deploy an IBM Managed OpenShift 4.14.0 cluster. I suspect any OpenShift 4.14.0 cluster will have these pods running as root as well.
Actual results:
network-node-identity pods are running as root
Expected results:
network-node-identity pods should be running as non-root
Additional info:
Due to the introduction of these pods running as root in an IBM Managed OpenShift 4.14.0 cluster, we will have to file for a security exception.
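One way to verify how these pods run (a sketch; the openshift-network-node-identity namespace name is assumed from the component name, and an empty runAsUser typically means the container falls back to the image default, i.e. root here):

# Print each network-node-identity pod together with any explicit runAsUser it sets
oc -n openshift-network-node-identity get pods \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext.runAsUser}{"\n"}{end}'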
Description of the problem:
Searching cluster events with message=\ (URL-encoded as message=%5C) returns all "writing image to disk" messages.
e.g. "Host: test-infra-cluster-f5e3a8e9-master-1, reached installation stage Writing image to disk: 5%"
How reproducible:
100%
Steps to reproduce:
1. Install cluster
2. List events with message=\ , or message=%5C
curl -s -v --location --request GET 'https://api.stage.openshift.com/api/assisted-install/v2/events?cluster_id=2aa44b94-e533-44fe-9c0f-3b20a3d91b4e&message=%5C' --header "Authorization: Bearer $(ocm token)" | jq '.'
or
curl -s -v --location --request GET 'https://api.stage.openshift.com/api/assisted-install/v2/events?cluster_id=2aa44b94-e533-44fe-9c0f-3b20a3d91b4e&message=\' --header "Authorization: Bearer $(ocm token)" | jq '.'
Actual results:
All "writing image to disk" are returns
Expected results:
Only events containing '\' should be returned
Description of the problem:
CVO 4.14 failed to install when Nutanix platform provider is selected.
{ "cluster_id": "c8359d4e-141b-45ff-9979-d49dd679d56b", "name": "cvo", "operator_type": "builtin", "status": "failed", "status_updated_at": "2023-06-29T07:40:47.855Z", "timeout_seconds": 3600, "version": "4.14.0-0.nightly-2023-06-27-233015" }
How reproducible:
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
We need to improve our must-gather so that we can collect the CRs on which the vSphere CSI driver depends.
IMO they contain vital cluster state, and not collecting them makes certain parts of CSI driver debugging much harder than it needs to be.
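Until must-gather collects these automatically, the same data can be gathered manually; a sketch (the exact set of CRs worth inspecting is an assumption and should be taken from the driver's CRD list):

# Grab the CSI driver namespace and the operator-level CR that configures the vSphere driver
oc adm inspect ns/openshift-cluster-csi-drivers --dest-dir=csi-inspect
oc adm inspect clustercsidriver/csi.vsphere.vmware.com --dest-dir=csi-inspect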
Sanitize OWNERS/OWNER_ALIASES:
1) OWNERS must have:
component: "Storage / Kubernetes External Components"
2) OWNER_ALIASES must have all team members of Storage team.
Some unit tests are flaky because we check that timestamps have changed.
When creation and the test happen very quickly, the timestamp may appear not to have changed.
https://redhat-internal.slack.com/archives/C014N2VLTQE/p1681827276489839
We can fix this by simulating the host creation as having happened in the past.
Description of problem:
Bump Kubernetes to 0.27.1 and bump dependencies
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
IPI install on Azure Stack fails when setting platform.azure.osDisk.diskType to StandardSSD_LRS in install-config.yaml.

When setting controlPlane.platform.azure.osDisk.diskType to StandardSSD_LRS, the following error appears in the terraform log and some resources have already been created:

level=error msg=Error: expected storage_os_disk.0.managed_disk_type to be one of [Premium_LRS Standard_LRS], got StandardSSD_LRS
level=error
level=error msg= with azurestack_virtual_machine.bootstrap,
level=error msg= on main.tf line 107, in resource "azurestack_virtual_machine" "bootstrap":
level=error msg= 107: resource "azurestack_virtual_machine" "bootstrap" {
level=error
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "bootstrap" stage: failed to create cluster: failed to apply Terraform: exit status 1
level=error
level=error msg=Error: expected storage_os_disk.0.managed_disk_type to be one of [Premium_LRS Standard_LRS], got StandardSSD_LRS
level=error
level=error msg= with azurestack_virtual_machine.bootstrap,
level=error msg= on main.tf line 107, in resource "azurestack_virtual_machine" "bootstrap":
level=error msg= 107: resource "azurestack_virtual_machine" "bootstrap" {
level=error
level=error

When setting compute.platform.azure.osDisk.diskType to StandardSSD_LRS, compute machines fail to provision:

$ oc get machine -n openshift-machine-api
NAME                                     PHASE     TYPE              REGION   ZONE   AGE
jima414ash03-xkq5x-master-0              Running   Standard_DS4_v2   mtcazs          62m
jima414ash03-xkq5x-master-1              Running   Standard_DS4_v2   mtcazs          62m
jima414ash03-xkq5x-master-2              Running   Standard_DS4_v2   mtcazs          62m
jima414ash03-xkq5x-worker-mtcazs-89mgn   Failed                                      52m
jima414ash03-xkq5x-worker-mtcazs-jl5kk   Failed                                      52m
jima414ash03-xkq5x-worker-mtcazs-p5kvw   Failed                                      52m

$ oc describe machine jima414ash03-xkq5x-worker-mtcazs-jl5kk -n openshift-machine-api
...
Error Message: failed to reconcile machine "jima414ash03-xkq5x-worker-mtcazs-jl5kk": failed to create vm jima414ash03-xkq5x-worker-mtcazs-jl5kk: failure sending request for machine jima414ash03-xkq5x-worker-mtcazs-jl5kk: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="Storage account type 'StandardSSD_LRS' is supported by Microsoft.Compute API version 2018-04-01 and above" Target="osDisk.managedDisk.storageAccountType"
...

Based on the azure-stack doc[1], the supported disk types on ASH are Premium SSD and Standard HDD. It would be better to validate diskType on Azure Stack to avoid the above errors.

[1] https://learn.microsoft.com/en-us/azure-stack/user/azure-stack-managed-disk-considerations?view=azs-2206&tabs=az1%2Caz2#cheat-sheet-managed-disk-differences
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-05-16-085836
How reproducible:
Always
Steps to Reproduce:
1. Prepare install-config.yaml, setting platform.azure.osDisk.diskType to StandardSSD_LRS
2. Install an IPI cluster on Azure Stack
Actual results:
Installation failed
Expected results:
The installer should validate diskType on Azure Stack Cloud and exit with an error message for unsupported disk types
Additional info:
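Until the installer validates this, a lightweight pre-flight check of the install-config can catch it (a sketch; assumes yq is available and that Premium_LRS and Standard_LRS are the only supported values, per the error above):

# Print the configured disk types; anything other than Premium_LRS or Standard_LRS is unsupported on Azure Stack
yq e '.controlPlane.platform.azure.osDisk.diskType, .compute[].platform.azure.osDisk.diskType' install-config.yaml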
The TestBodySizeLimit test is increasingly flaky. We need to investigate and fix it.
https://search.ci.openshift.org/?search=FAIL%3A+TestBodySizeLimit&maxAge=48h&context=2&type=all&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
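To help reproduce the flake locally, the test can be run repeatedly (a sketch; the package path is a placeholder and should be replaced with the package that defines TestBodySizeLimit):

# Run the flaky test many times with the race detector to shake out the failure
go test -race -count=50 -run 'TestBodySizeLimit' -v ./pkg/...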
Seen in build02, currently running 4.12.0-ec.3:
mcd_update_state{node="build0-gstfj-m-0.c.openshift-ci-build-farm.internal"}
returns:
Those are identical, except:
Looking at the backing code, my guess is that we're doing something like this:
Or something like that. I expect we want to drop the zero-valued time-series, but I'm not clear enough on how the MCO pushes values into the export set to have code suggestions.
When displaying my pipeline, it is not rendered correctly: there are overlapping segments between parallel branches. However, if I edit the pipeline, it appears fine. I have attached screenshots showing the issue.
This is a regression from 4.11 where it rendered fine.
Description of problem:
When "Service Binding Operator" is successfully installed in the cluster for the first time, the page will automatically redirect to Operator installation page with the error message "A subscription for this Operator already exists in Namespace "XXX" "
Notice: This issue only happened when the user installed "Service Binding Operator" for the first time. If the user uninstalls and re-installs the operator again, this issue will be gone
Version-Release number of selected components (if applicable):
4.12.0-0.nightly-2022-08-12-053438
How reproducible:
Always
Steps to Reproduce:
Actual results:
The page will redirect to Operator installation page with the error message "A subscription for this Operator already exists in Namespace "XXX" "
Expected results:
The page should stay on the install page, with the message "Installed operator- ready for use"
Additional info:
Please find the attached screenshot for more details
Description of problem:
SCOS times out during provisioning of BM nodes
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
https://github.com/openshift/ironic-image/pull/377
In Helm Charts we define a values.schema.json file - a JSON schema for all the possible values the user can set in a chart. This schema needs to follow the JSON Schema standard. The standard includes something called $ref - a reference to either a local or a remote definition. If we use a schema with remote references in OCP, it causes various troubles. Different OCP versions give different results, and even on the same OCP version you can get different results depending on how locked down the cluster networking is.
Tried in Developer Sandbox, OpenShift Local, Baremetal Public Cluster in Operate First, OCP provisioned through clusterbot. It behaves differently in each instance. Individual cases are described below.
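For reference, a quick way to check whether a chart's values.schema.json contains remote $ref entries before installing it (a sketch; the chart name and repository URL below are taken from the reproducer and should be adjusted for other charts):

# Pull the chart locally and look for remote (http/https) $ref entries in its JSON schema
helm pull backstage --repo https://raw.githubusercontent.com/tumido/helm-backstage/reproducer --untar
grep -R '"\$ref"[[:space:]]*:[[:space:]]*"https\?://' backstage/values.schema.json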
1. Go to the "Helm" tab in Developer Perspective
2. Click "Create" in top right and select "Repository"
3. Use the following ProjectHelmChartRepository resource and click "Create" (this repo contains a single chart, and that chart has a values.schema.json with the content linked below):
apiVersion: helm.openshift.io/v1beta1
kind: ProjectHelmChartRepository
metadata:
name: reproducer
spec:
connectionConfig:
url: https://raw.githubusercontent.com/tumido/helm-backstage/reproducer
4. Go back the "Helm" tab in Developer Perspective
5. Click "Create" in top right and select "Helm Release"
6. In filters section of the catalog in the "Chart repositories" select "Reproducer"
7. Click on the single tile available (Backstage)
8. Click "Install Helm Chart"
9. Either you will be greeted with various error screens or you see the "YAML view" tab (this tab selection is not the default and is remembered during user session only I suppose)
10. Select "Form view"
Various error screens depending on OCP version and network restrictions. I've attached screen captures how it behaves in different settings.
Either render the form view (resolve the remote references) or make it obvious that remote references are not supported. Optionally, fall back to the "YAML view", acknowledging that the user doesn't have the full schema available but the chart is still deployable.
Depends on the environment
Always in OpenShift Local, Developer Sandbox, cluster bot clusters
1. Select any other chart to install, click "Install Helm Chart"
2. Change the view to "YAML view"
3. Go back to the Helm catalog without actually deploying anything
4. Select the faulty chart and click "Install Helm Chart"
5. Proceed with installation
The kubernetes-apiserver and openshift-apiserver need to be rebased to k8s 1.27.x after the o/k rebase is completed.
The new test introduced by https://issues.redhat.com/browse/HOSTEDCP-960 fails for platforms other than AWS because some AWS specific conditions like `ValidAWSIdentityProvider` are always set regardless of the platform.
OCP Version at Install Time: 4.11-fc.3
RHCOS Version at Install Time: 411.86.202206172255-0
Platform: vSphere
Architecture: x86_64
I'm trying to verify that the IPI installer uses UEFI when creating VMs on VMware, following https://github.com/coreos/coreos-assembler/pull/2762 (merged Mar 19).
However, the 4.11.0-fc.3 installer taken from https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/4.11.0-fc.3/openshift-install-linux.tar.gz still seems to use BIOS.
Reproducing:
1. Run openshift-install against a VMware vSphere cluster.
2. Wait for an OpenShift VM (bootstrap, control, or worker node) to show up in vCenter.
3. Go to the VM's boot options - the firmware is set to BIOS instead of UEFI, which was supposed to be set by default.
Description of problem:
Bump Kubernetes to 0.27.1 and bump dependencies
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
OCP clusters born on 4.1 fail to scale up nodes due to the older podman version (1.0.2) present in the 4.1 bootimage. This was observed while testing bug https://issues.redhat.com/browse/OCPBUGS-7559?focusedCommentId=21889975&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21889975
Journal log:
-- Unit machine-config-daemon-update-rpmostree-via-container.service has finished starting up.
--
-- The start-up result is RESULT.
Mar 10 10:41:29 ip-10-0-218-217 podman[18103]: flag provided but not defined: -authfile
Mar 10 10:41:29 ip-10-0-218-217 podman[18103]: See 'podman run --help'.
Mar 10 10:41:29 ip-10-0-218-217 systemd[1]: machine-config-daemon-update-rpmostree-via-container.service: Main process exited, code=exited, status=125/n/a
Mar 10 10:41:29 ip-10-0-218-217 systemd[1]: machine-config-daemon-update-rpmostree-via-container.service: Failed with result 'exit-code'.
Mar 10 10:41:29 ip-10-0-218-217 systemd[1]: machine-config-daemon-update-rpmostree-via-container.service: Consumed 24ms CPU time
Version-Release number of selected component (if applicable):
OCP 4.12 and later
Steps to Reproduce:
1. Upgrade a 4.1-based cluster to 4.12 or a later version
2. Try to scale up a node
3. The node will fail to join
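A quick way to confirm the stale podman in the bootimage on the failing node (a sketch; assumes the node is reachable over SSH with the usual core user, since it never joins the cluster):

# The 4.1 bootimage ships podman 1.0.x, which predates the --authfile flag used by the MCD unit
ssh core@<failing-node-ip> 'podman --version'
ssh core@<failing-node-ip> 'podman run --help | grep authfile || echo "--authfile not supported"'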
This is the downstreaming issue for the upstream operator-registry changes. Upstream olm-docs repo will be downstreamed as part of later docs updates.
https://docs.google.com/document/d/139yXeOqAJbV1ndC7Q4NbaOtzbSdNpcuJan0iemORd3g/
-------------------------------------------
Veneer is viewed as a confusing and counter-intuitive term. PM floated `catalog template` (`template` for short) as a replacement and it's resonated sufficiently with folks that we want to update references/commands to use the new term.
A/C:
Description of problem:
When we delete any CR from the common OCP operator page, it would be good to add an indication that the resource is being deleted, or at least to grey out the dot at the right corner, from the user's perspective.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Steps to Reproduce:
1. Go to Operators -> Installed Operators -> click any installed operator -> click the CRD name from the header tab -> delete any CR from the list page using the kebab menu.
2. There is no indication of the deletion; the user can perform any action even after the deletion is triggered.
Actual results:
No indication about deletion on kebab menu
Expected results:
Grey out the dot and display a tooltip about the deletion.
Additional info:
https://github.com/openshift/console/pull/11860 is not fixing this issue for operator page.
Description of problem:
This ticket was created to track: https://issues.redhat.com/browse/CNV-31770
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-18464. The following is the description of the original issue:
—
Description of problem:
Hide the Builds NavItem if BuildConfig is not installed in the cluster
This is a clone of issue OCPBUGS-18641. The following is the description of the original issue:
—
Description of problem:
vSphere Dual-stack install fails in bootstrap.
All nodes are node.cloudprovider.kubernetes.io/uninitialized
cloud-controller-manager can't find the nodes?
I0906 15:05:22.922183 1 search.go:49] WhichVCandDCByNodeID called but nodeID is empty
E0906 15:05:22.922187 1 nodemanager.go:197] shakeOutNodeIDLookup failed. Err=nodeID is empty
Version-Release number of selected component (if applicable):
4.14.0-0.ci.test-2023-09-06-141839-ci-ln-98f4iqb-latest
How reproducible:
Always
Steps to Reproduce:
1. Install vSphere IPI with OVN Dual-stack
platform:
  vsphere:
    apiVIPs:
    - 192.168.134.3
    - fd65:a1a8:60ad:271c::200
    ingressVIPs:
    - 192.168.134.4
    - fd65:a1a8:60ad:271c::201
networking:
  networkType: OVNKubernetes
  machineNetwork:
  - cidr: 192.168.0.0/16
  - cidr: fd65:a1a8:60ad:271c::/64
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd65:10:128::/56
    hostPrefix: 64
  serviceNetwork:
  - 172.30.0.0/16
  - fd65:172:16::/112
Actual results:
Install fails in bootstrap
Expected results:
Install succeeds
Additional info:
I0906 15:03:21.393629 1 search.go:69] WhichVCandDCByNodeID by UUID
I0906 15:03:21.393632 1 search.go:76] WhichVCandDCByNodeID nodeID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406797 1 search.go:208] Found node 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406816 1 search.go:210] Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2, UUID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406830 1 nodemanager.go:159] Discovered VM using normal UUID format
I0906 15:03:21.416168 1 nodemanager.go:268] Adding Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2
I0906 15:03:21.416218 1 nodemanager.go:438] Adding Internal IP: 192.168.134.60
I0906 15:03:21.416229 1 nodemanager.go:443] Adding External IP: 192.168.134.60
I0906 15:03:21.416244 1 nodemanager.go:349] Found node 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.416266 1 nodemanager.go:351] Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2 UUID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.416278 1 instances.go:77] instances.NodeAddressesByProviderID() FOUND with 421b78c3-f8bb-970c-781b-76827306e89e
E0906 15:03:21.416326 1 node_controller.go:236] error syncing 'ci-ln-bllxr6t-c1627-5p7mq-master-2': failed to get node modifiers from cloud provider: provided node ip for node "ci-ln-bllxr6t-c1627-5p7mq-master-2" is not valid: failed to get node address from cloud provider that matches ip: fd65:a1a8:60ad:271c::70, requeuing
I0906 15:03:21.623573 1 instances.go:102] instances.InstanceID() CACHED with ci-ln-bllxr6t-c1627-5p7mq-master-1
Description of problem:
Upgrading from 4.12 to 4.13 causes cpuset-configure.service to fail, because the `mkdir` for `/sys/fs/cgroup/cpuset/system` and `/sys/fs/cgroup/cpuset/machine.slice` wasn't persistent.
Version-Release number of selected component (if applicable):
How reproducible:
Extremely (probably for every upgrade to the NTO)
Steps to Reproduce:
1. Upgrade from 4.12
2. The service will fail...
Actual results:
Expected results:
Service should start/finish correctly
Additional info:
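A manual check/workaround sketch for an affected node, based on the paths mentioned above (treat this as illustrative only):

# Inspect the failed unit and recreate the cpuset directories it expects, then retry
oc debug node/<node> -- chroot /host systemctl status cpuset-configure.service
oc debug node/<node> -- chroot /host mkdir -p /sys/fs/cgroup/cpuset/system /sys/fs/cgroup/cpuset/machine.slice
oc debug node/<node> -- chroot /host systemctl restart cpuset-configure.service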
Description of problem:
The cluster network operator crashes in an IBM ROKS with the following error:
2023-06-07T12:21:37.402285420-05:00 stderr F I0607 17:21:37.402108 1 log.go:198] Failed to render: failed to render multus admission controller manifests: failed to render file bindata/network/multus-admission-controller/admission-controller.yaml: failed to render manifest bindata/network/multus-admission-controller/admission-controller.yaml: template: bindata/network/multus-admission-controller/admission-controller.yaml:199:12: executing "bindata/network/multus-admission-controller/admission-controller.yaml" at <.HCPNodeSelector>: map has no entry for key "HCPNodeSelector"
Version-Release number of selected component (if applicable):
4.13.1
How reproducible:
Always
Steps to Reproduce:
1. Run a ROKS cluster with OCP 4.13.1 2. 3.
Actual results:
CNO crashes
Expected results:
CNO functions normally
Additional info:
ROKS worked OK with 4.13.0. This change was introduced in 4.13.1: https://github.com/openshift/cluster-network-operator/pull/1802
Description of problem:
Duplicate "using" in a log message.
log.Infof("For node %s selected peer address %s using using OVN annotations.", node.Name, addr)
Version-Release number of selected component (if applicable):
4.14
How reproducible:
always
Steps to Reproduce:
1. code review 2. 3.
Actual results:
log.Infof("For node %s selected peer address %s using using OVN annotations.", node.Name, addr)
Expected results:
log.Infof("For node %s selected peer address %s using OVN annotations.", node.Name, addr)
Additional info:
Description of problem:
Installed and uninstalled some helm charts, and got an issue that the helm details page couldn't be loaded successfully. This issue exists also in old versions and is aligned with OCPBUGS-7517.
If the backend fails to load the frontend never stops loading the helm details page.
Version-Release number of selected component (if applicable):
Details page never stops loading
How reproducible:
Always with the Helm chart secret below.
Steps to Reproduce:
Unable to reproduce this manually again.
But you can apply the Secret at the end to any namespace.
You can create this in any namespace, but because it contains the namespace info "christoph", the Helm list page links to a non-existing URL. You can fix that manually or use the namespace "christoph".
Actual results:
Expected results:
Additional info:
Secret to reproduce this issue:
kind: Secret apiVersion: v1 metadata: name: sh.helm.release.v1.dotnet.v1 labels: name: dotnet owner: helm status: deployed version: '1' data: release: >- H4sIAAAAAAAC/+S9a3ObTNIw/Ff06v74OgkgKxu5aj8YYiEUiUTI4rTZ2mIGDEjD4REgGe2T//7UzAAChGzLcZLr3r2qrorFYejpc/d0z/y7H1qB07/p21EaOmn/qu+HD1H/5t/9B3+bpP+ynRhFuWP3b/ocww3eMdw79vqeG9xcj25Y7v3H4XA0ZJkh9/8z7A3D9K/6yHrNW7aDnJQ8T34kcOvHqR+F/Zu+FCaphVAPRkGMH+pf9ZPUSrMEA11+56ofRqmDL30PjSjb9t7Ld/c9K457ftIDmY9sP3T/v9591Nv5zr6Xeg692kORm1z1tll48z38HkaQXOgB+IHio/fu3UOEULTHd+UodXqpZ6W9HH/iM/l44IRpb+8j1Ns6cbRNe9/7d9utFFiu8y1D6Hu/Z4V273u/usJbcPP14eF7v5eFqY9qsPhJNcn3va8hdLrvXdHP+3hA+mXg9KwsjQIr9aGFUN7bRgg5di/K0vf9H1d96FnbFNM0cFLLtlIL/92m+87ZJhTjzHvmPXtCh9vexEFBj4zVS6MCLjw5SoUK5ciHFn4n6V/1N06+j7Z20r/5R3+L5xs4+HLx0X9e9a3YV6sP77j+Vd8KwygtBrj5N4X9X9kW9W/6XprGyc2HD66fehl4D6PgQxQ7YeL5D+k7z0HBO/J08qH4Z+sgx0qc5IMd7UMUWfaHrWN7VvqOfv8dmWjXtfepe+j/+HHVRxHc9G/CDKGrfuoEMbIIl/2jQl918YP89f5u+T59xLikOO47gySVRBRIwnBpao/I0GU026DDUhsebHGcAIEfPSxiEwzUXBKGXxV14Ro6v5dEdJDEKWtpjxtLG4bSkl+BnOcsTR1IEyUyl7xvaygxBT4BnH2YCXxua9cfBTfeGTm9JonT9WyQ/k0SUWZwj6wprlwpUHa2OES2MMwMjUWSf5tJE3YkCUxqBqMEiKOB4MZfwUBB+DuGvnAdbcRCn78zdT4BA5Sa2pCRJnYMxL0LA3UPBlNGEqZjGE6nQBuHpsqzQNz7kjjOTOHWX2qsZ3LqwtYek0UwXlvMKDB9ybW1IWNpe9cWPVTOlc5b3gGdT0xdQTOf/wYCGbXmHMOcXwOO3QNRZczlvoQxJt9f8gNLe0wkcYokccza4ig1dCU2uHECJhsXknmqG0kcsbZw/YXSSM1MQou//73/46qDuP/yHBQ72+R9GqM2fRVkBigzl7e+KY4YEKjMLBh6QFv5ksBi+v7NyfmNqZmerT0yTV4gz7kzZHpgoiKYD0MgjnxD22cgGKfmasSZ+jS3NEwPdiSE6d9mSx6BYOHOdHYkuHhsxjVFNbC0IZKE6QYMlMzUFxkQx76pPR4k/zZ90JkvlqgmYDk8WMJobYnj3P4cuc4gcWcbOTL0KVPCgp/F1y1tuAYTddOYVygjIKprWxzl98fxsxpssenfZgvO82C4yBY6v1cDNYcc2gGf8LoHJ7eZNVB9U59mmMYwH8YgJ/M8WNoo++rzf3Pys2N8kiZjlvJnEx8Ybiw7syBljUDNMbymPs8s7dMOaOPM0GxkCqzv3BfzRlM8Fw9yq2zFqbkdoLW5PMIIBjwCoRxZmsnMArSbDaYsCJUYaKuPkljI0W2Bf224IfA8QQ/IqYmpyQwYTOeGNkVgMi/54xxOiIxSfPAxCOTExnxQpzHmkaXkzp7GbQxCmTG04drsmPs9GYMv+LSYC4Er+nKev1GK8Xlffl+ntKDPjlkwWbhfntE7X5a3mRqME1tTD+V4hdxgWn6kcFZyUcj2kDE0WPJoIbeEv8/ILTFSMAoffPd9bgWoUzdrhvbIYl4xQjUG4iIztaFnBI+I6oTYgyLSavxZ6KHhDopqBjkvNsMF4TN7fffF4lBmfo7cBRlL+Qy4YWBp8AvQVMbQFM8W7z4K/q1LaFfQo1PWKh1yTeYrCXxS8A15XxJ4Qq/udx89I1ATmBPe+CSJwxgECgLhwpXpnA5QVNdf3cglenDCs/bn6Isk3LrSRNmR6wL5xtbShptSJo/0ovp6FhTvCkPyHJEBAtutK4kE/o/S5KwNojRdNedZ0YXaj8jU7+p8UOHe1pW9rS8ygm/h1lfE0dri1Jzam5Xf5K8TePe2LkcrTl3DQGWOcPON6zU81GTxwnFbOkoS+AO29wa3KunIODqPx55Y+oLSQLTjih7CrWvr0/jst0M1t4j8VrDmpmZQvHfwNgzUoK2vT+cj70CoIGei3Fm6VMCN4WpcP/sNgxtltqhe23dKDP2WbilwDQdKbugKMgebNh7uC/wU/CjvbH26NlWZgcGYMTV7WKNLAINRevyNYUxjECw+SndUp6wGSm5q41QVx2HFm0JT/jr4i/Im+abqAVXxzLzQXQX8T+kPi/o89/ZkigyNXRkazGxdRqsA24DxgfK8eoDiuLA5dfhr987pa13eG5rcgkVNILc60qrGt3DAe5jfztGrC15w/juFrxS5eLyzPBRiP/Dx3tTk3NQXDbjgRE3AWEbGYIrqfP6snHWOyZ/wFsWjmtnr6MtRTxpddEYgNNo4I8/bEz6RBI8BLPKAtj/3bkz0lsBnWP8R3/iz+zRcwq07W5bzvPWVu9HqfqOudFZeLdTpSlX5h9V4+m25UT+rglSDC8sCheE8fmQG+3K2zi9gMPo/2N+QJnsX6urOFldN/dqtGwrdpB4qOa1dK+WMjDFRdpo2ToHQBccUQW7EwkBGMO+0Pye4sSfT2ORsBMPSvt3ibwyhuPoo3ck7EJixyciRoQ1Ds/V+t+2nZj+w4hdYflNU90AcDWfBeA/FRxwtMNjaryZTbOVzW0SY8ggEY59EDxjqYLy3VBPBUI4Bd/1RmhiPQlBqnxJi1oO3cWrqimeKY8a4Jxx5oWeHrXRh0Q+llaaS1/luwzPfuwB7b1gahFtqkYKjF4I9ZiCiNY6QAHedwfp8D5H7IDD17zFQbEjChkaFm6w5zhBz397Up4yFuSLkc+xNw1CJTX1+ziN5AUXtKuPSIqmh83EVJKwjMi2Yj7j5Ii4dmEYAKwQsssXxxtAVjzpBzzorjYCZOHAFmhtC0f1ucnT428pi0eX0dBhkNTO0aWJq2LFWH7uEFwxUBk6w80dY0JU2pwZQ8jelcvJAMNzZImXziq3E0hEf7U1teLBEFBBHLR8xMEChyal5gy2EW1fb14zXJGkqLGFKDLS0jv9WV4DFPeo0rqMySVA3QP7sNmo9f+sXYvFRCi9yKv3mt/nRydhYVLXHzQrjQ8Djy3tTm2MnJoXio2eLqwwOeGTkwzXgcBCCMhwQaAfeLoMXqe6ETI740dvKjo79sVCuhdjMT4xzpZKwMqUq6aiUq2BSKp2n1NCVtXXXUhNP8+WBJCIGyg5unggYShU0URDQ+QQ7bSXPt4MaykOnMLw2WPkqHJ0jgv9D9
OVp9Y0ySydBV0WjplNwer/hPNbU3JdOA6dgWuycJYYRMQvi6I5jgHVPvjkDf3zGQSHOtGdpw7rRazkI/MDUWk5AIaNPmY/Cofva1lk1epDEHSzxOZl6ILBRN//xOxgqh9Mx8P9M05E+BvAtPUC+WZedBf5+6cjY4jg3OZVZiaPcFloOcUUbbEaUncGNkvI9bK5scbQG3L6VFDjiveX4VSary9mpwZqbehF46E3a1Pmkwzk8M37DDD/Ou4KizoCs4rfE0k0EAvUAWT4H3BR1wHzyTIO3X+bcVvADEXEm5s2BjM25by5P+Xt+L70G754pYh6QD9i9MoIOfiF2QInfai4d3+xw3O/yroD9aX2jZrZ/yq+mNuR+Bl78/pflifv2Gr5BIDRFYoNP+aW697OwKuF0B17MH81v2cEosTUW3WsjFoTK4av7NP886WrW/KQuHVTTqx6c8ImlyZ4toh3w2X19nFM9h20dgW9h6Ep0GoBV+G6Oixqub1Yf42JeC80dmKipuWJ3tjZkYN6lJ19Otzbe34rfnImSG6uGbbs0wD7ylu4xMBg37HUHnNfdibYmf8FnfYcLQj//Z3mK2MLA0mzZ0G9P7cul8chr+Eh/PV0qnL7Q5+kO58gKdpJuHSs4F6L/pnioHT9S/2nVsQhUD/Gb4/3VYsrSX5bIIvFoa+v8AnCPsXFMuCaAkz3wOXLtyZR9WVJlG2Wpc0lCJZzubF1Bbc3cxjgMRiyOsp7E+JiO9RfGNAOqSLqEWUYwNOMqnf0KWCH2io/LMx4MbGSPsVe+oONMlP2ZKCXHUkCWzfRpaOrKgi7XX79ASxR0C5WkS/vZ4hF3tqjmQENZkUR6KpKtj8mY+jS1tGGhLY/WzNJwZCqzMFDHpmgjtTH+sOLiF34nBqGMjIGamyt1Y3LqqvFdxO+wN+Esn0ppn+ITTOaZxanD2tLR1tQxTPu00uZkrJeO07Bo3GUp9+5xTVE92GKFt8+LlUw9kYB4T7UIgt+YusxUONnI/IIjli+g1mz1qnk9//23n7PBjT9Ti2sSS15fXjm5d9/MZJHvPs9Pa+O3zKOB//oSXOPbX33+0+y49Ec0eHcERPUrlttZ8Br4O73A0uM4s/yeONudD51nsrX1ZfOqFGPxF0ua17J29gTtO5YOj7gu56CS5Ytq2bfKtq2fh6e7XKTb/tSzORdmseh7HV6cXJVD/fOqv7NQ5pBqPFJPQcryojB1HtPP/rYsj3NCCyDH7t+k28zBP3flHeLnLYmfd2+5J7WHNwNSbYivJbEF8Y2qqq9/1c8SR6F1fPLxiQcLJc6Pq36UpXFGShs3fmj3b2iZ5fFbV/04S7ylA7dOSsH5gS8hVL901d86DxU4MNo67yhIWyeJsi3EU6fPJam1TbP42zZaOzDt3/StOMYgbv3u6sSytNDZOSiKne2HhPPf1T7jPPZ/XBVVrHgSterJb1v8QupTvFfIJRO/6gdRFqbfrNTr3/RryyLlotcHPPHaAP3/+Z/eccCeG/U8Z+vgb9fI5IS78TYKqp+P6dYSojC1/NDZVijwQz89vYr8nRM6SfJtGwEHA5zCeBnBjUNoE0fbtEAQqarEvxtVlOTOVfHcJ+YTQ8BPIxih/k3/XvjWv+qn1tZ10m/VI5gxt45l+43v4pHE4qsFeqqBjwBsHYLnpH/DdlCZ+LgNrFOWrkNQgpwiWqVqCRi3D5h4TjkOPL1kO0nqh4TAwm3HK60v+mHiwGzr3Nmuc+9sg+LVbxHyYd6/6SuO7W8xJ5JK22OhavUk9s5t1yGTLpTxfR5jlAsoS1JnK2HU7iKUBc4c81SFBHotqYTGRRGwUCm8X3fOduvbTnWbyPhRtAtAsLT3iSlIKQjQcwISMpSLRr5yMDgPAe3O/+rf+tZEYeDnaDfj4gPgrlPIyZGpsd4sGOVmPtrAYBzYArOX81H1XrWY01xn9LGpaATW5ULNOnKd/eniElHrS+mjJEx3RhAjY7DoXITCbo0xmMZwQtxdArdSVPzBnGcscVUGkKQyVZrYHlaptvjJLYLTXelaSP7+NFH+3Dy6FsROFt7OzG2c+HCg5KSq2N+7UjAk1br6cv/k+11zLioHd6Z/2ZxnPn9XVsNifIECZ7CsjikSKjNf6oZpwiTn8GHjoL6bvvWFR1JphJ+TQpmB2NTn0rkxC95REOTk3NJ5khwi7yLFMwnsY0aaoJ295AcGqY5WdmVF84wrTe3ZuZxcbyT12nMtEiC/godL90CaTBEQxwwO1TH9GhXaP8mvx8rKC3hWmPqAG2HeyDq//5xsFt+ccUoMORzGruic7kafVwIv0GqeV/CaPo1/H6+pyyWVtYmlK5KtSUX1vRzb4ih/gr/Owk8qAX8X/Bs7tmllot/9bseifMVfHVVNvw3vvGeLLpGD2WY4VgV+X8EgmjHmJTCQXDiZ7qxAXdsCv7H0KXXzw00mud0wPpzVt1OySGrqHqOIKL9knlo+PdiTaQwC6M+wbQjVBAhT+yxe6fs49F/DAO1pGobI2+cSBklUYhjQin9nyQ8sXYks7brsuJhY+oLyIdXjLORWrqHPqXxNpjswWLhmMMqbHRyX82pHheJTeqbgGxJ+ER0AuCml2Sv0x5N6cTLPrXJeoppPB3P3hcUsdRo0Fgqep/lFtv/nZOJCH6Br7s+Of57uddhZyKlV50yjCvZl+DrhCSMY7YCoesD/ORwoolqmMrH/Fz+BC9fS5y6WH2pTG1XAQzDAsjNFlM9UD3KYPliupGfp+/C0/1bot/r3/izfqKfzp37UsSjmUOiM6Wkl9gvwsYie4reL9U+5mHSR3QlGPrUJr7E78t7U5NgMEOapgSWqOcXR0c+2OZQAgffNJe1aKPUS4IbrUtfawrV7r43wuzEIzB0MWJriXUcuibUm84+zfLQBnHxoFf2tAcfsjFqB04wWfwVgME1nh0Um+yOq9ybzssMK25608PfT4wKcxwBt/0t0IO3++LO83JL/CAxgId9FvFl0SxrByoUT9WCJ6uYZO1SHF4FQTQv73iULiSRM7wAnb029uL+c2m+kc5vdLK/Sszyo41kSppmtPSYSYn4K5yuMR5JSpZ0A1BbuXVuXkSnwOxA8DukCMl38/XbPuFNG2RlcimCxEDz9A3qk0fkg/Jm4XW3wJunWdWFB45nPy2Awxfcz7LcBjRZDfPX5yJ4oe3iIdjNO2RmDeWupVt6B5ahW4CcVhbNY5zA7WbjmZlxNv4xHtFBXr+sO+6Hw8w6zgXrAc51pRUGyTuBMSzhhPoxskU1e4V8jEBoX+I48kIJxDoPx8GfxroRTZGoEH6RADMcPpva4eRnOT7taTXG0hvmIMXR5C/NRDGi8n5IlkyXbKkYZbUzNjEGwSvG3LX26AwGLQLhI7WCcWxqJi9OGvs9fFVMeix4vi12O+Yqfi11EGKiI4HHZKOJ0sS0F4iKT7tgdDLAdHRJbVi1bix5jT/jDV//TrqOLltpITt6BQEZwohR/m7GJ50vHqDqNZ3q9A4ZtFI0/gdcLc0FVbqlbj7w0/un0P8plLRyzYzlWNySvRX2yTWM3
AHEUkPyXLrX7STpzT40ek/wpHTjN8XcwDz/93Em+KAahgkxOzV8T792Haopljc6L35mkGLiEg8S55VLfK3IZdb64TP/XaPhmeqghr2+jj9bE/9R5pvg7sDSbhQEdY8axHgiwj8LWZbOhd2D+6TU5oqLMRl2VOuU35YCxPNCxCL9U5T409z2YIoPuZPFWOD2Y+pSzNKJXsH4agAFppAlng+rbO2nAs0bwGEPOIz55tSStz49/L1kCN3yNnhdHuT2ZX5SDMfRpbugb/yd1ejV/LJukMeHt+fZlOObozgjUT7mr4fU1dpOUZ/z+nBb2z3we+1TYT8jMnPeAz39bsLdlSd4vidsuWAc4E1tPd4B7RIZ2/Qx8z+WlFVSWr5kkN2NUuUublMbIERioiUnj7AJP6uac3/kiXdVcr6o1vvwFcjJEphYu4K5dKI42dD0THY744HcWLT5/u7wUKat6LR+8MMdyoTwc7RaRidX9eE5wYmjX7j0jL0p8/Np4aYjsib2DpPwV7qg84tioKGfUFml17UU5llet2Z2Rpaqc9/J330KWL8/JEhpCztvZZKcpTLPjHC6B/0U83ZDxRnnsz+Prcnl/zr5OQaDQkshq7UZBMGBjUPhtRSf+L8XTC8t+/3fgT8S2BmEZdY1AjQzdjMFAomthIsogp65tHfMgjiGHyBiQOR6etyuX5vDqMpsi56K8/y/J5z2Hy5mpb3CsnmB7S9Yhib2RLoD5Rba3seZ6UtJ7Sa7z7fJ1F/gtLXjJTnfeDgaIcZb8ulUC/dvw2PzuX57XPv8hPD1Xbv/TOu51tQAvt6+08V3N7An2iwqf+U7m2+XpNCYmMXKA5wvarQb+ZXh+GU5e8Ny53P2T7z9B5+AxtgO13mB8YY756Me+mM879YKZm5pK8pqSeALTRTlQGhc3d7kzuFFK8p5lc6fP7klNBInNPzGztbFrNegeZlyzsXKmecjQHhlLKHL4Z++znoH92/Go3VgZmdo4sUX3Vfm3dmP5XyJPrY03VqPZHssLjjt/6fptE6/hvEW769QSVQ9MlKiIL9Zn72tjskaun6Xl5TkRskZyGW08GE49Z/mWMkQa/C+hxdbWpggGQ0TjMlTGfztbHB+swRzHzyRvM6Mbp6QtnO7K3UNnmprZAcoBNyR59pluejBAtJZRq8vgK/KlgZrZGzMHHPOXWAujGwi8LZ4x75sBCrGfPdP5nSnU12Gkc/dx3J8a+u1Oqq3vmEvWA+I+tTh1iOkAJip+x7P06WGmoWym3aXleEAb72fa+NCkEYtAaMYwGGUA01VgOUOfxiT+1OevoN9TzeoXrZU8VSf8nA4rW+WKuEa9JrFuME4sPca+B8mXPLnHWI5p/oiey2HCy+b3MzmK19QdP4enO7KOSOJp9UDqEqhP1LJrw2YutagvbeUs6jhGpsDHwOe9xn5zwlvVi9GNJqpNsf54vdjjruR3WmNP4Stw2Yk7W/z0VvWHr8lz/5ZamBfJV7FzKwiVw0ty10/Bc64m5omaCw5wjyzQVBkMSMzwx21Oe/OTX7GGZIujnNQCDKoNN3avqFcpcXf4477Q5LhxyJvXUgxIq6tncqvdcXOR1/g1HRvA/XEfsgOmt+c3skYJxJFnTuY7unYzIpvUNfyQ8FU82LWpzx+R4YWmbGDOU3iWjQ3lDrQ+hdah4PnPaK8GZ2kKtpMezHkPx8RwsCA5C/xMfVNDmP+c34n9P4NT9/ZkvrO5UW5xjztDw7zN7zBNpAHxMRlTK2LmJ/y+0w3nGhv1tPh/Wp4acYYmL10ve9rHs0XPswX+YIkjFogLUs9q4jkG40QSx2sYjA4w5/eGPiUt5IY23JA8uKgWrfS8ZGqPqSTSEydgp09wgZ2l9ezf7EDNYYA2f6wPcmAiGE5jfK/wRfKyT6HY7iWT7lAKxfHansybtcH0vvuF1Ko2T0Gw9LkLRDWwBd4H4jiz8k2B0/ZY9HnM02W9iSmUa/i1+p5zcn+6iViD/8r7+F+yJcBV/8FHzWNwlLvbz/O794F93OVBym+z+426ku48BETGLQ70+LJYNtmgWHItWmiKEoRg4ZbtooRE4p2r5cOviqrYX5opKmaGRVH4FEtiMySSxFEm3ZGUyD1NiVx/Efz5WhrzualPEdTVGHIIi+GXRaAmYACL9ouT+yXrp4a+oeHMZEPKe4o2KNr2xI0P5rL4Rs6TA2+kstT3NjbBoCrZz0ytKKcT5dzUSOrZa5fi0jCBtJXuyVjiOJPG83xZazeSRBLCkzDL1D0Gm1lDIwfzhJYuM6QFbaweytaB8pAfilN5BzSW4NrQ565B2NZEEIdNrfKlSq1OVAx/WXb9UQh41xCnHuTcL4Cb154vS5z5DS1Nl9IaTC7QNykdi61KuGdL9vgsKaVR829L5V5Rp5qiTh9UdTonBxUVBz3MdPVA58uib0tFXTHq8n4zlpXlbTrTilJvn90bunzA6tj8zGxWd+P7FWt/W22a3zM11rO0/Wh6p8qLFZqTed1GXzRtxEqi7IGCxrboxTA/lp0b4caF4vhQtOrgMQ/0erGhemOjezZ1liyh5Uwvrom3FM8ilRGrPBhKvE0kcZQX9F+TMLqWrsBqpmwlMpY8dwzDUWLeR18MkXcdbZyC25jyCtk9ivJRdbiFz5/Ccxu7hnj7haoZIk9DSZwOpQnvwYFMQ3qBSe2yXEJQtMXqUVVU+UHZoLmi3WaCG62lfO4WJfrlZv0Ul0UrBAhUrKIYkJclNSs8D9JuUOOd2PRpW5TNjbH5wM8Xz1B+rFphMA5/nwzUWw9+VgYO1n+4DFRtQCF+3yv5YXQ6f8Wl+/09MQbdCW7UPOxFdaku5SNqV1AGB4oHxEd3JvDYttDWOm6c0eX2MqWH+YHwBU2hTKqSZxIyf3WxLPBDQ2MToWi77zwwpzpwpNle0nmgTClzuhy1ZY7apgKGfSyY2uPOzslhZ540UUjalyyniapncK5LZXBVtrwQWTqWLVV0ood9YduHiC5JbV1miO3VhmEpE4ZYO0xGw7qnOuQIy0NFl0qvDRQPhgoLx3wOuBgZA4XSYDJFZqDmdVccDOysbZtnGLfPHT6i3bpAVw82Lb0j/EDa27QxQ+aPXZ/i0Dz7eEAYCzl1Q3VWZ0k61k9fHgSeurVL/pN01wrfJntXEdXA0NXEFvD98cYUEbmH9cqyGeq6D7fxJyyTsyWT4nfxv7X/3Qfhtvrdcb/1P/9JCPeu0XAFMX/v3ddut9GYD3EZi9blwi873XaD/ySNiRtb8E55oF7pq4waOhLrXY1p7oWpL71f1i5UzDuFxbdgzqaGNiT7dVoTJQXCxjWC8cHUaBljIQfURi4Leol87UAxm+CX7AdaO/TwOZwaBF71gG34g8CviBxRH7f080rbtoOBGmCfkhxmSPy49LSVj2WiN1jSqy3XKsd2Kq5soyq+p8/TqlWlWNqrntWbrVclnme64pkD+eE8/rH+P8F9FZ6QvdpxmKYtShocD9Ipdpc0BCqP7ZLcsqUU82oR3jRae5op8oJPtQWRZWOi5HYhQytOZUrfifpdWN/KyOj2Nco
2D+KbwEDlLOzvc8QnwvqSqUI2n5+b2pij+hQdsO0vxqB7q2q3ZB5n+KoWKj/uHBwLiEOEdT/l0es2vlgYIDIfhUOMNCZbgxTb0FD9a2pDgpciPglr2zR8aizxc4ixJqpfyGNuaoT/U1NUOWx/oKjms8E8t7CfQHamU9ZgMEUzTb2mMlOXreMhl0WbjCdNZI8c5nmy1HHtLoKVCws7DHJ+dwzjCxwe45+TbXUKe1TRp5jnzuZOl5ukyR7re7d+ABrls2pLEvcoU7Tlnshyc5yztHvA/l0dN/fzbL6kOpTwWWGnzS4dfBY/vA+DcWZwbibdYX9fLmz6JxdU6YbKxnfxbnlYaiOWftIm+6XtOX+YFMHb+LglCp1D6QNQvMBQxWOtKA2J30jthV/yxPF5WuZ3G5nLYTjz+TUc4HtRXPhtXmv7kcpelDkB4nNyj3UYqMyTuZV6Gcsj9auOB1Qqh1o728EgKblTmh+3dflz/No4TC842ho4UP2Z1tqepbZNwMlcirYz7ANj3jF1L4YD5UB9Eh77MG09c/YQPMoH57ZnoH5ErfRrY+oyhnNT4sOsxW54ruVZMF/X82yuvZRuRX5pUh3BdSxFe7neKWMaHFMft/MSWc8W5ehYDt38RrEV0JAc3NzOO5U7NE/mJ7S86GBAiodzS8fHHFDHQYcNO6DRGK2S6+D0+crPpHyc1eTi/DYlxziL5AUKXdTM7RFa7et29iI+owcXqoV/drJlSFMfvAgvGJaaT0V0xG3aplMz3lTcwofu3Oblue04sH/ctRUMGEjYr2ht3XAbnclhxBiGrnEehLNLSL+wNYXyEsxHNd+PLejJHg9nJf54y66cOSy1sC0s5NxaDrQ4T4jSyFVPtnc4cxDtZ2YkCXYLt7DNd418GPUJurdFqHTkLyybrNNAL/z1Bn7ZYwxk6yQXl1a+eCOXqpTtQKd+dmhSXq4thRuimtkBOZDVA2HsXTLXs0u7A8WDXIrjjozaT5bs1j/TWj4HVx4u/UR51LJZttXhz1VLHJSG9aXXepwgu9KEHIT90SjtnfDHW51LGtXPKirstkzj/pe3oxNbSu3nUac1zpErW885YlvqJahuC48HWxxnTnBXyd5bLY3W8tjl1hRl/J7WYThZtj6ZT7Plu/BJyI7bZHn3eKh7I8+Mx1hwjywcKAhu0ILyzJuVMZQwHWZaq32vgufTGk5UH4gI2yCsN+vtxtXfMx3PhZwd1po3bXs+4fVa67kxqZ4p6ZgV8S+Sxmm5XpCRNYIxE838UfOasHlL2fjJdurSfpzslP983mDJL+mzzWXr2jIplj0Ghirln7sxWRuAOb8GIqLlUiTfK5N4C/PVPdbTiOpp6uOP15jeMD/nD57IYXmaQO0kD+Xo4xQtuMWaI/ExVrQV+5jnKH1MgWdgMA7MAK2PvhbhWbfU42dwR09oKNuwxSd9zU4cqneP9zNNzYzBdAgnXbj8hbYyUBlaYoHlZ0TnpM93ZVudKbDltZYP0o2HZovmsMvfI3Sg5QMkB0vaSVutnTQm1mVE6UBOG6E6fMCX56xW12cC0bkIBHLu6DxjCHsstx4cyIw1me5szY6MZ+B+WcskOue/Hk9GWdb5qNHynptVjKluyLzGU2SKKK/yAXejfdE+8FESpjwQH3d2sS2Zpct03cLfuxI6vif5z8yLU4eFP5iDgUlLK8QUOcuOcsbSXynyLRZdxyvXpzLpDsvocGcL/NoS1bVVtbSXsSXKMN1MDjGzZ+A6OUGExmiLssyo5BNyeofAe7auRGAwje3Jxm2f6GHUWqhKG/ucvDbHKHKaZV4Vz/sNvvHcqS0ndDnG+LW2Lv7zYsm32+sCslVujefLNnpySLjAdPstJ61VaocP2/VM6euRUzPuTW3laoPqtA6yrqcv3WzWuvblF/v5v7E9qsu3bJ2D2YHLjrM3f70f8do4tIkbmJ/FRWEH5HP3U0MjvPhwrsXplC9p3NT2o01tyL21//wLW4dOeKRqx9G651be+zXxYqPlZg+4RXNLujP3q7abXK6vgaeG9pjMNGw32GKNgt0bGsphzsbAZz2CCzwex3pQYL3mVm3SE1vldfjlLzw/8qyv2pF7rpXxN8suqxzRY+yQvCrKJPHOXRS4uNfG+yfybi88S7XW/kH9wzngqvUJHFOlpJxf6Mw3d7chYJiEUWP7uGPsDHevgrOxlYlcyxW29GZzy5OOLQaesEHts1dP4+eiTH9VnhN7eBO5eL5E/gRX1b2zMKp/DDZLG8b2XT2ul0/sD3lGaJd2vzW8JC5PADfewHy0B5xCc1X64tn8TNeZtKd5p9b5t2+cQ3lheXkJe1keTutqViPO1Kc59lcL/736Dc8cxXL0sevbF/IepgWt1Vm4hmYypi6V22cVW5gX95fXzbX3Yk20a028tb7ZOdbDbfT3/o9//rjq0/OuGkeTtc9Q+nOnj/04e7LYf95BYv+rTwC7/MCv1x/UdeHpXPWztU7O0wqs0H/AP2767969+x7+T29JjjK76VHe+NB9FOP30Ip91dkmfhTe9Hbs9xDz701vSZ/5HgZOatlWat18D3s9TKFyQPwbWcBBCbnV63kOCt4n3gfoWdu0/lSvZ8Xx+00GnG3opE7y3o8+tEfqesYPk9QK4bPPBVZouY79DuQ3vYmDguNzlfjix7ZZmPr1ryaxAwnsaR47N70K0fhS4iAHptH25q0nQNi9GPVdgVDMT/QKvX/TwzxdXSmY/6Z3L3wrLx75sXz4OaofJbqD8FYcJx+O1P9cPfsfzQC9nhWGUUoEsJwEkbDme+nWd11nm9z0/u+7Ev//KP/o9f59/LPX+95/2EbB9/5N4yq+jjH7vX/zvWUZvvev2k9i1JAnC7NEpfx7v/7cj6vWV30H2Vh5kxcxZ78vSf+e/ILVQY3/YP75nsyyPuKP8s9/1uSi1Iw3PbZDJgIrhd6szgAvo/NLKV3CX36uzof4P9T89JP893LYXs6Hl/DiS/mx16uQj/87Eq02zVJr7B1Q5wFC0nKskmdq9+uKpz1UXQGV/1Xf/naikaohuzQTIU3dA2h9s3IGbk6GIx9qw9I0662XCgt/OpSVeje9Dy/7Qv3My2qk4tDLm+cVK/E2qY/UoVnJ3Sbdj3qWxzcF8up/h6WlbnIxjTSqsE3R0dKMxb36BGp8TU9qLYciFlz0C8hd/8gS2danJL/Yjy5H2DoP5fdLv50AkG6t1HHzBgiUSwpRJn8vm4/1ethA1Bj2qam3Jl98+HiHBCE3vQr55V0n3HUojO/9z1/v5bv7fy3vb5X71bd/fVO+Tu+E+6ZlISc844etOKZ3KvtXei2Fv0T4VvCs0HWelxKinhIywQ4p6bC6Rymp4ea/Q0pQFG2ymEYMxWRQBC1008MhxvO4JkFMB5bp9TNYVvDN/xJ/v1Q8rVinLXClv15KeM3nLm1IWqGjFszd9HAsV/iTT4WDN70yGvwe9q/6O0ooEojWsxDQ2/pJGsVe/8f/CwAA//+hqYUMpacAAA== type: helm.sh/release.v1
Decoded json:
{ "name": "dotnet", "info": { "first_deployed": "2023-02-14T23:49:12.655951052+01:00", "last_deployed": "2023-02-14T23:49:12.655951052+01:00", "deleted": "", "description": "Install complete", "status": "deployed", "notes": "\nYour .NET app is building! To view the build logs, run:\n\noc logs bc/dotnet --follow\n\nNote that your Deployment will report \"ErrImagePull\" and \"ImagePullBackOff\" until the build is complete. Once the build is complete, your image will be automatically rolled out." }, "chart": { "metadata": { "name": "dotnet", "version": "0.0.1", "description": "A Helm chart to build and deploy .NET applications", "keywords": [ "runtimes", "dotnet" ], "apiVersion": "v2", "annotations": { "chart_url": "https://github.com/openshift-helm-charts/charts/releases/download/redhat-dotnet-0.0.1/redhat-dotnet-0.0.1.tgz" } }, "lock": null, "templates": [ /* removed */ ], "values": { "build": { "contextDir": null, "enabled": true, "env": null, "imageStreamTag": { "name": "dotnet:3.1", "namespace": "openshift", "useReleaseNamespace": false }, "output": { "kind": "ImageStreamTag", "pushSecret": null }, "pullSecret": null, "ref": "dotnetcore-3.1", "resources": null, "startupProject": "app", "uri": "https://github.com/redhat-developer/s2i-dotnetcore-ex" }, "deploy": { "applicationProperties": { "enabled": false, "mountPath": "/deployments/config/", "properties": "## Properties go here" }, "env": null, "envFrom": null, "extraContainers": null, "initContainers": null, "livenessProbe": { "tcpSocket": { "port": "http" } }, "ports": [ { "name": "http", "port": 8080, "protocol": "TCP", "targetPort": 8080 } ], "readinessProbe": { "httpGet": { "path": "/", "port": "http" } }, "replicas": 1, "resources": null, "route": { "enabled": true, "targetPort": "http", "tls": { "caCertificate": null, "certificate": null, "destinationCACertificate": null, "enabled": true, "insecureEdgeTerminationPolicy": "Redirect", "key": null, "termination": "edge" } }, "serviceType": "ClusterIP", "volumeMounts": null, "volumes": null }, "global": { "nameOverride": null }, "image": { "name": null, "tag": "latest" } }, "schema": "removed", "files": [ { "name": "README.md", "data": "removed" } ] }, "config": { "build": { "enabled": true, "imageStreamTag": { "name": "dotnet:3.1", "namespace": "openshift", "useReleaseNamespace": false }, "output": { "kind": "ImageStreamTag" }, "ref": "dotnetcore-3.1", "startupProject": "app", "uri": "https://github.com/redhat-developer/s2i-dotnetcore-ex" }, "deploy": { "applicationProperties": { "enabled": false, "mountPath": "/deployments/config/", "properties": "## Properties go here" }, "livenessProbe": { "tcpSocket": { "port": "http" } }, "ports": [ { "name": "http", "port": 8080, "protocol": "TCP", "targetPort": 8080 } ], "readinessProbe": { "httpGet": { "path": "/", "port": "http" } }, "replicas": 1, "route": { "enabled": true, "targetPort": "http", "tls": { "enabled": true, "insecureEdgeTerminationPolicy": "Redirect", "termination": "edge" } }, "serviceType": "ClusterIP" }, "image": { "tag": "latest" } }, "manifest": "---\n# Source: dotnet/templates/service.yaml\napiVersion: v1\nkind: Service\nmetadata:\n name: dotnet\n labels:\n helm.sh/chart: dotnet\n app.kubernetes.io/name: dotnet\n app.kubernetes.io/instance: dotnet\n app.kubernetes.io/managed-by: Helm\n app.openshift.io/runtime: dotnet\nspec:\n type: ClusterIP\n selector:\n app.kubernetes.io/name: dotnet\n app.kubernetes.io/instance: dotnet\n ports:\n - name: http\n port: 8080\n protocol: TCP\n targetPort: 8080\n---\n# Source: 
dotnet/templates/deployment.yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: dotnet\n labels:\n helm.sh/chart: dotnet\n app.kubernetes.io/name: dotnet\n app.kubernetes.io/instance: dotnet\n app.kubernetes.io/managed-by: Helm\n app.openshift.io/runtime: dotnet\n annotations:\n image.openshift.io/triggers: |-\n [\n {\n \"from\":{\n \"kind\":\"ImageStreamTag\",\n \"name\":\"dotnet:latest\"\n },\n \"fieldPath\":\"spec.template.spec.containers[0].image\"\n }\n ]\nspec:\n replicas: 1\n selector:\n matchLabels:\n app.kubernetes.io/name: dotnet\n app.kubernetes.io/instance: dotnet\n template:\n metadata:\n labels:\n helm.sh/chart: dotnet\n app.kubernetes.io/name: dotnet\n app.kubernetes.io/instance: dotnet\n app.kubernetes.io/managed-by: Helm\n app.openshift.io/runtime: dotnet\n spec:\n containers:\n - name: web\n image: dotnet:latest\n ports:\n - name: http\n containerPort: 8080\n protocol: TCP\n livenessProbe:\n tcpSocket:\n port: http\n readinessProbe:\n httpGet:\n path: /\n port: http\n volumeMounts:\n volumes:\n---\n# Source: dotnet/templates/buildconfig.yaml\napiVersion: build.openshift.io/v1\nkind: BuildConfig\nmetadata:\n name: dotnet\n labels:\n helm.sh/chart: dotnet\n app.kubernetes.io/name: dotnet\n app.kubernetes.io/instance: dotnet\n app.kubernetes.io/managed-by: Helm\n app.openshift.io/runtime: dotnet\nspec:\n output:\n to:\n kind: ImageStreamTag\n name: dotnet:latest\n source:\n type: Git\n git:\n uri: https://github.com/redhat-developer/s2i-dotnetcore-ex\n ref: dotnetcore-3.1\n strategy:\n type: Source\n sourceStrategy:\n from:\n kind: ImageStreamTag\n name: dotnet:3.1\n namespace: openshift\n env:\n - name: \"DOTNET_STARTUP_PROJECT\"\n value: \"app\"\n triggers:\n - type: ConfigChange\n---\n# Source: dotnet/templates/imagestream.yaml\napiVersion: image.openshift.io/v1\nkind: ImageStream\nmetadata:\n name: dotnet\n labels:\n helm.sh/chart: dotnet\n app.kubernetes.io/name: dotnet\n app.kubernetes.io/instance: dotnet\n app.kubernetes.io/managed-by: Helm\n app.openshift.io/runtime: dotnet\nspec:\n lookupPolicy:\n local: true\n---\n# Source: dotnet/templates/route.yaml\napiVersion: route.openshift.io/v1\nkind: Route\nmetadata:\n name: dotnet\n labels:\n helm.sh/chart: dotnet\n app.kubernetes.io/name: dotnet\n app.kubernetes.io/instance: dotnet\n app.kubernetes.io/managed-by: Helm\n app.openshift.io/runtime: dotnet\nspec:\n to:\n kind: Service\n name: dotnet\n port:\n targetPort: http\n tls:\n termination: edge\n insecureEdgeTerminationPolicy: Redirect\n", "version": 1 }
Clone of OCPBUGS-7906, but for all the other CSI drivers and operators besides shared resource. All Pods / containers that are part of the OCP platform should run on dedicated "management" CPUs (if configured), i.e. they should have the annotation 'target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}'.
So far nobody has run our cloud CSI drivers with CPU pinning enabled, so this bug is low priority. I checked LSO; it already has correct CPU pinning in all Pods, e.g. here.
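A quick way to see which CSI driver pods already carry the management-workload annotation (a sketch; assumes jq is available and that the cloud CSI drivers live in the usual openshift-cluster-csi-drivers namespace):

# List each pod with its management-workload annotation, printing "missing" where it is absent
oc -n openshift-cluster-csi-drivers get pods -o json \
  | jq -r '.items[] | [.metadata.name, (.metadata.annotations["target.workload.openshift.io/management"] // "missing")] | @tsv'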
Description of problem:
The UI should add an alert about the deprecation of DeploymentConfig in 4.14
Version-Release number of selected component (if applicable):
pre-merge
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
The alert is missing
Expected results:
The alert should exist
Additional info:
Description of problem:
This is to track the SDN-specific issue in https://issues.redhat.com/browse/OCPBUGS-18389: the 4.14 nightly has a higher pod-ready latency compared to 4.14 ec4 and 4.13.z in the node-density (lite) test.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-11-201102
How reproducible:
Everytime
Steps to Reproduce:
1. Install an SDN cluster and scale up to 24 worker nodes; install 3 infra nodes and move the monitoring, ingress, and registry components to the infra nodes.
2. Run the node-density (lite) test with 245 pods per node.
3. Compare the pod-ready latency to 4.13.z and 4.14 ec4.
Actual results:
4.14 nightly has a higher pod ready latency compared to 4.14 ec4 and 4.13.10
Expected results:
4.14 should have similar pod ready latency compared to previous release
Additional info:
OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
4.14.0-ec.4 | 231559 | 292 | 087eb40c-6600-4db3-a9fd-3b959f4a434a | aws | amd64 | SDN | 24 | 245 | 2186 | 3256 | https://drive.google.com/file/d/1NInCiai7WWIIVT8uL-5KKeQl9CtQN_Ck/view?usp=drive_link |
4.14.0-0.nightly-2023-09-02-132842 | 231558 | 291 | 62404e34-672e-4168-b4cc-0bd575768aad | aws | amd64 | SDN | 24 | 245 | 58725 | 294279 | https://drive.google.com/file/d/1BbVeNrWzVdogFhYihNfv-99_q8oj6eCN/view?usp=drive_link |
With the new multus image provided by Dan Williams in https://issues.redhat.com/browse/OCPBUGS-18389, the 24-node SDN latency is similar to that without the fix.
% oc -n openshift-network-operator get deployment.apps/network-operator -o yaml | grep MULTUS_IMAGE -A 1
        - name: MULTUS_IMAGE
          value: quay.io/dcbw/multus-cni:informer
% oc get pod -n openshift-multus -o yaml | grep image: | grep multus
      image: quay.io/dcbw/multus-cni:informer
....
OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer | 232389 | 314 | f2c290c1-73ea-4f10-a797-3ab9d45e94b3 | aws | amd64 | SDN | 24 | 245 | 61234 | 311776 | https://drive.google.com/file/d/1o7JXJAd_V3Fzw81pTaLXQn1ms44lX6v5/view?usp=drive_link |
4.14.0-ec.4 | 231559 | 292 | 087eb40c-6600-4db3-a9fd-3b959f4a434a | aws | amd64 | SDN | 24 | 245 | 2186 | 3256 | https://drive.google.com/file/d/1NInCiai7WWIIVT8uL-5KKeQl9CtQN_Ck/view?usp=drive_link |
4.14.0-0.nightly-2023-09-02-132842 | 231558 | 291 | 62404e34-672e-4168-b4cc-0bd575768aad | aws | amd64 | SDN | 24 | 245 | 58725 | 294279 | https://drive.google.com/file/d/1BbVeNrWzVdogFhYihNfv-99_q8oj6eCN/view?usp=drive_link |
Zenghui Shi and Peng Liu requested modifying the multus-daemon-config ConfigMap by removing the readinessindicatorfile flag.
Steps:
Now the readinessindicatorfile flag is removed and all multus pods are restarted.
% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep readinessindicatorfile -c
0
Test result: p99 is better than without the fix (removing readinessindicatorfile) but is still worse than ec4; avg is still bad.
OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer and remove readinessindicatorfile flag | 232389 | 316 | d7a754aa-4f52-49eb-80cf-907bee38a81b | aws | amd64 | SDN | 24 | 245 | 51775 | 105296 | https://drive.google.com/file/d/1h-3JeZXQRO-zsgWzen6aNDQfSDqoKAs2/view?usp=drive_link |
Zenghui Shi and Peng Liu requested setting logLevel to debug in addition to removing the readinessindicatorfile flag.
Edit the ConfigMap to change "logLevel" from "verbose" to "debug" and restart all multus pods.
Now the logLevel is debug and all multus pods are restarted.
% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep logLevel
  "logLevel": "debug",
% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep readinessindicatorfile -c
0
OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer and remove readinessindicatorfile flag and logLevel=debug | 232389 | 320 | 5d1d3e6a-bfa1-4a4b-bbfc-daedc5605f7d | aws | amd64 | SDN | 24 | 245 | 49586 | 105314 | https://drive.google.com/file/d/1p1PDbnqm0NlWND-komc9jbQ1PyQMeWcV/view?usp=drive_link |
Description of problem:
The bootstrapExternalStaticGateway IP is used as the DNS server for the bootstrap node
Version-Release number of selected component (if applicable):
4.11
How reproducible:
100%
Steps to Reproduce:
1. Deploy baremetal IPI using a static bootstrap IP.
2. It consumes bootstrapExternalStaticGateway as the DNS for the bootstrap node.
Actual results:
Sometimes bootstrapExternalStaticGateway cannot act as DNS
Expected results:
DNS resolution should work on bootstrap if it uses static IP
Additional info:
Description of problem: While running scale tests of OpenShift on OpenStack, we're seeing it perform significantly worse than on the AWS platform for the same number of nodes. More specifically, we're seeing high traffic to the API server and high load on the haproxy pod.
Version-Release number of selected component (if applicable):
All supported versions
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Slack thread at https://coreos.slack.com/archives/CBZHF4DHC/p1669910986729359 provides more info.
Description of the problem:
When starting an installation where the nodes have multiple disks on 4.13, after reboot the installation might get stuck on "pending user action" with the following error:
Expected the host to boot from disk, but it booted the installation image - please reboot and fix the boot order to boot from disk QEMU_HARDDISK 05abcd32e95a61a3 (sda, /dev/disk/by-id/wwn-0x05abcd32e95a61a3).
When running the live ISO with RHEL, /dev/sda might actually be vdb.
Since the boot order configuration is usually HD first, the machine usually tries vda before it moves on to other boot options (that are not HD).
When installing on /dev/sda (vdb), the machine might not try to boot from the installation disk.
Solution suggestion:
A better way to find vda is by the HCTL address (0:0:0:0 should be /dev/vda).
Action item: in the case of libvirt (why not all platforms?), we should update the way we choose the default installation disk and choose the disk with HCTL 0:0:0:0 (when it's available...).
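For illustration, the HCTL-based selection can be checked from a shell on the live host with lsblk (a sketch, not the assisted-installer implementation):

# Show each disk with its HCTL address and pick the one at 0:0:0:0
lsblk -d -o NAME,HCTL,SIZE,TYPE
lsblk -d -n -o NAME,HCTL | awk '$2 == "0:0:0:0" {print "/dev/" $1}'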
How reproducible:
Create nodes with 2 disks and start installation.
Steps to reproduce:
1. Register new cluster
2. Add 6 nodes (3 master + 3 workers) with multiple disks each - might be even reproducible with only 3 masters
3. Start the installation
Note that it might take a few attempts to reproduce this issue
Actual results:
Pending for input
Expected results:
Installation success
Slack thread https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1684317064257809
Description of problem:
If secure boot is currently disabled and the user attempts to enable it via ZTP, the install will not begin the first time ZTP is triggered.
When secure boot is enabled via ZTP, the boot options are configured before the virtual CD is attached, so the first boot goes into the existing HD with secure boot on. The install then gets stuck because boot from CD was never triggered.
Version-Release number of selected component (if applicable):
4.10
How reproducible:
Always
Steps to Reproduce:
1. Secure boot is currently disabled in bios
2. Attempt to deploy a cluster with secure boot enabled via ZTP
3.
Actual results:
Expected results:
Additional info:
Secure boot config used in ZTP siteconfig:
http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/ff814164cdcd355ed980f1edf269dbc2afbe09aa/siteconfig/master-2.yaml#L40
Description of problem:
The option to Enable/Disable a console plugin on the Operator details page is not shown any more; it looks like a regression (the option is shown in 4.13)
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-04-19-125337
How reproducible:
Always
Steps to Reproduce:
1. Subscribe to the 'OpenShift Data Foundation' Operator from OperatorHub.
2. On the Operator installation page, choose 'Disable' for the plugin.
3. Once the operator is successfully installed, go to the Installed Operators list page /k8s/all-namespaces/operators.coreos.com~v1alpha1~ClusterServiceVersion.
4. The console will show a 'Plugin available' button for the 'OpenShift Data Foundation' Operator; click the button and hit 'View operator details', and the user will be taken to the Operator details page.
Actual results:
4. In OCP <= 4.13, we show a 'Console plugin' item where the user can Enable/Disable the console plugin the operator has brought in; however, this option is not shown in 4.14.
Expected results:
4. The Enable/Disable console plugin option should be shown on the Operator details page.
Additional info:
screen recording https://drive.google.com/drive/folders/1fNlodAg6yUeUqf07BG9scvwHlzAwS-Ao?usp=share_link
Description of the problem:
https://redhat-internal.slack.com/archives/C01QX5JEDP0/p1682946068422739?thread_ts=1682945335.566899&cid=C01QX5JEDP0
Post-installation, the downloaded collected logs and agent logs are empty.
Attaching logs.
Description of problem:
Due to a CI configuration issue (lack of nmstatectl in the image), the current CI unit-test job silently skips those unit tests requiring nmstatectl.
Version-Release number of selected component (if applicable):
How reproducible:
hack/go-test.sh
Steps to Reproduce:
1. 2. 3.
Actual results:
Unit tests are failing
Expected results:
No failure
Additional info:
The following install-config fields are new in 4.13:
These fields are ignored by the agent-based installation method. Until such time as they are implemented, we should print a warning if they are set to non-default values, as we do for other fields that are ignored.
Description of problem:
After a component is ready, if we edit the component YAML from the console, it shows a stream of errors. The YAML does get updated, but the error goes away only after multiple reloads.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Deploy a pod/deployment.
2. After it is ready, update the YAML from the console.
3. An error is seen.
Actual results:
Expected results:
No error
Additional info:
Please review the following PR: https://github.com/openshift/machine-api-operator/pull/1127
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/cluster-machine-approver/pull/180
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Extracting the CLI for Darwin from a multi-arch payload leads to "filtered all images from manifest list"
Version-Release number of selected component (if applicable):
Tested with oc4.11
How reproducible:
Always on Darwin machines
Steps to Reproduce:
1. oc adm release extract --command=oc quay.io/openshift-release-dev/ocp-release:4.11.4-multi -v5
Actual results:
I0909 18:37:28.591323 37669 config.go:127] looking for config.json at /Users/lwan/.docker/config.json
I0909 18:37:28.591601 37669 config.go:135] found valid config.json at /Users/lwan/.docker/config.json
Warning: the default reading order of registry auth file will be changed from "${HOME}/.docker/config.json" to podman registry config locations in the future version of oc. "${HOME}/.docker/config.json" is deprecated, but can still be used for storing credentials as a fallback. See https://github.com/containers/image/blob/main/docs/containers-auth.json.5.md for the order of podman registry config locations.
I0909 18:37:30.391895 37669 client_mirrored.go:174] Attempting to connect to quay.io/openshift-release-dev/ocp-release
I0909 18:37:30.696483 37669 client_mirrored.go:412] get manifest for sha256:53679d92dc0aea8ff6ea4b6f0351fa09ecc14ee9eda1b560deeb0923ca2290a1 served from registryclient.retryManifest{ManifestService:registryclient.manifestServiceVerifier{ManifestService:(*client.manifests)(0x14000a36330)}, repo:(*registryclient.retryRepository)(0x14000f46e80)}: <nil>
I0909 18:37:30.696738 37669 manifest.go:405] Skipping image sha256:fcf4d95df9a189527453d8961a22a3906514f5ecbb05afbcd0b2cdd212aab1a2 for manifestlist.PlatformSpec{Architecture:"amd64", OS:"linux", OSVersion:"", OSFeatures:[]string(nil), Variant:"", Features:[]string(nil)} from quay.io/openshift-release-dev/ocp-release:4.11.4-multi
I0909 18:37:30.696843 37669 manifest.go:405] Skipping image sha256:1992a4713410b7363ae18b0557a7587eb9e0d734c5f0f21fb1879196f40233a3 for manifestlist.PlatformSpec{Architecture:"ppc64le", OS:"linux", OSVersion:"", OSFeatures:[]string(nil), Variant:"", Features:[]string(nil)} from quay.io/openshift-release-dev/ocp-release:4.11.4-multi
I0909 18:37:30.696869 37669 manifest.go:405] Skipping image sha256:3698082cd66e90d2b79b62d659b4e7399bfe0b86c05840a4c31d3197cdac4bfa for manifestlist.PlatformSpec{Architecture:"s390x", OS:"linux", OSVersion:"", OSFeatures:[]string(nil), Variant:"", Features:[]string(nil)} from quay.io/openshift-release-dev/ocp-release:4.11.4-multi
I0909 18:37:30.697106 37669 manifest.go:405] Skipping image sha256:15fc18c81f053cad15786e7a52dc8bff29e647ea642b3e1fabf2621953f727eb for manifestlist.PlatformSpec{Architecture:"arm64", OS:"linux", OSVersion:"", OSFeatures:[]string(nil), Variant:"", Features:[]string(nil)} from quay.io/openshift-release-dev/ocp-release:4.11.4-multi
I0909 18:37:30.697570 37669 workqueue.go:143] about to send work queue error: unable to read image quay.io/openshift-release-dev/ocp-release:4.11.4-multi: filtered all images from manifest list
error: unable to read image quay.io/openshift-release-dev/ocp-release:4.11.4-multi: filtered all images from manifest list
Expected results:
The darwin/$(uname -m) cli is extracted
Additional info:
Are we re-using some function from the `oc mirror` feature to select the manifest to use? It looks like it is looking for a darwin/$(uname -m) image and filtering out all the available Linux manifests.
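As a hedged workaround sketch for Darwin users (the single-arch tag is real, but the availability of --filter-by-os on this subcommand is an assumption, and neither is a fix for the filtering logic itself):
~~~
# Pull the client from the architecture-specific release tag instead of the
# multi-arch manifest list, which avoids the darwin/$(uname -m) platform filter.
oc adm release extract --command=oc \
  quay.io/openshift-release-dev/ocp-release:4.11.4-x86_64 --to=./clients

# If the oc build in use supports it, an explicit OS filter can also be tried
# on the multi-arch tag (flag availability here is an assumption).
oc adm release extract --command=oc --filter-by-os=linux/amd64 \
  quay.io/openshift-release-dev/ocp-release:4.11.4-multi --to=./clients
~~~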
This is a clone of issue OCPBUGS-19037. The following is the description of the original issue:
—
The agent-interactive-console service is required by both sshd and systemd-logind, so if it exits with an error code there is no way to connect or log in to the box to debug.
Platform:
IPI on Baremetal
What happened?
In cases where no hostname is provided, hosts are automatically assigned the name "localhost" or "localhost.localdomain".
[kni@provisionhost-0-0 ~]$ oc get nodes
NAME STATUS ROLES AGE VERSION
localhost.localdomain Ready master 31m v1.22.1+6859754
master-0-1 Ready master 39m v1.22.1+6859754
master-0-2 Ready master 39m v1.22.1+6859754
worker-0-0 Ready worker 12m v1.22.1+6859754
worker-0-1 Ready worker 12m v1.22.1+6859754
What did you expect to happen?
Having all hosts come up as localhost is the worst possible user experience, because they'll fail to form a cluster but you won't know why.
However, since we know the BMH name in the image-customization-controller, it would be possible to configure the ignition to set a default hostname if we don't get one from DHCP/DNS.
If not, we should at least fail the installation with an error message specific to this situation.
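A hedged sketch of what such a default could look like, as a generic Ignition 3.2.0 file entry; the file name and the "master-0-0" value are placeholders standing in for the BMH name:
~~~
# Hypothetical Ignition fragment that writes a default hostname derived from
# the BMH name; names and values here are illustrative only.
cat <<'EOF' > default-hostname.ign
{
  "ignition": { "version": "3.2.0" },
  "storage": {
    "files": [
      {
        "path": "/etc/hostname",
        "mode": 420,
        "overwrite": true,
        "contents": { "source": "data:,master-0-0" }
      }
    ]
  }
}
EOF
~~~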
----------
30/01/22 - adding how to reproduce
----------
How to Reproduce:
1) Prepare an installation with day-1 static IP.
Add to the install-config under one of the nodes:
networkConfig:
routes:
config:
2) Ensure a DNS PTR record for the address IS NOT configured.
3) Create manifests and the cluster from install-config.yaml.
The installation should either:
1) fail as early as possible and provide some feedback that no hostname was provided, or
2) derive the hostname from the BMH or the ignition files.
Please review the following PR: https://github.com/openshift/images/pull/131
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Please review the following PR: https://github.com/openshift/images/pull/132
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
Nodes are taking more than 5m0s to stage OSUpdate: https://sippy.dptools.openshift.org/sippy-ng/tests/4.13/analysis?test=%5Bbz-Machine%20Config%20Operator%5D%20Nodes%20should%20reach%20OSUpdateStaged%20in%20a%20timely%20fashion
Test started failing back on 2/16/2023.
First occurrence of the failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.13-e2e-aws-sdn-upgrade/1626326464246845440
Most recent occurrences across multiple platforms: https://search.ci.openshift.org/?search=Nodes+should+reach+OSUpdateStaged+in+a+timely+fashion&maxAge=48h&context=1&type=junit&name=4.13&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
6 nodes took over 5m0s to stage OSUpdate:
node/ip-10-0-216-81.ec2.internal OSUpdateStarted at 2023-02-16T22:24:56Z, did not make it to OSUpdateStaged
node/ip-10-0-174-123.ec2.internal OSUpdateStarted at 2023-02-16T22:13:07Z, did not make it to OSUpdateStaged
node/ip-10-0-144-29.ec2.internal OSUpdateStarted at 2023-02-16T22:12:50Z, did not make it to OSUpdateStaged
node/ip-10-0-179-251.ec2.internal OSUpdateStarted at 2023-02-16T22:15:48Z, did not make it to OSUpdateStaged
node/ip-10-0-180-197.ec2.internal OSUpdateStarted at 2023-02-16T22:19:07Z, did not make it to OSUpdateStaged
node/ip-10-0-213-155.ec2.internal OSUpdateStarted at 2023-02-16T22:19:21Z, did not make it to OSUpdateStaged
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/cluster-capi-operator/pull/112
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.
The previous bump was OCPBUGS-13253.
Follow-up fixes after the bump of Kubernetes to 1.27: openshift/api#1424
This is a clone of issue OCPBUGS-18103. The following is the description of the original issue:
—
Description:
Now that the huge number of e2e test case failures in CI jobs is resolved, an "Undiagnosed panic detected in pod" issue was observed in the recent jobs.
Error:
{ pods/openshift-image-registry_cluster-image-registry-operator-7f7bd7c9b4-k8fmh_cluster-image-registry-operator_previous.log.gz:E0825 02:44:06.686400 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
pods/openshift-image-registry_cluster-image-registry-operator-7f7bd7c9b4-k8fmh_cluster-image-registry-operator_previous.log.gz:E0825 02:44:06.686630 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)}
Some Observations:
1) While starting the ImageConfigController, it failed to watch *v1.Route: "the server could not find the requested resource".
2) This eventually led to a sync problem: "E0825 01:26:52.428694 1 clusteroperator.go:104] unable to sync ClusterOperatorStatusController: config.imageregistry.operator.openshift.io "cluster" not found, requeuing".
3) Then, while creating the deployment resource for "cluster-image-registry-operator", it caused a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference).
Description of the problem:
When installing a cluster with multiple networks, we cannot change the machine network from the UI (it is not changed to the new machine network), but the chosen network is shown during installation.
From the customer's view:
They choose a machine network; it is in the list but never shown as chosen, yet it actually appears when installing.
How reproducible:
Always
Steps to reproduce:
Install a cluster with multiple networks.
Try to change the machine network -> it does not work.
Actual results:
Expected results:
Please review the following PR: https://github.com/openshift/cloud-provider-openstack/pull/187
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
On the Add Storage page, if the user chooses to use an existing PVC but leaves the PVC name empty, then after the other fields are filled and "Save" is clicked, there is no warning about the PVC name field. The loading dot icons are shown under the "Save" button.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-12-124310
How reproducible:
Always
Steps to Reproduce:
1. Create a deployment.
2. Click the "Add Storage" item in the action list of the deployment.
3. Choose "Use existing claim", but leave it empty.
4. Set the mount dir and click "Save".
Actual results:
4. There is no warning about the empty PVC name.
Expected results:
4. It should show info for the field: "Please fill out this field"
Additional info:
Description of the problem:
When creating/updating an InfraEnv, the size of the compressed ignition should be validated.
I.e. the service should generate the entire ignition for each request, compress it (as done in ignition Archive), and ensure its size is at most 256 KiB.
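A minimal sketch of the intended check, done by hand; gzip is used here as a stand-in for the service's archive format and the file name is illustrative:
~~~
# Compress the generated ignition and compare it against the 256 KiB limit.
gzip -c infra-env-ignition.json > infra-env-ignition.json.gz
size=$(stat -c%s infra-env-ignition.json.gz)
limit=$((256 * 1024))
if [ "$size" -gt "$limit" ]; then
  echo "ignition archive is ${size} bytes, which exceeds the ${limit}-byte limit" >&2
fi
~~~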
Notes:
How reproducible:
100%
Steps to reproduce:
1. Register an InfraEnv that would result in an ignition archive larger than 256 KiB.
E.g. invoke 'POST /v2/infra-envs' with large values in the body (infra-env-create-params).
Actual results:
The register request succeeds, but downloading the ISO fails.
Expected results:
The request should fail with an error message explaining the generated ignition archive is too large.
Description of problem:
oc-mirror fails to complete a heads-only mirror, complaining about devworkspace-operator.
Version-Release number of selected component (if applicable):
# oc-mirror version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.12.0-202302280915.p0.g3d51740.assembly.stream-3d51740", GitCommit:"3d517407dcbc46ededd7323c7e8f6d6a45efc649", GitTreeState:"clean", BuildDate:"2023-03-01T00:20:53Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
How reproducible:
Attempt a heads-only mirroring of registry.redhat.io/redhat/redhat-operator-index:v4.10
Steps to Reproduce:
1. Imageset currently:
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  registry:
    imageURL: myregistry.mydomain:5000/redhat-operators
    skipTLS: false
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.10
2.$ oc mirror --config=./imageset-config.yml docker://otherregistry.mydomain:5000/redhat-operators Checking push permissions for otherregistry.mydomain:5000 Found: oc-mirror-workspace/src/publish Found: oc-mirror-workspace/src/v2 Found: oc-mirror-workspace/src/charts Found: oc-mirror-workspace/src/release-signatures WARN[0026] DEPRECATION NOTICE: Sqlite-based catalogs and their related subcommands are deprecated. Support for them will be removed in a future release. Please migrate your catalog workflows to the new file-based catalog format. The rendered catalog is invalid. Run "oc-mirror list operators --catalog CATALOG-NAME --package PACKAGE-NAME" for more information. error: error generating diff: channel fast: head "devworkspace-operator.v0.19.1-0.1679521112.p" not reachable from bundle "devworkspace-operator.v0.19.1"
Actual results:
error: error generating diff: channel fast: head "devworkspace-operator.v0.19.1-0.1679521112.p" not reachable from bundle "devworkspace-operator.v0.19.1"
Expected results:
For the catalog to be mirrored.
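As a hedged mitigation sketch (not a fix for the broken catalog entry itself), the ImageSetConfiguration can be scoped to specific packages and channels to isolate or skip the entry that breaks the diff; the package selection below is illustrative:
~~~
cat <<'EOF' > imageset-config.yml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  registry:
    imageURL: myregistry.mydomain:5000/redhat-operators
    skipTLS: false
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.10
    packages:
    - name: devworkspace-operator   # illustrative: scope to the problematic package to debug
      channels:
      - name: fast
EOF
~~~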
Description of problem:
When deploying with external platform, the reported state of the machine config pool is degraded, and we can observe a drift in the configuration: $ diff /etc/mcs-machine-config-content.json ~/rendered-master-1b6aab788192600896f36c5388d48374 < "contents": "[Unit]\nDescription=Kubernetes Kubelet\nWants=rpc-statd.service network-online.target\nRequires=crio.service kubelet-auto-node-size.service\nAfter=network-online.target crio.service kubelet-auto-node-size.service\nAfter=ostree-finalize-staged.service\n\n[Service]\nType=notify\nExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests\nExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state\nExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state\nEnvironmentFile=/etc/os-release\nEnvironmentFile=-/etc/kubernetes/kubelet-workaround\nEnvironmentFile=-/etc/kubernetes/kubelet-env\nEnvironmentFile=/etc/node-sizing.env\n\nExecStart=/usr/local/bin/kubenswrapper \\\n /usr/bin/kubelet \\\n --config=/etc/kubernetes/kubelet.conf \\\n --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \\\n --kubeconfig=/var/lib/kubelet/kubeconfig \\\n --container-runtime-endpoint=/var/run/crio/crio.sock \\\n --runtime-cgroups=/system.slice/crio.service \\\n --node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \\\n --node-ip=${KUBELET_NODE_IP} \\\n --minimum-container-ttl-duration=6m0s \\\n --cloud-provider=external \\\n --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \\\n \\\n --hostname-override=${KUBELET_NODE_NAME} \\\n --provider-id=${KUBELET_PROVIDERID} \\\n --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \\\n --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bde9fb486f1e8369b465a8c0aff7152c2a1f5a326385ee492140592b506638d6 \\\n --system-reserved=cpu=${SYSTEM_RESERVED_CPU},memory=${SYSTEM_RESERVED_MEMORY},ephemeral-storage=${SYSTEM_RESERVED_ES} \\\n --v=${KUBELET_LOG_LEVEL}\n\nRestart=always\nRestartSec=10\n\n[Install]\nWantedBy=multi-user.target\n", --- > "contents": "[Unit]\nDescription=Kubernetes Kubelet\nWants=rpc-statd.service network-online.target\nRequires=crio.service kubelet-auto-node-size.service\nAfter=network-online.target crio.service kubelet-auto-node-size.service\nAfter=ostree-finalize-staged.service\n\n[Service]\nType=notify\nExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests\nExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state\nExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state\nEnvironmentFile=/etc/os-release\nEnvironmentFile=-/etc/kubernetes/kubelet-workaround\nEnvironmentFile=-/etc/kubernetes/kubelet-env\nEnvironmentFile=/etc/node-sizing.env\n\nExecStart=/usr/local/bin/kubenswrapper \\\n /usr/bin/kubelet \\\n --config=/etc/kubernetes/kubelet.conf \\\n --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \\\n --kubeconfig=/var/lib/kubelet/kubeconfig \\\n --container-runtime-endpoint=/var/run/crio/crio.sock \\\n --runtime-cgroups=/system.slice/crio.service \\\n --node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \\\n --node-ip=${KUBELET_NODE_IP} \\\n --minimum-container-ttl-duration=6m0s \\\n --cloud-provider= \\\n --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \\\n \\\n --hostname-override=${KUBELET_NODE_NAME} \\\n --provider-id=${KUBELET_PROVIDERID} \\\n --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \\\n 
--pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bde9fb486f1e8369b465a8c0aff7152c2a1f5a326385ee492140592b506638d6 \\\n --system-reserved=cpu=${SYSTEM_RESERVED_CPU},memory=${SYSTEM_RESERVED_MEMORY},ephemeral-storage=${SYSTEM_RESERVED_ES} \\\n --v=${KUBELET_LOG_LEVEL}\n\nRestart=always\nRestartSec=10\n\n[Install]\nWantedBy=multi-user.target\n", the difference is --cloud-provider=external /--cloud-provider= is the flags passed to the kubelet. We also observe the following log in the MCC: W0629 09:57:44.583046 1 warnings.go:70] unknown field "spec.infra.status.platformStatus.external.cloudControllerManager" "spec.infra.status.platformStatus.external.cloudControllerManager" is basically the flag in the Infrastructure object that enables the external platform.
Version-Release number of selected component (if applicable):
4.14 nightly
How reproducible:
Always when platform is external
Steps to Reproduce:
1. Deploy a cluster with the external platform enabled. The featureSet TechPreviewNoUpgrade should be set and the Infrastructure object should look like:
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-06-28T10:37:12Z"
  generation: 1
  name: cluster
  resourceVersion: "538"
  uid: 57e09773-0eca-4767-95ce-8ec7d0f2cdae
spec:
  cloudConfig:
    name: ""
  platformSpec:
    external:
      platformName: oci
    type: External
status:
  apiServerInternalURI: https://api-int.test-infra-cluster-3cd17632.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  apiServerURL: https://api.test-infra-cluster-3cd17632.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: test-infra-cluster-3c-pqqqm
  infrastructureTopology: HighlyAvailable
  platform: External
  platformStatus:
    external:
      cloudControllerManager:
        state: External
    type: External
2. Observe the drift with: oc get mcp
Actual results:
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                                                      False     True       True       3              0                   0                     3                      138m
worker   rendered-worker-d48036fe2b657e6c71d5d1275675fefc   True      False      False      3              3                   3                     0                      138m
Expected results:
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-2ff4e25f807ef3b20b7c6e0c6526f05d   True      False      False      3              3                   3                     0                      33m
worker   rendered-worker-48b7f39d78e3b1d94a0aba1ef4358d01   True      False      False      3              3                   3                     0                      33m
Additional info:
https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1688035248716119
The TestMetrics e2e test is not correctly cleaning up the MachineConfigs and MachineConfigPools it creates. This means that other e2e tests which run after this e2e test can falsely fail or become flaky.
What's happening is this:
The cleanup flow should look like this:
Description of problem:
A cluster update request with empty strings for api_vip and ingress_vip will not remove the cluster VIPs.
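A hedged reproduction sketch against the assisted-service REST API; the URL prefix, token handling, and the exact field names are assumptions (newer releases use the api_vips/ingress_vips list fields instead):
~~~
# Attempt to clear the VIPs by sending empty strings in the update request.
curl -s -X PATCH \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"api_vip": "", "ingress_vip": ""}' \
  "${ASSISTED_SERVICE_URL}/api/assisted-install/v2/clusters/${CLUSTER_ID}"
# Expected: the cluster VIPs are cleared; actual: the previous VIPs remain set.
~~~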
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. See the following test: https://gist.github.com/nmagnezi/4a3dad01ee197d3984fa7a0604b62cc0 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
https://issues.redhat.com//browse/OCPBUGS-5287 disabled the test due to https://issues.redhat.com/browse/THREESCALE-9015. Once https://issues.redhat.com/browse/THREESCALE-9015 is resolved, need to re-enable the test.
Description of problem:
After an upgrade from 4.9 to 4.10 collect+ process causing CPU bursts of 5-6 seconds every 15 minutes regularly. During each burst collect+ consume 100% CPU. Top Command Dump Sample: top - 07:00:04 up 10:10, 0 users, load average: 0.20, 0.24, 0.27 Tasks: 247 total, 1 running, 246 sleeping, 0 stopped, 0 zombie %Cpu(s): 6.3 us, 4.5 sy, 0.0 ni, 80.8 id, 7.4 wa, 0.8 hi, 0.3 si, 0.0 st MiB Mem : 32151.9 total, 22601.4 free, 2182.1 used, 7368.4 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 29420.7 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2009 root 20 0 3741252 172136 71396 S 12.9 0.5 36:42.79 kubelet 1954 root 20 0 2663680 130928 46156 S 7.9 0.4 6:50.44 crio 9440 root 20 0 1633728 546036 60836 S 7.9 1.7 21:06.80 fluentd 1 root 20 0 238416 15412 8968 S 5.9 0.0 1:56.73 systemd 1353 800 10 -10 796808 165380 40916 S 5.0 0.5 2:32.11 ovs-vsw+ 5454 root 20 0 1729112 73680 37404 S 2.0 0.2 3:52.21 coredns 1061248 1000360+ 20 0 1113524 24304 17776 S 2.0 0.1 0:00.03 collect+ 306 root 0 -20 0 0 0 I 1.0 0.0 0:00.37 kworker+ 957 root 20 0 264076 126280 119596 S 1.0 0.4 0:06.80 systemd+ 1114 dbus 20 0 83188 6224 5140 S 1.0 0.0 0:04.30 dbus-da+ 5710 root 20 0 406004 31384 15068 S 1.0 0.1 0:04.11 tuned 6198 nobody 20 0 1632272 46588 20516 S 1.0 0.1 0:17.60 network+ 1061291 1000650+ 20 0 11896 2748 2496 S 1.0 0.0 0:00.01 bash 1061355 1000650+ 20 0 11896 2868 2616 S 1.0 0.0 0:00.01 bashtop - 07:00:05 up 10:10, 0 users, load average: 0.20, 0.24, 0.27 Tasks: 248 total, 2 running, 245 sleeping, 0 stopped, 1 zombie %Cpu(s): 11.4 us, 2.0 sy, 0.0 ni, 81.5 id, 4.2 wa, 0.6 hi, 0.2 si, 0.0 st MiB Mem : 32151.9 total, 22601.4 free, 2182.1 used, 7368.4 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 29420.7 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1061248 1000360+ 20 0 1484936 36464 21300 S 74.3 0.1 0:00.78 collect+ 9440 root 20 0 1633728 545412 60900 S 11.9 1.7 21:06.92 fluentd 2009 root 20 0 3741252 172396 71396 S 4.0 0.5 36:42.83 kubelet 1 root 20 0 238416 15412 8968 S 1.0 0.0 1:56.74 systemd 300 root 0 -20 0 0 0 I 1.0 0.0 0:00.46 kworker+ 1427 root 20 0 19656 2204 2064 S 1.0 0.0 0:01.55 agetty 2419 root 20 0 1714748 38812 22884 S 1.0 0.1 0:24.42 coredns+ 2528 root 20 0 1634680 36464 20628 S 1.0 0.1 0:22.01 dynkeep+ 1009372 root 20 0 0 0 0 I 1.0 0.0 0:00.42 kworker+ 1053353 root 20 0 50200 4012 3292 R 1.0 0.0 0:01.56 toptop - 07:00:06 up 10:10, 0 users, load average: 0.20, 0.24, 0.27 Tasks: 247 total, 1 running, 246 sleeping, 0 stopped, 0 zombie %Cpu(s): 15.3 us, 1.5 sy, 0.0 ni, 82.7 id, 0.1 wa, 0.2 hi, 0.1 si, 0.0 st MiB Mem : 32151.9 total, 22595.9 free, 2185.7 used, 7370.2 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 29416.7 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1061248 1000360+ 20 0 1484936 35740 21428 S 99.0 0.1 0:01.78 collect+ 2009 root 20 0 3741252 172396 71396 S 3.0 0.5 36:42.86 kubelet 9440 root 20 0 1633728 545076 60900 S 2.0 1.7 21:06.94 fluentd 1353 800 10 -10 796808 165380 40916 S 1.0 0.5 2:32.12 ovs-vsw+ 1954 root 20 0 2663680 131452 46156 S 1.0 0.4 6:50.45 crio top - 07:00:07 up 10:10, 0 users, load average: 0.20, 0.24, 0.27 Tasks: 247 total, 1 running, 246 sleeping, 0 stopped, 0 zombie %Cpu(s): 14.7 us, 1.1 sy, 0.0 ni, 83.6 id, 0.1 wa, 0.4 hi, 0.1 si, 0.0 st MiB Mem : 32151.9 total, 22595.9 free, 2185.7 used, 7370.2 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 
29416.7 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1061248 1000360+ 20 0 1484936 35236 21492 S 102.0 0.1 0:02.80 collect+ 2009 root 20 0 3741252 172660 71396 S 7.0 0.5 36:42.93 kubelet 3288 nobody 20 0 718964 30648 11680 S 3.0 0.1 3:36.84 node_ex+ 1 root 20 0 238416 15412 8968 S 1.0 0.0 1:56.75 systemd 1353 800 10 -10 796808 165380 40916 S 1.0 0.5 2:32.13 ovs-vsw+ 1954 root 20 0 2663680 131452 46156 S 1.0 0.4 6:50.46 crio 5454 root 20 0 1729112 73680 37404 S 1.0 0.2 3:52.22 coredns 9440 root 20 0 1633728 545080 60900 S 1.0 1.7 21:06.95 fluentd 1053353 root 20 0 50200 4012 3292 R 1.0 0.0 0:01.57 toptop - 07:00:08 up 10:10, 0 users, load average: 0.20, 0.24, 0.27 Tasks: 247 total, 2 running, 245 sleeping, 0 stopped, 0 zombie %Cpu(s): 14.2 us, 0.9 sy, 0.0 ni, 84.5 id, 0.0 wa, 0.2 hi, 0.1 si, 0.0 st MiB Mem : 32151.9 total, 22595.9 free, 2185.7 used, 7370.2 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 29416.7 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1061248 1000360+ 20 0 1484936 35164 21492 S 100.0 0.1 0:03.81 collect+ 2009 root 20 0 3741252 172660 71396 S 3.0 0.5 36:42.96 kubelet 1061543 1000650+ 20 0 34564 9804 5772 R 3.0 0.0 0:00.03 python 9440 root 20 0 1633728 543952 60900 S 2.0 1.7 21:06.97 fluentd 1053353 root 20 0 50200 4012 3292 R 2.0 0.0 0:01.59 top 2330 root 20 0 1654612 61260 34720 S 1.0 0.2 0:55.81 coredns 8023 root 20 0 12056 3044 2580 S 1.0 0.0 0:24.59 install+top - 07:00:09 up 10:10, 0 users, load average: 0.34, 0.27, 0.28 Tasks: 235 total, 2 running, 233 sleeping, 0 stopped, 0 zombie %Cpu(s): 8.9 us, 3.2 sy, 0.0 ni, 85.6 id, 1.5 wa, 0.5 hi, 0.2 si, 0.0 st MiB Mem : 32151.9 total, 22621.0 free, 2160.5 used, 7370.4 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 29441.9 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2009 root 20 0 3741252 172660 71396 S 5.0 0.5 36:43.01 kubelet 9440 root 20 0 1633728 542684 60900 S 4.0 1.6 21:07.01 fluentd 1353 800 10 -10 796808 165380 40916 S 2.0 0.5 2:32.15 ovs-vsw+ 1 root 20 0 238416 15412 8968 S 1.0 0.0 1:56.76 systemd 1954 root 20 0 2663680 131452 46156 S 1.0 0.4 6:50.47 crio 5454 root 20 0 1729112 73680 37404 S 1.0 0.2 3:52.23 coredns 6198 nobody 20 0 1632272 45936 20516 S 1.0 0.1 0:17.61 network+ 7016 root 20 0 12052 3204 2736 S 1.0 0.0 0:24.19 install+
Version-Release number of selected component (if applicable):
How reproducible:
The lab environment does not present the same behavior.
Steps to Reproduce:
1. 2. 3.
Actual results:
Regular high CPU spikes
Expected results:
No CPU spikes
Additional info:
Provided logs: 1-) top command dump uploaded to SF case 03317387 2-) must-gather uploaded to SF case 03317387
When we update a Secret referenced in the BareMetalHost, an immediate reconcile of the corresponding BMH is not triggered. In most states we requeue each CR after a timeout, so we should eventually see the changes.
In the case of BMC Secrets, this has been broken since the fix for OCPBUGS-1080 in 4.12.
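A hedged workaround sketch: a change to the Secret alone does not requeue the BMH immediately, but touching the BMH object itself (e.g. with a throwaway annotation) triggers a reconcile without waiting for the periodic requeue; the namespace and annotation key are illustrative:
~~~
# Nudge the BMH so its controller reconciles it and re-reads the updated Secret.
oc -n openshift-machine-api annotate bmh <bmh-name> \
  example.com/force-reconcile="$(date +%s)" --overwrite
~~~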
Description of problem:
The PipelineRun list has a Duration column, but the TaskRun list inside it doesn't.
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Have OpenShift Pipeline with 2+ tasks configured and invoked
Steps to Reproduce:
1. Once a PipelineRun is invoked, navigate to the invoked TaskRuns. 2. You will see columns like Status and Started there, but no Duration.
Actual results:
Expected results:
Additional info:
I'll add screenshots for PipelineRuns and TaskRuns
Please review the following PR: https://github.com/openshift/machine-api-provider-nutanix/pull/47
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
After all cluster operators have reconciled after the password rotation, we can still see authentication failures in keystone (attached screenshot of splunk query)
Version-Release number of selected component (if applicable):
Environment:
- OpenShift 4.12.10 on OpenStack 16
- The cluster is managed via RHACM, but password rotation shall be done via "regular" OpenShift means.
How reproducible:
Rotated the OpenStack credentials according to the documentation [1]
[1] https://docs.openshift.com/container-platform/4.12/authentication/managing_cloud_provider_credentials/cco-mode-passthrough.html#manually-rotating-cloud-creds_cco-mode-passthrough
Additional info:
- We can't trace back where these authentication failures come from.
- They do disappear after a cluster upgrade (i.e. when nodes are rebooted and all pods are restarted), which indicates that there's still a component using the old credentials.
- The relevant technical integration points _seem_ to be working though (LBaaS, CSI, Machine API, Swift).
What is the business impact? Please also provide timeframe information.
- We cannot rely on Splunk monitoring for authentication issues since it's currently constantly showing authentication errors.
- We cannot be entirely sure that everything works as expected since we don't know the component that doesn't seem to use the new credentials.
Description of problem:
E2E test suite is getting failed with below error - Falling back to built-in suite, failed reading external test suites: unable to extract k8s-tests binary: failed extracting "/usr/bin/k8s-tests" from "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f98d9998691052cb8049f806f8c1dc9a6bac189c10c33af9addd631eedfb5528": exit status 1 No manifest filename passed
Version-Release number of selected component (if applicable):
4.14
How reproducible:
So far with 4.14 clusters on Power
Steps to Reproduce:
1. Deploy a 4.14 cluster on Power. 2. Run the e2e test suite from https://github.com/openshift/origin. 3. Monitor the e2e run.
Actual results:
E2E test failed
Expected results:
E2E should pass
Additional info:
./openshift-tests run -f ./test-suite.txt -o /tmp/conformance-parallel-out.txt warning: KUBE_TEST_REPO_LIST may not be set when using openshift-tests and will be ignored openshift-tests version: v4.1.0-6960-gd9cf51f Aug 9 00:48:21.959: INFO: Enabling in-tree volume drivers Attempting to pull tests from external binary... Falling back to built-in suite, failed reading external test suites: unable to extract k8s-tests binary: failed extracting "/usr/bin/k8s-tests" from "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f98d9998691052cb8049f806f8c1dc9a6bac189c10c33af9addd631eedfb5528": exit status 1 creating a TCP service service-test with type=LoadBalancer in namespace e2e-service-lb-test-bvmbl Aug 9 00:48:35.424: INFO: Waiting up to 15m0s for service "service-test" to have a LoadBalancer Aug 9 00:48:36.272: INFO: ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/new started responding to GET requests over new connections Aug 9 00:48:36.272: INFO: ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/reused started responding to GET requests over reused connections Aug 9 00:48:36.310: INFO: ns/openshift-console route/console disruption/ingress-to-console connection/new started responding to GET requests over new connections Aug 9 00:48:36.310: INFO: ns/openshift-console route/console disruption/ingress-to-console connection/reused started responding to GET requests over reused connections Aug 9 01:04:07.507: INFO: disruption/ci-cluster-network-liveness connection/reused started responding to GET requests over reused connections Aug 9 01:04:07.507: INFO: disruption/ci-cluster-network-liveness connection/new started responding to GET requests over new connections Starting SimultaneousPodIPController I0809 01:04:37.551879 134117 shared_informer.go:311] Waiting for caches to sync for SimultaneousPodIPController Aug 9 01:04:37.558: INFO: ns/openshift-image-registry route/test-disruption-reused disruption/image-registry connection/reused started responding to GET requests over reused connections Aug 9 01:04:37.624: INFO: disruption/cache-kube-api connection/new started responding to GET requests over new connections E0809 01:04:37.719406 134117 shared_informer.go:314] unable to sync caches for SimultaneousPodIPControllerSuite run returned error: error waiting for load balancer: timed out waiting for service "service-test" to have a load balancer: timed out waiting for the condition disruption/kube-api connection/new producer sampler context is done disruption/cache-kube-api connection/reused producer sampler context is done disruption/oauth-api connection/new producer sampler context is done disruption/oauth-api connection/reused producer sampler context is done ERRO[0975] disruption sample failed: context canceled auditID=464fb276-71b0-48bf-8fb4-3099ae37cedf backend=oauth-api type=reused disruption/cache-kube-api connection/new producer sampler context is done disruption/openshift-api connection/reused producer sampler context is done disruption/cache-openshift-api connection/reused producer sampler context is done ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/new producer sampler context is done ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/reused producer sampler context is done ns/openshift-console route/console disruption/ingress-to-console connection/new producer sampler context is done 
disruption/ci-cluster-network-liveness connection/reused producer sampler context is done disruption/ci-cluster-network-liveness connection/new producer sampler context is done ns/openshift-image-registry route/test-disruption-new disruption/image-registry connection/new producer sampler context is done ns/openshift-image-registry route/test-disruption-reused disruption/image-registry connection/reused producer sampler context is done ns/openshift-console route/console disruption/ingress-to-console connection/reused producer sampler context is done disruption/kube-api connection/reused producer sampler context is done disruption/openshift-api connection/new producer sampler context is done disruption/cache-openshift-api connection/new producer sampler context is done disruption/cache-oauth-api connection/reused producer sampler context is done disruption/cache-oauth-api connection/new producer sampler context is done Shutting down SimultaneousPodIPController SimultaneousPodIPController shut down No manifest filename passed error running options: error waiting for load balancer: timed out waiting for service "service-test" to have a load balancer: timed out waiting for the conditionerror: error waiting for load balancer: timed out waiting for service "service-test" to have a load balancer: timed out waiting for the condition
This is a clone of issue OCPBUGS-11286. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
OCP 4.13.0-0.nightly-2023-03-23-204038 ODF 4.13.0-121.stable
How reproducible:
Steps to Reproduce:
1. Installed ODF over OCP; everything was fine on the Installed Operators page. 2. Later, when the Installed Operators page was checked, it crashed with an "Oh no! Something went wrong" error. 3.
Actual results:
Installed Operators page crashes with "Oh no! Something went wrong." error
Expected results:
The Installed Operators page shouldn't crash. Component and stack trace logs from the console page: http://pastebin.test.redhat.com/1096522
Additional info:
Description of problem:
Customer has noticed that object count quotas ("count/*") do not work for certain objects in ClusterResourceQuotas. For example, the following ResourceQuota works as expected:
~~~
apiVersion: v1
kind: ResourceQuota
metadata: [..]
spec:
  hard:
    count/routes.route.openshift.io: "900"
    count/servicemonitors.monitoring.coreos.com: "100"
    pods: "100"
status:
  hard:
    count/routes.route.openshift.io: "900"
    count/servicemonitors.monitoring.coreos.com: "100"
    pods: "100"
  used:
    count/routes.route.openshift.io: "0"
    count/servicemonitors.monitoring.coreos.com: "1"
    pods: "4"
~~~
However, when using "count/servicemonitors.monitoring.coreos.com" in ClusterResourceQuotas, this does not work (note the missing "used"):
~~~
apiVersion: quota.openshift.io/v1
kind: ClusterResourceQuota
metadata: [..]
spec:
  quota:
    hard:
      count/routes.route.openshift.io: "900"
      count/servicemonitors.monitoring.coreos.com: "100"
      count/simon.krenger.ch: "100"
      pods: "100"
  selector:
    annotations:
      openshift.io/requester: kube:admin
status:
  namespaces: [..]
  total:
    hard:
      count/routes.route.openshift.io: "900"
      count/servicemonitors.monitoring.coreos.com: "100"
      count/simon.krenger.ch: "100"
      pods: "100"
    used:
      count/routes.route.openshift.io: "0"
      pods: "4"
~~~
This behaviour does not only apply to "servicemonitors.monitoring.coreos.com" objects, but also to other objects, such as:
- count/kafkas.kafka.strimzi.io: '0'
- count/prometheusrules.monitoring.coreos.com: '100'
- count/servicemonitors.monitoring.coreos.com: '100'
The debug output for kube-controller-manager shows the following entries, which may or may not be related:
~~~
$ oc logs kube-controller-manager-ip-10-0-132-228.eu-west-1.compute.internal | grep "servicemonitor"
I0511 15:07:17.297620 1 patch_informers_openshift.go:90] Couldn't find informer for monitoring.coreos.com/v1, Resource=servicemonitors
I0511 15:07:17.297630 1 resource_quota_monitor.go:181] QuotaMonitor using a shared informer for resource "monitoring.coreos.com/v1, Resource=servicemonitors"
I0511 15:07:17.297642 1 resource_quota_monitor.go:233] QuotaMonitor created object count evaluator for servicemonitors.monitoring.coreos.com
[..]
I0511 15:07:17.486279 1 patch_informers_openshift.go:90] Couldn't find informer for monitoring.coreos.com/v1, Resource=servicemonitors
I0511 15:07:17.486297 1 graph_builder.go:176] using a shared informer for resource "monitoring.coreos.com/v1, Resource=servicemonitors", kind "monitoring.coreos.com/v1, Kind=ServiceMonitor"
~~~
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.12.15
How reproducible:
Always
Steps to Reproduce:
1. On an OCP 4.12 cluster, create the following ClusterResourceQuota:
~~~
apiVersion: quota.openshift.io/v1
kind: ClusterResourceQuota
metadata:
  name: case-03509174
spec:
  quota:
    hard:
      count/servicemonitors.monitoring.coreos.com: "100"
      pods: "100"
  selector:
    annotations:
      openshift.io/requester: "kube:admin"
~~~
2. As "kubeadmin", create a new project and deploy one new ServiceMonitor, for example (see the check below):
~~~
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: simon-servicemon-2
  namespace: simon-1
spec:
  endpoints:
  - path: /metrics
    port: http
    scheme: http
  jobLabel: component
  selector:
    matchLabels:
      deployment: echoenv-1
~~~
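A quick way to observe the missing usage after step 2 (commands only; per this bug the servicemonitors entry stays absent from the used totals):
~~~
# The Used column should list the servicemonitors count but stays empty for the
# affected resources.
oc describe clusterresourcequota case-03509174
oc get clusterresourcequota case-03509174 -o jsonpath='{.status.total.used}{"\n"}'
~~~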
Actual results:
The "used" field for ServiceMonitors is not populated in the ClusterResourceQuota for certain objects. It is unclear if these quotas are enforced or not
Expected results:
ClusterResourceQuota for ServiceMonitors is updated and enforced
Additional info:
* Must-gather for a cluster showing this behaviour (added debug for kube-controller-manager) is available here: https://drive.google.com/file/d/1ioEEHZQVHG46vIzDdNm6pwiTjkL9QQRE/view?usp=share_link
* Slack discussion: https://redhat-internal.slack.com/archives/CKJR6200N/p1683876047243989
Description of problem:
oc adm inspect generated files sometimes have the leading "---" and sometimes do not. This depends on the order of objects collected. This by itself is not an issue. However, it becomes an issue when combined with multiple invocations of oc adm inspect collecting data to the same directory, as must-gather does. If an object is collected multiple times, the second time oc might overwrite the original file improperly and leave 4 bytes of the original content behind. This happens when the "---\n" is not written in the second invocation, which makes the content 4 bytes shorter and leaves the original trailing 4 bytes in the file intact. This garbage confuses YAML parsers.
Version-Release number of selected component (if applicable):
4.14 nighly as of Jul 25 and before
How reproducible:
Always
Steps to Reproduce:
Run oc adm inspect twice with a different order of objects:
[msivak@x openshift-must-gather]$ oc adm inspect performanceprofile,machineconfigs,nodes --dest-dir=inspect.dual --all-namespaces
[msivak@x openshift-must-gather]$ oc adm inspect nodes --dest-dir=inspect.dual --all-namespaces
Then check the alphabetically first node yaml file - it will have garbage at the end of the file.
Actual results:
Garbage at the end of the file.
Expected results:
No garbage.
Additional info:
I believe this is caused by the lack of the Truncate mode here: https://github.com/openshift/oc/blob/master/pkg/cli/admin/inspect/writer.go#L54
Collecting data multiple times cannot be easily avoided when multiple collect scripts are combined with relatedObjects requested by operators.
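A tiny shell demonstration of the overwrite-without-truncate effect described above (file names are illustrative):
~~~
# First write includes the leading "---" (15 bytes), second write omits it
# (11 bytes). Overwriting without truncation, as dd conv=notrunc does here,
# leaves the last 4 bytes of the old content behind -> invalid YAML.
printf -- '---\nkind: Node\n' > node.yaml
printf 'kind: Node\n' | dd of=node.yaml conv=notrunc 2>/dev/null
cat node.yaml
# kind: Node
# ode
~~~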
Description of problem:
CVO is observing a panic and throwing the following error: interface conversion: cache.DeletedFinalStateUnknown is not v1.Object: missing method GetAnnotations
Linking the job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1687876857824808960
Observed on other jobs: https://search.ci.openshift.org/?search=cache.DeletedFinalStateUnknown+is+not+v1.Object&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Currently the external-dns image is hardcoded
https://github.com/openshift/hypershift/blob/3b73a1a243122b9cb78ebc9848b7af158142d2d2/cmd/install/install.go#L513
hypershift install should have some method of overriding this
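A purely hypothetical sketch of what such an override could look like; the --external-dns-image flag is not confirmed to exist and only illustrates the kind of option this issue asks for:
~~~
# Hypothetical flag only -- illustrating the requested override.
hypershift install --external-dns-image=quay.io/example/external-dns:v0.13.1
~~~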
Please review the following PR: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/531
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
The ovnver and ovsver args should also be used to infer the short versions of the RPMs to install in the SDN container images.
Sanitize OWNERS/OWNER_ALIASES:
1) OWNERS must have:
component: "Storage / Kubernetes External Components"
2) OWNER_ALIASES must have all team members of the Storage team.
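A hedged sketch of the expected OWNERS shape; the alias names are placeholders, and only the component value comes from this issue:
~~~
cat <<'EOF' > OWNERS
# Placeholder aliases; the real entries must list the Storage team (see OWNER_ALIASES).
approvers:
  - storage-approvers
reviewers:
  - storage-reviewers
component: "Storage / Kubernetes External Components"
EOF
~~~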
Description of the problem:
We are turning on the feature-usage flag for custom manifests whenever we are creating a new custom cluster manifest. When we delete that manifest, the flag stays on.
Expected results:
We need to turn off the flag when deleting the custom manifest.
Description of problem:
The current openshift_sdn_pod_operations_latency metric is broken: it does not calculate the actual duration of setup/teardown for the latency metric. We also need additional metrics to measure the pod latency end to end, so that they give an overall summary of the total processing time spent by the CNI server.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is a clone of issue OCPBUGS-18841. The following is the description of the original issue:
—
Description of problem:
Failed to run the automated case OCP-57089 on a 4.14 Azure platform; when checked manually, the created load-balancer service couldn't get an external IP address.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-09-164123
How reproducible:
100% on the cluster
Steps to Reproduce:
1. Add a wait in the auto script, then run the case g.By("check if the lb services have obtained the EXTERNAL-IPs") regExp := "([0-9]+.[0-9]+.[0-9]+.[0-9]+)" time.Sleep(3600 * time.Second) % ./bin/extended-platform-tests run all --dry-run | grep 57089 | ./bin/extended-platform-tests run -f - 2. % oc get ns | grep e2e-test-router e2e-test-router-ingressclass-n2z2c Active 2m51s 3. It was pending in EXTERNAL-IP column for internal-lb-57089 service % oc -n e2e-test-router-ingressclass-n2z2c get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE external-lb-57089 LoadBalancer 172.30.198.7 20.42.34.61 28443:30193/TCP 3m6s internal-lb-57089 LoadBalancer 172.30.214.30 <pending> 29443:31507/TCP 3m6s service-secure ClusterIP 172.30.47.70 <none> 27443/TCP 3m13s service-unsecure ClusterIP 172.30.175.59 <none> 27017/TCP 3m13s % 4. % oc -n e2e-test-router-ingressclass-n2z2c get svc internal-lb-57089 -oyaml apiVersion: v1 kind: Service metadata: annotations: service.beta.kubernetes.io/azure-load-balancer-internal: "true" creationTimestamp: "2023-09-12T07:56:42Z" finalizers: - service.kubernetes.io/load-balancer-cleanup name: internal-lb-57089 namespace: e2e-test-router-ingressclass-n2z2c resourceVersion: "209376" uid: b163bc03-b1c6-4e7b-b4e1-c996e9d135f4 spec: allocateLoadBalancerNodePorts: true clusterIP: 172.30.214.30 clusterIPs: - 172.30.214.30 externalTrafficPolicy: Cluster internalTrafficPolicy: Cluster ipFamilies: - IPv4 ipFamilyPolicy: SingleStack ports: - name: https nodePort: 31507 port: 29443 protocol: TCP targetPort: 8443 selector: name: web-server-rc sessionAffinity: None type: LoadBalancer status: loadBalancer: {} %
Actual results:
internal-lb-57089 service couldn't get an external-IP address
Expected results:
internal-lb-57089 service can get an external-IP address
Additional info:
Please review the following PR: https://github.com/openshift/router/pull/453
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
This is a clone of issue OCPBUGS-19918. The following is the description of the original issue:
—
Description of problem:
Issue was found when analyzing bug https://issues.redhat.com/browse/OCPBUGS-19817
Version-Release number of selected component (if applicable):
4.15.0-0.ci-2023-09-25-165744
How reproducible:
everytime
Steps to Reproduce:
The cluster is an IPsec cluster with the NS extension and the ipsec service enabled.
1. Enable E-W IPsec and wait for the cluster to settle.
2. Disable IPsec and wait for the cluster to settle.
You'll observe that the IPsec pods are deleted.
Actual results:
no pods
Expected results:
The pods should stay; see https://github.com/openshift/cluster-network-operator/blob/master/pkg/network/ovn_kubernetes.go#L314:
// If IPsec is enabled for the first time, we start the daemonset. If it is
// disabled after that, we do not stop the daemonset but only stop IPsec.
//
// TODO: We need to do this as, by default, we maintain IPsec state on the
// node in order to maintain encrypted connectivity in the case of upgrades.
// If we only unrender the IPsec daemonset, we will be unable to cleanup
// the IPsec state on the node and the traffic will continue to be
// encrypted.
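A hedged check to run after step 2; the DaemonSet name can vary by release, so the grep below avoids pinning it:
~~~
# After disabling IPsec, the IPsec DaemonSet and its pods should still be
# present (only the IPsec configuration itself is stopped).
oc -n openshift-ovn-kubernetes get daemonset | grep -i ipsec
oc -n openshift-ovn-kubernetes get pods -o wide | grep -i ipsec
~~~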
Additional info:
Description of problem:
agent-gather script does not collect agent-tui logs
Version-Release number of selected component (if applicable):
How reproducible:
Log in to a node (before bootstrap is completed) and run the agent-gather script.
Steps to Reproduce:
1. SSH into one of the nodes. 2. Run agent-gather. 3. Check the content of the produced tar artifacts.
Actual results:
The agent-gather-*.tar.xz does not contain agent-tui logs
Expected results:
The agent-gather-*.tar.xz must contain /var/log/agent/agent-tui.log
Additional info:
agent-tui logs are essential for troubleshooting any issue that could happen during the bootstrap and affect the agent-tui console.
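Until agent-gather collects it, a hedged manual workaround is to grab the log directly from the node (the path is taken from the expected results above):
~~~
# Collect the agent-tui log by hand from the node.
tar -czf agent-tui-logs.tar.gz /var/log/agent/agent-tui.log
~~~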
Description of problem:
When deploying 4.12 spoke clusters (using rhcos-412.86.202306132230-0-live.x86_64.iso) or 4.10 spoke clusters from a 4.14.0-ec.4 hub, the BMH gets stuck in the provisioning state due to: Failed to update hostname: Command '['chroot', '/mnt/coreos', 'hostnamectl', 'hostname']' returned non-zero exit status 1. Running `hostnamectl hostname` returns `Unknown operation hostname`. It looks like older versions of hostnamectl do not support the hostname verb.
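A hedged sketch of a version-compatible invocation; the hostname value is a placeholder:
~~~
# "hostnamectl set-hostname" works on both old and new hostnamectl builds,
# whereas the bare "hostname" verb used in the failing command only exists in
# newer versions.
chroot /mnt/coreos hostnamectl set-hostname master-0-0
chroot /mnt/coreos hostnamectl status
~~~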
Version-Release number of selected component (if applicable):
4.14.0-ec.4
How reproducible:
100%
Steps to Reproduce:
1. From a 4.14.0-ec.4 hub cluster, deploy a 4.12 spoke cluster using rhcos-412.86.202306132230-0-live.x86_64.iso via the ZTP procedure.
Actual results:
BMH stuck in provisioning state
Expected results:
BMH gets provisioned
Additional info:
I also tried using a 4.14 ISO image to deploy the 4.12 payload, but then kubelet would fail with err="failed to parse kubelet flag: unknown flag: --container-runtime".
MGMT-7549 added a change to use openshift-install instead of openshift-baremetal-install for platform:none clusters. This was to work around a problem where the baremetal binary was not available for an ARM target cluster, and at the time only none platform was supported on ARM. This problem was resolved by MGMT-9206, so we no longer need the workaround.
Please review the following PR: https://github.com/openshift/prometheus-operator/pull/230
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
oc login --token=$token --server=https://api.dalh-dev-hs-2.05zb.p3.openshiftapps.com:443 --certificate-authority=ca.crt
The server uses a certificate signed by an unknown authority.
You can bypass the certificate check, but any data you send to the server could be intercepted by others.
The referenced "ca.crt" comes from the Secret created when a Service Account is created.
Version-Release number of selected component (if applicable): 4.12.12
How reproducible: Always
Description of problem:
etcd pods running in a HyperShift control plane use an exec probe to check cluster health and have a very small timeout (1s). We should use the same as standalone etcd, with a 30s timeout.
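A minimal sketch of the desired probe shape; the port, path, and the surrounding container spec are assumptions rather than HyperShift's actual values:
~~~
cat <<'EOF' > etcd-probe-fragment.yaml
# Illustrative fragment for the etcd container spec in the hosted control plane.
livenessProbe:
  httpGet:
    path: /health      # etcd serves /health on its metrics listener (port assumed)
    port: 2381
  timeoutSeconds: 30   # match standalone etcd instead of the 1s exec probe
  periodSeconds: 5
  failureThreshold: 6
EOF
~~~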
Version-Release number of selected component (if applicable):
All
How reproducible:
Always
Steps to Reproduce:
1. Create a HyperShift hosted cluster. 2. Examine the etcd pod(s) YAML.
Actual results:
Probe is of type exec and has a timeout of 1s
Expected results:
Probe is of type http and has a timeout of 30s
Additional info:
Description of problem:
The customer wanted to restrict access to the vCenter API, and the originating traffic needs to use a configured EgressIP. This works fine for the Machine API, but the vSphere CSI driver controller uses the host network, and hence the configured EgressIP isn't used.
Is it possible to disable this (use of the host network) for the CSI controller?
slack thread: https://redhat-internal.slack.com/archives/CBQHQFU0N/p1683135077822559
Description of problem:
APIServer endpoint isn't healthy after a PublicAndPrivate cluster is created. PROGRESS of the cluster is Completed and PROCESS is false, Nodes are ready, cluster operators on the guest cluster are Available, only issue is condition Type Available is False due to APIServer endpoint is not healthy. jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters NAME VERSION KUBECONFIG PROGRESS AVAILABLE PROGRESSING MESSAGE jz-test 4.14.0-0.nightly-2023-04-30-235516 jz-test-admin-kubeconfig Completed False False APIServer endpoint a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com is not healthy jiezhao-mac:hypershift jiezhao$ oc get hostedcluster/jz-test -n clusters -ojsonpath='{.spec.platform.aws.endpointAccess}{"\n"}' PublicAndPrivate jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jz-test NAME READY STATUS RESTARTS AGE aws-cloud-controller-manager-666559d4f-rdsw4 2/2 Running 0 149m aws-ebs-csi-driver-controller-79fdfb6c76-vb7wr 7/7 Running 0 148m aws-ebs-csi-driver-operator-7dbd789984-mb9rp 1/1 Running 0 148m capi-provider-5b7847db9-nlrvz 2/2 Running 0 151m catalog-operator-7ccb468d86-7c5j6 2/2 Running 0 149m certified-operators-catalog-895787778-5rjb6 1/1 Running 0 149m cloud-network-config-controller-86698fd7dd-kgzhv 3/3 Running 0 148m cluster-api-6fd4f86878-hjw59 1/1 Running 0 151m cluster-autoscaler-bdd688949-f9xmk 1/1 Running 0 150m cluster-image-registry-operator-6f5cb67d88-8svd6 3/3 Running 0 149m cluster-network-operator-7bc69f75f4-npjfs 1/1 Running 0 149m cluster-node-tuning-operator-5855b6576b-rckhh 1/1 Running 0 149m cluster-policy-controller-56d4d6b57c-glx4w 1/1 Running 0 149m cluster-storage-operator-7cc56c68bb-jd4d2 1/1 Running 0 149m cluster-version-operator-bd969b677-bh4w4 1/1 Running 0 149m community-operators-catalog-5c545484d7-hbzb4 1/1 Running 0 149m control-plane-operator-fc49dcbb4-5ncvf 2/2 Running 0 151m csi-snapshot-controller-85f7cc9945-n5vgq 1/1 Running 0 149m csi-snapshot-controller-operator-6597b45897-hqf5p 1/1 Running 0 149m csi-snapshot-webhook-644d765546-lk9hj 1/1 Running 0 149m dns-operator-5b5577d6c7-8dh8d 1/1 Running 0 149m etcd-0 2/2 Running 0 150m hosted-cluster-config-operator-5b75ccf55d-6rzch 1/1 Running 0 149m ignition-server-596fc9d9fb-sb94h 1/1 Running 0 150m ingress-operator-6497d476bc-whssz 3/3 Running 0 149m konnectivity-agent-6656d8dfd6-h5tcs 1/1 Running 0 150m konnectivity-server-5ff9d4b47-stb2m 1/1 Running 0 150m kube-apiserver-596fc4bb8b-7kfd8 3/3 Running 0 150m kube-controller-manager-6f86bb7fbd-4wtxk 1/1 Running 0 138m kube-scheduler-bf5876b4b-flk96 1/1 Running 0 149m machine-approver-574585d8dd-h5ffh 1/1 Running 0 150m multus-admission-controller-67b6f85fbf-bfg4x 2/2 Running 0 148m oauth-openshift-6b6bfd55fb-8sdq7 2/2 Running 0 148m olm-operator-5d97fb977c-sbf6w 2/2 Running 0 149m openshift-apiserver-5bb9f99974-2lfp4 3/3 Running 0 138m openshift-controller-manager-65666bdf79-g8cf5 1/1 Running 0 149m openshift-oauth-apiserver-56c8565bb6-6b5cv 2/2 Running 0 149m openshift-route-controller-manager-775f844dfc-jj2ft 1/1 Running 0 149m ovnkube-master-0 7/7 Running 0 148m packageserver-6587d9674b-6jwpv 2/2 Running 0 149m redhat-marketplace-catalog-5f6d45b457-hdn77 1/1 Running 0 149m redhat-operators-catalog-7958c4449b-l4hbx 1/1 Running 0 12m router-5b7899cc97-chs6t 1/1 Running 0 150m jiezhao-mac:hypershift jiezhao$ oc get node --kubeconfig=hostedcluster.kubeconfig NAME STATUS ROLES AGE VERSION ip-10-0-137-99.us-east-2.compute.internal Ready worker 131m v1.26.2+d2e245f ip-10-0-140-85.us-east-2.compute.internal 
Ready worker 132m v1.26.2+d2e245f ip-10-0-141-46.us-east-2.compute.internal Ready worker 131m v1.26.2+d2e245f jiezhao-mac:hypershift jiezhao$ jiezhao-mac:hypershift jiezhao$ oc get co --kubeconfig=hostedcluster.kubeconfig NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE console 4.14.0-0.nightly-2023-04-30-235516 True False False 126m csi-snapshot-controller 4.14.0-0.nightly-2023-04-30-235516 True False False 140m dns 4.14.0-0.nightly-2023-04-30-235516 True False False 129m image-registry 4.14.0-0.nightly-2023-04-30-235516 True False False 128m ingress 4.14.0-0.nightly-2023-04-30-235516 True False False 129m insights 4.14.0-0.nightly-2023-04-30-235516 True False False 130m kube-apiserver 4.14.0-0.nightly-2023-04-30-235516 True False False 140m kube-controller-manager 4.14.0-0.nightly-2023-04-30-235516 True False False 140m kube-scheduler 4.14.0-0.nightly-2023-04-30-235516 True False False 140m kube-storage-version-migrator 4.14.0-0.nightly-2023-04-30-235516 True False False 129m monitoring 4.14.0-0.nightly-2023-04-30-235516 True False False 129m network 4.14.0-0.nightly-2023-04-30-235516 True False False 140m node-tuning 4.14.0-0.nightly-2023-04-30-235516 True False False 131m openshift-apiserver 4.14.0-0.nightly-2023-04-30-235516 True False False 140m openshift-controller-manager 4.14.0-0.nightly-2023-04-30-235516 True False False 140m openshift-samples 4.14.0-0.nightly-2023-04-30-235516 True False False 129m operator-lifecycle-manager 4.14.0-0.nightly-2023-04-30-235516 True False False 140m operator-lifecycle-manager-catalog 4.14.0-0.nightly-2023-04-30-235516 True False False 140m operator-lifecycle-manager-packageserver 4.14.0-0.nightly-2023-04-30-235516 True False False 140m service-ca 4.14.0-0.nightly-2023-04-30-235516 True False False 130m storage 4.14.0-0.nightly-2023-04-30-235516 True False False 131m jiezhao-mac:hypershift jiezhao$ HC conditions: ============== status: conditions: - lastTransitionTime: "2023-05-01T19:45:49Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: ValidAWSIdentityProvider - lastTransitionTime: "2023-05-01T20:00:18Z" message: Cluster version is 4.14.0-0.nightly-2023-04-30-235516 observedGeneration: 3 reason: FromClusterVersion status: "False" type: ClusterVersionProgressing - lastTransitionTime: "2023-05-01T19:46:22Z" message: Payload loaded version="4.14.0-0.nightly-2023-04-30-235516" image="registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-04-30-235516" architecture="amd64" observedGeneration: 3 reason: PayloadLoaded status: "True" type: ClusterVersionReleaseAccepted - lastTransitionTime: "2023-05-01T20:03:14Z" message: Condition not found in the CVO. 
observedGeneration: 3 reason: StatusUnknown status: Unknown type: ClusterVersionUpgradeable - lastTransitionTime: "2023-05-01T20:00:18Z" message: Done applying 4.14.0-0.nightly-2023-04-30-235516 observedGeneration: 3 reason: FromClusterVersion status: "True" type: ClusterVersionAvailable - lastTransitionTime: "2023-05-01T20:00:18Z" message: "" observedGeneration: 3 reason: FromClusterVersion status: "True" type: ClusterVersionSucceeding - lastTransitionTime: "2023-05-01T19:47:51Z" message: The hosted cluster is not degraded observedGeneration: 3 reason: AsExpected status: "False" type: Degraded - lastTransitionTime: "2023-05-01T19:45:01Z" message: "" observedGeneration: 3 reason: QuorumAvailable status: "True" type: EtcdAvailable - lastTransitionTime: "2023-05-01T19:45:38Z" message: Kube APIServer deployment is available observedGeneration: 3 reason: AsExpected status: "True" type: KubeAPIServerAvailable - lastTransitionTime: "2023-05-01T19:44:27Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: InfrastructureReady - lastTransitionTime: "2023-05-01T19:44:11Z" message: External DNS is not configured observedGeneration: 3 reason: StatusUnknown status: Unknown type: ExternalDNSReachable - lastTransitionTime: "2023-05-01T19:44:19Z" message: Configuration passes validation observedGeneration: 3 reason: AsExpected status: "True" type: ValidHostedControlPlaneConfiguration - lastTransitionTime: "2023-05-01T19:44:11Z" message: AWS KMS is not configured observedGeneration: 3 reason: StatusUnknown status: Unknown type: ValidAWSKMSConfig - lastTransitionTime: "2023-05-01T19:44:37Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: ValidReleaseInfo - lastTransitionTime: "2023-05-01T19:44:11Z" message: APIServer endpoint a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com is not healthy observedGeneration: 3 reason: waitingForAvailable status: "False" type: Available - lastTransitionTime: "2023-05-01T19:47:18Z" message: All is well reason: AWSSuccess status: "True" type: AWSEndpointAvailable - lastTransitionTime: "2023-05-01T19:47:18Z" message: All is well reason: AWSSuccess status: "True" type: AWSEndpointServiceAvailable - lastTransitionTime: "2023-05-01T19:44:11Z" message: Configuration passes validation observedGeneration: 3 reason: AsExpected status: "True" type: ValidConfiguration - lastTransitionTime: "2023-05-01T19:44:11Z" message: HostedCluster is supported by operator configuration observedGeneration: 3 reason: AsExpected status: "True" type: SupportedHostedCluster - lastTransitionTime: "2023-05-01T19:45:39Z" message: Ignition server deployment is available observedGeneration: 3 reason: AsExpected status: "True" type: IgnitionEndpointAvailable - lastTransitionTime: "2023-05-01T19:44:11Z" message: Reconciliation active on resource observedGeneration: 3 reason: AsExpected status: "True" type: ReconciliationActive - lastTransitionTime: "2023-05-01T19:44:12Z" message: Release image is valid observedGeneration: 3 reason: AsExpected status: "True" type: ValidReleaseImage - lastTransitionTime: "2023-05-01T19:44:12Z" message: HostedCluster is at expected version observedGeneration: 3 reason: AsExpected status: "False" type: Progressing - lastTransitionTime: "2023-05-01T19:44:13Z" message: OIDC configuration is valid observedGeneration: 3 reason: AsExpected status: "True" type: ValidOIDCConfiguration - lastTransitionTime: "2023-05-01T19:44:13Z" message: Reconciliation completed succesfully observedGeneration: 
3 reason: ReconciliatonSucceeded status: "True" type: ReconciliationSucceeded - lastTransitionTime: "2023-05-01T19:45:52Z" message: All is well observedGeneration: 3 reason: AsExpected status: "True" type: AWSDefaultSecurityGroupCreated kube-apiserver log: ================== E0501 19:45:07.024278 7 memcache.go:238] couldn't get current server API group list: Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_03_authorization-openshift_01_rolebindingrestriction.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_03_config-operator_01_proxy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_03_quota-openshift_01_clusterresourcequota.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_03_security-openshift_01_scc.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_03_securityinternal-openshift_02_rangeallocation.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_apiserver-Default.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_authentication.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_build.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_console.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_dns.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_featuregate.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_image.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_imagecontentpolicy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_imagecontentsourcepolicy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_imagedigestmirrorset.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_imagetagmirrorset.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_infrastructure-Default.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_ingress.crd.yaml": Get 
"https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_network.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_node.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_oauth.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_project.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused unable to recognize "/work/0000_10_config-operator_01_scheduler.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create a PublicAndPrivate cluster
Actual results:
APIServer endpoint is not healthy, and HC condition Type 'Available' is False
Expected results:
APIServer endpoint should be healthy, and Type 'Available' should be True
Additional info:
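A quick way to check the reported condition (the HostedCluster name and namespace are placeholders):
$ oc get hostedcluster <hc-name> -n <hc-namespace> -o jsonpath='{.status.conditions[?(@.type=="Available")].message}'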
Description of problem:
console will have panic error when duplicate entry is set in spec.plugins
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2022-12-19-122634
How reproducible:
Always
Steps to Reproduce:
1. Create the console-demo-plugin manifests
$ oc apply -f dynamic-demo-plugin/oc-manifest.yaml
namespace/console-demo-plugin created
deployment.apps/console-demo-plugin created
service/console-demo-plugin created
consoleplugin.console.openshift.io/console-demo-plugin created
2. Enable console-demo-plugin
$ oc patch consoles.operator.openshift.io cluster --patch '{ "spec": { "plugins": ["console-demo-plugin"] } }' --type=merge
console.operator.openshift.io/cluster patched
3. Add a duplicate entry in spec.plugins in consoles.operator/cluster
$ oc patch consoles.operator.openshift.io cluster --patch '{ "spec": { "plugins": ["console-demo-plugin", "console-demo-plugin"] } }' --type=merge
console.operator.openshift.io/cluster patched
$ oc get consoles.operator cluster -o json | jq .spec.plugins
[
  "console-demo-plugin",
  "console-demo-plugin"
]
4. Check the console pods status
$ oc get pods -n openshift-console
NAME READY STATUS RESTARTS AGE
console-6bcc87c7b4-6g2cf 0/1 CrashLoopBackOff 1 (21s ago) 50s
console-6bcc87c7b4-9g6kk 0/1 CrashLoopBackOff 3 (3s ago) 50s
console-7dc78ffd78-sxvcv 1/1 Running 0 2m58s
downloads-758fc74758-9k426 1/1 Running 0 3h18m
downloads-758fc74758-k4q72 1/1 Running 0 3h21m
Actual results:
3. The console pods will be in CrashLoopBackOff status
$ oc logs console-6bcc87c7b4-9g6kk -n openshift-console
W1220 06:48:37.279871 1 main.go:228] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
I1220 06:48:37.279889 1 main.go:238] The following console plugins are enabled:
I1220 06:48:37.279898 1 main.go:240] - console-demo-plugin
I1220 06:48:37.279911 1 main.go:354] cookies are secure!
I1220 06:48:37.331802 1 server.go:607] The following console endpoints are now proxied to these services:
I1220 06:48:37.331843 1 server.go:610] - /api/proxy/plugin/console-demo-plugin/thanos-querier/ -> https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
I1220 06:48:37.331884 1 server.go:610] - /api/proxy/plugin/console-demo-plugin/thanos-querier/ -> https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
panic: http: multiple registrations for /api/proxy/plugin/console-demo-plugin/thanos-querier/
goroutine 1 [running]:
net/http.(*ServeMux).Handle(0xc0005b6600, {0xc0005d9a40, 0x35}, {0x35aaf60?, 0xc000735260})
  /usr/lib/golang/src/net/http/server.go:2503 +0x239
github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func1({0xc0005d9940?, 0x35?}, {0x35aaf60, 0xc000735260})
  /go/src/github.com/openshift/console/pkg/server/server.go:245 +0x149
github.com/openshift/console/pkg/server.(*Server).HTTPHandler(0xc000056c00)
  /go/src/github.com/openshift/console/pkg/server/server.go:621 +0x330b
main.main()
  /go/src/github.com/openshift/console/cmd/bridge/main.go:785 +0x5ff5
Expected results:
3. console pods should be running well
Additional info:
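A possible way to recover the console pods (a sketch that essentially re-applies step 2 with a de-duplicated list):
$ oc patch consoles.operator.openshift.io cluster --patch '{ "spec": { "plugins": ["console-demo-plugin"] } }' --type=merge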
Node healthz server was added in 4.13 with https://github.com/openshift/ovn-kubernetes/commit/c8489e3ff9c321e77f265dc9d484ed2549df4a6b and https://github.com/openshift/ovn-kubernetes/commit/9a836e3a547f3464d433ce8b9eef336624d51858. We need to configure it by default on 0.0.0.0:10256 on CNO for ovnk, just like we do for sdn.
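As a rough check (assuming the endpoint behaves like the kube-proxy/sdn healthz endpoint, and <node-ip> is a placeholder), the healthz server can be probed once CNO configures it:
$ curl -s -o /dev/null -w '%{http_code}\n' http://<node-ip>:10256/healthz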
Please review the following PR: https://github.com/openshift/prometheus-operator/pull/221
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
If both of the annotations mentioned below are used on an operator CSV, the uninstall instructions don't show up in the UI.
- console.openshift.io/disable-operand-delete: "true"
- operator.openshift.io/uninstall-message: "some message"
Version-Release number of selected component (if applicable):
➜ $> oc version
Client Version: 4.12.0
Kustomize Version: v4.5.7
Server Version: 4.13.0-rc.5
Kubernetes Version: v1.26.3+379cd9f
➜ $> oc get co | grep console
console 4.13.0-rc.5 True False False 4h49m
How reproducible:
Always
Steps to Reproduce:
1. Add both of the mentioned annotations to an operator CSV.
2. Make sure "console.openshift.io/disable-operand-delete" is set to "true".
3. Upon clicking "Uninstall operator", the result can be observed in the pop-up.
Actual results:
The uninstall pop-up doesn't have the "Message from Operator developer" section.
Expected results:
The uninstall instructions should show up under "Message from Operator developer".
Additional info:
The two annotations seem to be linked here: https://github.com/openshift/console/blob/3e0bb0928ce09030bc3340c9639b2a1df9e0a007/frontend/packages/operator-lifecycle-manager/src/components/modals/uninstall-operator-modal.tsx#LL395C10-L395C26
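For illustration, both annotations can be applied to a CSV with oc annotate; the CSV name and namespace are placeholders:
$ oc annotate csv <csv-name> -n <namespace> console.openshift.io/disable-operand-delete='true' operator.openshift.io/uninstall-message='some message'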
The version tracker needs an update.
When the ingress operator creates or updates a router deployment that specifies spec.template.spec.hostNetwork: true, the operator does not set spec.template.spec.containers[*].ports[*].hostPort. As a result, the API sets each port's hostPort field to the port's containerPort field value. The operator detects this as an external update and attempts to revert it. The operator should not update the deployment in response to API defaulting.
I observed this in CI for OCP 4.14 and was able to reproduce the issue on OCP 4.11.37. The problematic code was added in https://github.com/openshift/cluster-ingress-operator/pull/694/commits/af653f9fa7368cf124e11b7ea4666bc40e601165 in OCP 4.11 to implement NE-674.
Easily.
1. Create an IngressController that specifies the "HostNetwork" endpoint publishing strategy type:
oc create -f - <<EOF
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: example-hostnetwork
  namespace: openshift-ingress-operator
spec:
  domain: example.xyz
  endpointPublishingStrategy:
    type: HostNetwork
EOF
2. Check the ingress operator's logs:
oc -n openshift-ingress-operator logs -c ingress-operator deployments/ingress-operator
The ingress operator logs "updated router deployment" multiple times for the "example-hostnetwork" IngressController, such as the following:
2023-06-15T02:11:47.229Z INFO operator.ingress_controller ingress/deployment.go:131 updated router deployment {"namespace": "openshift-ingress", "name": "router-example-hostnetwork", "diff": " &v1.Deployment{\n \tTypeMeta: {},\n \tObjectMeta: {Name: \"router-example-hostnetwork\", Namespace: \"openshift-ingress\", UID: \"d7c51022-460e-4962-8521-e00255f649c3\", ResourceVersion: \"3356177\", ...},\n \tSpec: v1.DeploymentSpec{\n \t\tReplicas: &2,\n \t\tSelector: &{MatchLabels: {\"ingresscontroller.operator.openshift.io/deployment-ingresscontroller\": \"example-hostnetwork\"}},\n \t\tTemplate: v1.PodTemplateSpec{\n \t\t\tObjectMeta: {Labels: {\"ingresscontroller.operator.openshift.io/deployment-ingresscontroller\": \"example-hostnetwork\", \"ingresscontroller.operator.openshift.io/hash\": \"b7c697fd\"}, Annotations: {\"target.workload.openshift.io/management\": `{\"effect\": \"PreferredDuringScheduling\"}`, \"unsupported.do-not-use.openshift.io/override-liveness-grace-period-seconds\": \"10\"}},\n \t\t\tSpec: v1.PodSpec{\n \t\t\t\tVolumes: []v1.Volume{\n \t\t\t\t\t{Name: \"default-certificate\", VolumeSource: {Secret: &{SecretName: \"router-certs-example-hostnetwork\", DefaultMode: &420}}},\n \t\t\t\t\t{\n \t\t\t\t\t\tName: \"service-ca-bundle\",\n \t\t\t\t\t\tVolumeSource: v1.VolumeSource{\n \t\t\t\t\t\t\t... // 16 identical fields\n \t\t\t\t\t\t\tFC: nil,\n \t\t\t\t\t\t\tAzureFile: nil,\n \t\t\t\t\t\t\tConfigMap: &v1.ConfigMapVolumeSource{\n \t\t\t\t\t\t\t\tLocalObjectReference: {Name: \"service-ca-bundle\"},\n \t\t\t\t\t\t\t\tItems: {{Key: \"service-ca.crt\", Path: \"service-ca.crt\"}},\n- \t\t\t\t\t\t\t\tDefaultMode: &420,\n+ \t\t\t\t\t\t\t\tDefaultMode: nil,\n \t\t\t\t\t\t\t\tOptional: &false,\n \t\t\t\t\t\t\t},\n \t\t\t\t\t\t\tVsphereVolume: nil,\n \t\t\t\t\t\t\tQuobyte: nil,\n \t\t\t\t\t\t\t... // 8 identical fields\n \t\t\t\t\t\t},\n \t\t\t\t\t},\n \t\t\t\t\t{\n \t\t\t\t\t\tName: \"stats-auth\",\n \t\t\t\t\t\tVolumeSource: v1.VolumeSource{\n \t\t\t\t\t\t\t... // 3 identical fields\n \t\t\t\t\t\t\tAWSElasticBlockStore: nil,\n \t\t\t\t\t\t\tGitRepo: nil,\n \t\t\t\t\t\t\tSecret: &v1.SecretVolumeSource{\n \t\t\t\t\t\t\t\tSecretName: \"router-stats-example-hostnetwork\",\n \t\t\t\t\t\t\t\tItems: nil,\n- \t\t\t\t\t\t\t\tDefaultMode: &420,\n+ \t\t\t\t\t\t\t\tDefaultMode: nil,\n \t\t\t\t\t\t\t\tOptional: nil,\n \t\t\t\t\t\t\t},\n \t\t\t\t\t\t\tNFS: nil,\n \t\t\t\t\t\t\tISCSI: nil,\n \t\t\t\t\t\t\t... // 21 identical fields\n \t\t\t\t\t\t},\n \t\t\t\t\t},\n \t\t\t\t\t{\n \t\t\t\t\t\tName: \"metrics-certs\",\n \t\t\t\t\t\tVolumeSource: v1.VolumeSource{\n \t\t\t\t\t\t\t... // 3 identical fields\n \t\t\t\t\t\t\tAWSElasticBlockStore: nil,\n \t\t\t\t\t\t\tGitRepo: nil,\n \t\t\t\t\t\t\tSecret: &v1.SecretVolumeSource{\n \t\t\t\t\t\t\t\tSecretName: \"router-metrics-certs-example-hostnetwork\",\n \t\t\t\t\t\t\t\tItems: nil,\n- \t\t\t\t\t\t\t\tDefaultMode: &420,\n+ \t\t\t\t\t\t\t\tDefaultMode: nil,\n \t\t\t\t\t\t\t\tOptional: nil,\n \t\t\t\t\t\t\t},\n \t\t\t\t\t\t\tNFS: nil,\n \t\t\t\t\t\t\tISCSI: nil,\n \t\t\t\t\t\t\t... // 21 identical fields\n \t\t\t\t\t\t},\n \t\t\t\t\t},\n \t\t\t\t},\n \t\t\t\tInitContainers: nil,\n \t\t\t\tContainers: []v1.Container{\n \t\t\t\t\t{\n \t\t\t\t\t\t... 
// 3 identical fields\n \t\t\t\t\t\tArgs: nil,\n \t\t\t\t\t\tWorkingDir: \"\",\n \t\t\t\t\t\tPorts: []v1.ContainerPort{\n \t\t\t\t\t\t\t{\n \t\t\t\t\t\t\t\tName: \"http\",\n- \t\t\t\t\t\t\t\tHostPort: 80,\n+ \t\t\t\t\t\t\t\tHostPort: 0,\n \t\t\t\t\t\t\t\tContainerPort: 80,\n \t\t\t\t\t\t\t\tProtocol: \"TCP\",\n \t\t\t\t\t\t\t\tHostIP: \"\",\n \t\t\t\t\t\t\t},\n \t\t\t\t\t\t\t{\n \t\t\t\t\t\t\t\tName: \"https\",\n- \t\t\t\t\t\t\t\tHostPort: 443,\n+ \t\t\t\t\t\t\t\tHostPort: 0,\n \t\t\t\t\t\t\t\tContainerPort: 443,\n \t\t\t\t\t\t\t\tProtocol: \"TCP\",\n \t\t\t\t\t\t\t\tHostIP: \"\",\n \t\t\t\t\t\t\t},\n \t\t\t\t\t\t\t{\n \t\t\t\t\t\t\t\tName: \"metrics\",\n- \t\t\t\t\t\t\t\tHostPort: 1936,\n+ \t\t\t\t\t\t\t\tHostPort: 0,\n \t\t\t\t\t\t\t\tContainerPort: 1936,\n \t\t\t\t\t\t\t\tProtocol: \"TCP\",\n \t\t\t\t\t\t\t\tHostIP: \"\",\n \t\t\t\t\t\t\t},\n \t\t\t\t\t\t},\n \t\t\t\t\t\tEnvFrom: nil,\n \t\t\t\t\t\tEnv: {{Name: \"DEFAULT_CERTIFICATE_DIR\", Value: \"/etc/pki/tls/private\"}, {Name: \"DEFAULT_DESTINATION_CA_PATH\", Value: \"/var/run/configmaps/service-ca/service-ca.crt\"}, {Name: \"RELOAD_INTERVAL\", Value: \"5s\"}, {Name: \"ROUTER_ALLOW_WILDCARD_ROUTES\", Value: \"false\"}, ...},\n \t\t\t\t\t\tResources: {Requests: {s\"cpu\": {i: {...}, s: \"100m\", Format: \"DecimalSI\"}, s\"memory\": {i: {...}, Format: \"BinarySI\"}}},\n \t\t\t\t\t\tVolumeMounts: {{Name: \"default-certificate\", ReadOnly: true, MountPath: \"/etc/pki/tls/private\"}, {Name: \"service-ca-bundle\", ReadOnly: true, MountPath: \"/var/run/configmaps/service-ca\"}, {Name: \"stats-auth\", ReadOnly: true, MountPath: \"/var/lib/haproxy/conf/metrics-auth\"}, {Name: \"metrics-certs\", ReadOnly: true, MountPath: \"/etc/pki/tls/metrics-certs\"}},\n \t\t\t\t\t\tVolumeDevices: nil,\n \t\t\t\t\t\tLivenessProbe: &v1.Probe{\n \t\t\t\t\t\t\tProbeHandler: v1.ProbeHandler{\n \t\t\t\t\t\t\t\tExec: nil,\n \t\t\t\t\t\t\t\tHTTPGet: &v1.HTTPGetAction{\n \t\t\t\t\t\t\t\t\tPath: \"/healthz\",\n \t\t\t\t\t\t\t\t\tPort: {IntVal: 1936},\n \t\t\t\t\t\t\t\t\tHost: \"localhost\",\n- \t\t\t\t\t\t\t\t\tScheme: \"HTTP\",\n+ \t\t\t\t\t\t\t\t\tScheme: \"\",\n \t\t\t\t\t\t\t\t\tHTTPHeaders: nil,\n \t\t\t\t\t\t\t\t},\n \t\t\t\t\t\t\t\tTCPSocket: nil,\n \t\t\t\t\t\t\t\tGRPC: nil,\n \t\t\t\t\t\t\t},\n \t\t\t\t\t\t\tInitialDelaySeconds: 0,\n \t\t\t\t\t\t\tTimeoutSeconds: 1,\n- \t\t\t\t\t\t\tPeriodSeconds: 10,\n+ \t\t\t\t\t\t\tPeriodSeconds: 0,\n- \t\t\t\t\t\t\tSuccessThreshold: 1,\n+ \t\t\t\t\t\t\tSuccessThreshold: 0,\n- \t\t\t\t\t\t\tFailureThreshold: 3,\n+ \t\t\t\t\t\t\tFailureThreshold: 0,\n \t\t\t\t\t\t\tTerminationGracePeriodSeconds: nil,\n \t\t\t\t\t\t},\n \t\t\t\t\t\tReadinessProbe: &v1.Probe{\n \t\t\t\t\t\t\tProbeHandler: v1.ProbeHandler{\n \t\t\t\t\t\t\t\tExec: nil,\n \t\t\t\t\t\t\t\tHTTPGet: &v1.HTTPGetAction{\n \t\t\t\t\t\t\t\t\tPath: \"/healthz/ready\",\n \t\t\t\t\t\t\t\t\tPort: {IntVal: 1936},\n \t\t\t\t\t\t\t\t\tHost: \"localhost\",\n- \t\t\t\t\t\t\t\t\tScheme: \"HTTP\",\n+ \t\t\t\t\t\t\t\t\tScheme: \"\",\n \t\t\t\t\t\t\t\t\tHTTPHeaders: nil,\n \t\t\t\t\t\t\t\t},\n \t\t\t\t\t\t\t\tTCPSocket: nil,\n \t\t\t\t\t\t\t\tGRPC: nil,\n \t\t\t\t\t\t\t},\n \t\t\t\t\t\t\tInitialDelaySeconds: 0,\n \t\t\t\t\t\t\tTimeoutSeconds: 1,\n- \t\t\t\t\t\t\tPeriodSeconds: 10,\n+ \t\t\t\t\t\t\tPeriodSeconds: 0,\n- \t\t\t\t\t\t\tSuccessThreshold: 1,\n+ \t\t\t\t\t\t\tSuccessThreshold: 0,\n- \t\t\t\t\t\t\tFailureThreshold: 3,\n+ \t\t\t\t\t\t\tFailureThreshold: 0,\n \t\t\t\t\t\t\tTerminationGracePeriodSeconds: nil,\n \t\t\t\t\t\t},\n \t\t\t\t\t\tStartupProbe: &v1.Probe{\n 
\t\t\t\t\t\t\tProbeHandler: v1.ProbeHandler{\n \t\t\t\t\t\t\t\tExec: nil,\n \t\t\t\t\t\t\t\tHTTPGet: &v1.HTTPGetAction{\n \t\t\t\t\t\t\t\t\tPath: \"/healthz/ready\",\n \t\t\t\t\t\t\t\t\tPort: {IntVal: 1936},\n \t\t\t\t\t\t\t\t\tHost: \"localhost\",\n- \t\t\t\t\t\t\t\t\tScheme: \"HTTP\",\n+ \t\t\t\t\t\t\t\t\tScheme: \"\",\n \t\t\t\t\t\t\t\t\tHTTPHeaders: nil,\n \t\t\t\t\t\t\t\t},\n \t\t\t\t\t\t\t\tTCPSocket: nil,\n \t\t\t\t\t\t\t\tGRPC: nil,\n \t\t\t\t\t\t\t},\n \t\t\t\t\t\t\tInitialDelaySeconds: 0,\n \t\t\t\t\t\t\tTimeoutSeconds: 1,\n \t\t\t\t\t\t\tPeriodSeconds: 1,\n- \t\t\t\t\t\t\tSuccessThreshold: 1,\n+ \t\t\t\t\t\t\tSuccessThreshold: 0,\n \t\t\t\t\t\t\tFailureThreshold: 120,\n \t\t\t\t\t\t\tTerminationGracePeriodSeconds: nil,\n \t\t\t\t\t\t},\n \t\t\t\t\t\tLifecycle: nil,\n \t\t\t\t\t\tTerminationMessagePath: \"/dev/termination-log\",\n \t\t\t\t\t\t... // 6 identical fields\n \t\t\t\t\t},\n \t\t\t\t},\n \t\t\t\tEphemeralContainers: nil,\n \t\t\t\tRestartPolicy: \"Always\",\n \t\t\t\t... // 31 identical fields\n \t\t\t},\n \t\t},\n \t\tStrategy: {Type: \"RollingUpdate\", RollingUpdate: &{MaxUnavailable: &{Type: 1, StrVal: \"25%\"}, MaxSurge: &{}}},\n \t\tMinReadySeconds: 30,\n \t\t... // 3 identical fields\n \t},\n \tStatus: {ObservedGeneration: 1, Replicas: 2, UpdatedReplicas: 2, UnavailableReplicas: 2, ...},\n }\n"}
Note the following in the diff:
Ports: []v1.ContainerPort{
    {
        Name:          "http",
-       HostPort:      80,
+       HostPort:      0,
        ContainerPort: 80,
        Protocol:      "TCP",
        HostIP:        "",
    },
    {
        Name:          "https",
-       HostPort:      443,
+       HostPort:      0,
        ContainerPort: 443,
        Protocol:      "TCP",
        HostIP:        "",
    },
    {
        Name:          "metrics",
-       HostPort:      1936,
+       HostPort:      0,
        ContainerPort: 1936,
        Protocol:      "TCP",
        HostIP:        "",
    },
},
The operator should ignore updates by the API that only set default values. The operator should not perform these unnecessary updates to the router deployment.
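To see the defaulting that triggers these updates, the live router deployment's container ports can be inspected (assuming the router container is the first container in the pod template):
$ oc -n openshift-ingress get deployment/router-example-hostnetwork -o jsonpath='{.spec.template.spec.containers[0].ports}'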
Description of problem:
oc-mirror fails on the arm64 platform with the following error:
Rendering catalog image "ec2-18-224-73-36.us-east-2.compute.amazonaws.com:5000/arm/home/ec2-user/ocmtest/oci-multi-index:1fb06f" with file-based catalog
Rendering catalog image "ec2-18-224-73-36.us-east-2.compute.amazonaws.com:5000/arm/redhat/community-operator-index:v4.13" with file-based catalog
error: error rebuilding catalog images from file-based catalogs: error regenerating the cache for ec2-18-224-73-36.us-east-2.compute.amazonaws.com:5000/arm/redhat/community-operator-index:v4.13: fork/exec /home/ec2-user/ocmtest/oc-mirror-workspace/src/catalogs/registry.redhat.io/redhat/community-operator-index/v4.13/bin/usr/bin/registry/opm: exec format error
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Clone the repo to the arm64 cluster and build oc-mirror;
2. Copy the catalog index to localhost:
`skopeo copy --all --format oci docker://registry.redhat.io/redhat/redhat-operator-index:v4.13 oci:///home/ec2-user/ocmtest/oci-multi-index --remove-signatures`
3. Run the oc-mirror command with the following ImageSetConfiguration:
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
archiveSize: 16
mirror:
  operators:
  - catalog: oci:///home/ec2-user/ocmtest/oci-multi-index
    full: false # only mirror the latest versions
    packages:
    - name: cluster-logging
  - catalog: registry.redhat.io/redhat/community-operator-index:v4.13
    full: false # only mirror the latest versions
    packages:
    - name: namespace-configuration-operator
`oc-mirror --config config-413.yaml docker://xxxx:5000/arm --dest-skip-tls`
Expected results:
No errors; the mirroring should succeed.
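A quick way to confirm the root cause (the path comes from the error above) is to check the architecture of the extracted opm binary, which is presumably an x86-64 build being executed on an arm64 host:
$ file /home/ec2-user/ocmtest/oc-mirror-workspace/src/catalogs/registry.redhat.io/redhat/community-operator-index/v4.13/bin/usr/bin/registry/opm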
After installation with the assisted installer, the cluster contains BareMetalHost CRs (in the 'unmanaged' state) generated by assisted. These CRs include HardwareDetails data captured from the assisted-installer-agent.
Likely due to misleading documentation in Metal³ (since fixed by https://github.com/metal3-io/baremetal-operator/pull/657), the name field of storage devices is set to a name like sda instead of what Metal³'s own inspection would set it to, which is /dev/sda. This field is meant to be round-trippable to the rootDeviceHints, and as things stand it is not.
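A sketch for comparing the two fields on an affected host (the host name is a placeholder, the jsonpath assumes the usual metal3 layout with hardware details under status.hardware, and assisted-generated BareMetalHosts may live in a different namespace):
$ oc -n openshift-machine-api get bmh <host-name> -o jsonpath='{.status.hardware.storage[*].name}'
$ oc -n openshift-machine-api get bmh <host-name> -o jsonpath='{.spec.rootDeviceHints.deviceName}'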
Description of problem:
Due to https://github.com/openshift/cluster-monitoring-operator/pull/1986, the prometheus-operator was instructed to inject the app.kubernetes.io/part-of: openshift-monitoring label (via its --labels option) into the resources it creates. The label is also added to the generated StatefulSets, which causes them to be deleted and recreated during the upgrade.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
upgrade to a 4.14 version with the commit https://github.com/openshift/cluster-monitoring-operator/pull/1986
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
We should avoid recreating the statefulset as this leads to downtime (for Prometheus, both Pods are recreated)
Additional info:
When we set the k8s.ovn.org/node-primary-ifaddr annotation on the node, we simply take the first valid IP address we find on the node gateway. We exclude link-local addresses and those in internally reserved subnets (https://github.com/openshift/ovn-kubernetes/pull/1386).
Now, we might have more than one "valid" IP address on the gateway, as observed in:
https://bugzilla.redhat.com/show_bug.cgi?id=2081390#c11 , https://bugzilla.redhat.com/show_bug.cgi?id=2081390#c14
For instance, taken from a different cluster than in the linked BZ:
sh-4.4# ip a show br-ex
7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 00:52:12:af:f3:53 brd ff:ff:ff:ff:ff:ff
inet6 fd69::2/125 scope global dadfailed tentative <---- masquerade IP, excluded
valid_lft forever preferred_lft forever
inet6 fd2e:6f44:5dd8:c956::4/128 scope global nodad deprecated <--- real node IP, included
valid_lft forever preferred_lft 0sec
inet6 fd2e:6f44:5dd8:c956::17/128 scope global dynamic noprefixroute <---added by keepalive, INCLUDED!!
valid_lft 3017sec preferred_lft 3017sec
inet6 fe80::252:12ff:feaf:f353/64 scope link noprefixroute <--- link local, excluded
valid_lft forever preferred_lft forever
Above, fd2e:6f44:5dd8:c956::17/128 is the ingress LB VIP added by keepalived, yet it is still treated as a candidate node IP.
We don't currently distinguish in the code between the node IP as in node.spec.IP and other IPs that might be added to br-ex by other components.
Would it be a good idea to just set the node primary address annotation to match node.spec.IP?
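A rough way to compare the annotation with what the node itself reports (the node name is a placeholder):
$ oc get node <node-name> -o json | jq -r '.metadata.annotations["k8s.ovn.org/node-primary-ifaddr"]'
$ oc get node <node-name> -o json | jq -r '.status.addresses'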
Description of problem:
If you check the Ironic API logs from a bootstrap VM, you'll see that terraform is making several GET requests per second. This is far too frequent; bare metal machine states do not change that fast, not even on virtual emulation.
2023-03-01 12:37:38.234 1 INFO eventlet.wsgi.server [None req-c5628ecb-c94c-4b7c-95b3-2ee933ba850b - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200 len: 3659 time: 0.0060174[00m
2023-03-01 12:37:38.240 1 INFO eventlet.wsgi.server [None req-275e077e-8ec7-43a9-8948-e1d39b46b331 - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200 len: 3659 time: 0.0056679[00m
2023-03-01 12:37:38.246 1 INFO eventlet.wsgi.server [None req-0d867822-fcff-4ba0-8773-37415b3f532f - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200 len: 3659 time: 0.0056052[00m
2023-03-01 12:37:38.252 1 INFO eventlet.wsgi.server [None req-7e64cb21-869e-4a98-ad18-54adb6e5dec5 - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200 len: 3659 time: 0.0055907[00m
2023-03-01 12:37:38.258 1 INFO eventlet.wsgi.server [None req-de9995a8-9201-47b0-aa40-505e39b48279 - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200 len: 3659 time: 0.0055318[00m
2023-03-01 12:37:38.265 1 INFO eventlet.wsgi.server [None req-9e969582-0388-4e47-ad5b-966e1fd2a6da - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200 len: 3659 time: 0.0059781[00m
2023-03-01 12:37:38.354 1 INFO eventlet.wsgi.server [None req-84fad0b8-2a28-476e-90c9-ebb6a9cda833 - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200 len: 3659 time: 0.0884116[00m
Description of problem:
Currently the Knative Route's details page does not show the URL of the Route.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Install Knative Serving (Serverless Operator)
2. Create a Serverless function from the Add page
3. Navigate to the Knative Route's details page
Actual results:
No URL is shown
Expected results:
URL should be shown
Additional info:
Images: https://drive.google.com/drive/folders/13Ya0mFhDrgFIrVcq6DaLyOxZbatz82Al?usp=share_link
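For reference, the URL the details page should surface is already present in the Route status and can be read with a jsonpath query (namespace and route name are placeholders):
$ oc get routes.serving.knative.dev <route-name> -n <namespace> -o jsonpath='{.status.url}'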
Description of problem:
When using the agent-based installer to provision OCP, validation failed with the following message:
"id": "sufficient-installation-disk-speed"
"status": "failure"
"message": "While preparing the previous installation the installation disk speed measurement failed or was found to be insufficient"
Version-Release number of selected component (if applicable):
4.13.0 { "versions": { "assisted-installer": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3a8b33263729ab42c0ff29b9d5e8b767b7b1a9b31240c592fa8d173463fb04d1", "assisted-installer-controller": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ce3e2e4aac617077ac98b82d9849659595d85cd31f17b3213da37bc5802b78e1", "assisted-installer-service": "Unknown", "discovery-agent": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70397ac41dffaa5f3333c00ac0c431eff7debad9177457a038b6e8c77dc4501a" } }
How reproducible:
100%
Steps to Reproduce:
1. Using the agent-based installer, provision the DELL 16G server 2. 3.
Actual results:
Validation failed with "sufficient-installation-disk-speed"
Expected results:
Validation pass
Additional info:
[root@c2-esx02 bin]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 125.7G 0 loop /var/lib/containers/storage/overlay /var /etc /run/ephemeral
loop1 7:1 0 934M 0 loop /usr /boot / /sysroot
nvme1n1 259:0 0 1.5T 0 disk
nvme0n1 259:2 0 894.2G 0 disk
├─nvme0n1p1 259:6 0 2M 0 part
├─nvme0n1p2 259:7 0 20M 0 part
├─nvme0n1p3 259:8 0 93.1G 0 part
├─nvme0n1p4 259:9 0 701.9G 0 part
└─nvme0n1p5 259:10 0 99.2G 0 part
nvme2n1 259:3 0 1.5T 0 disk
nvme4n1 259:4 0 1.5T 0 disk
nvme3n1 259:5 0 1.5T 0 disk
[root@c2-esx02 bin]# ls -lh /dev |grep nvme
crw-------. 1 root root 239, 0 Jun 12 06:01 nvme0
-rw-r--r--. 1 root root 4.0M Jun 12 06:04 nvme0c0n1
brw-rw----. 1 root disk 259, 2 Jun 12 06:01 nvme0n1
brw-rw----. 1 root disk 259, 6 Jun 12 06:01 nvme0n1p1
brw-rw----. 1 root disk 259, 7 Jun 12 06:01 nvme0n1p2
brw-rw----. 1 root disk 259, 8 Jun 12 06:01 nvme0n1p3
brw-rw----. 1 root disk 259, 9 Jun 12 06:01 nvme0n1p4
brw-rw----. 1 root disk 259, 10 Jun 12 06:01 nvme0n1p5
crw-------. 1 root root 239, 1 Jun 12 06:01 nvme1
brw-rw----. 1 root disk 259, 0 Jun 12 06:01 nvme1n1
crw-------. 1 root root 239, 2 Jun 12 06:01 nvme2
brw-rw----. 1 root disk 259, 3 Jun 12 06:01 nvme2n1
crw-------. 1 root root 239, 3 Jun 12 06:01 nvme3
brw-rw----. 1 root disk 259, 5 Jun 12 06:01 nvme3n1
crw-------. 1 root root 239, 4 Jun 12 06:01 nvme4
brw-rw----. 1 root disk 259, 4 Jun 12 06:01 nvme4n1
[root@c2-esx02 bin]# lsblk -f nvme0c0n1
lsblk: nvme0c0n1: not a block device
[root@c2-esx02 bin]# ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 google-CN0WW56VFCP0033900HU -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 google-CN0WW56VFCP0033900HU-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 google-CN0WW56VFCP0033900HU-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 google-CN0WW56VFCP0033900HU-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 google-CN0WW56VFCP0033900HU-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 google-CN0WW56VFCP0033900HU-part5 -> ../../nvme0n1p5
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 google-PHAB112600291P9SGN -> ../../nvme3n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 google-PHAB115400P81P9SGN -> ../../nvme2n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 google-PHAB120401CP1P9SGN -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 google-PHAB124501MF1P9SGN -> ../../nvme4n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU-part5 -> ../../nvme0n1p5
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-Dell_Ent_NVMe_P5600_MU_U.2_1.6TB_PHAB112600291P9SGN -> ../../nvme3n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-Dell_Ent_NVMe_P5600_MU_U.2_1.6TB_PHAB115400P81P9SGN -> ../../nvme2n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-Dell_Ent_NVMe_P5600_MU_U.2_1.6TB_PHAB120401CP1P9SGN -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-Dell_Ent_NVMe_P5600_MU_U.2_1.6TB_PHAB124501MF1P9SGN -> ../../nvme4n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-eui.0050434209000001 -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-eui.0050434209000001-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-eui.0050434209000001-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-eui.0050434209000001-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-eui.0050434209000001-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-eui.0050434209000001-part5 -> ../../nvme0n1p5
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-eui.01000000000000005cd2e44e7a445351 -> ../../nvme2n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-eui.01000000000000005cd2e48f14515351 -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-eui.01000000000000005cd2e49d3e605351 -> ../../nvme4n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-eui.01000000000000005cd2e4fd973e5351 -> ../../nvme3n1
[root@c2-esx02 bin]# ls -l /dev/disk/by-path
total 0
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 pci-0000:01:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 pci-0000:01:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 pci-0000:01:00.0-nvme-1-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 pci-0000:01:00.0-nvme-1-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 pci-0000:01:00.0-nvme-1-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 pci-0000:01:00.0-nvme-1-part5 -> ../../nvme0n1p5
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 pci-0000:c3:00.0-nvme-1 -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 pci-0000:c4:00.0-nvme-1 -> ../../nvme2n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 pci-0000:c5:00.0-nvme-1 -> ../../nvme3n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 pci-0000:c6:00.0-nvme-1 -> ../../nvme4n1
Description of problem:
The timeout for calls to the CSI driver from both the external csi-provisioner and csi-attacher is 15 seconds by default. However, hotplugging a volume into the virtual machine can take up to a minute (sometimes more). This causes the context timeout to expire, in some cases corrupts the bookkeeping of which volumes are attached, and detaching the volumes doesn't always get handled properly afterwards.
Version-Release number of selected component (if applicable):
How reproducible:
Run the standard CSI conformance tests against the CSI driver. In most runs this issue appears as one or two random failed tests. The tests fail because the deletion of the persistent volume never happens. Because of this we cannot get a good signal on the state of the CSI driver.
Steps to Reproduce:
1. 2. 3.
Actual results:
Random failed tests of the csi conformance suite.
Expected results:
csi conformance suite passes
Additional info:
Fixed upstream by increasing the timeouts to 3 minutes instead of 15 seconds.
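As a rough way to verify the configured value (the namespace, deployment, and sidecar container names below are placeholders that depend on how the driver is deployed), the sidecar's --timeout argument can be inspected:
$ oc -n <driver-namespace> get deployment <csi-controller-deployment> -o jsonpath='{.spec.template.spec.containers[?(@.name=="csi-provisioner")].args}'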
Description of problem:
After adding a FailureDomain topology as a day-2 operation, provisioning fails with: error generating accessibility requirements: no topology key found on CSINode ocp-storage-fxsc6-worker-0-fb977
Version-Release number of selected component (if applicable):
pre-merge payload with opt-in CSIMigration PRs
How reproducible:
2/2
Steps to Reproduce:
1. Install the cluster without specifying failureDomains (so the installer generates one).
2. Add a new failureDomain to test topology, and make sure all related resources (datacenter and ClusterComputeResource) are tagged in vSphere.
3. Create a PVC; provisioning fails:
Warning ProvisioningFailed 80m (x14 over 103m) csi.vsphere.vmware.com_ocp-storage-fxsc6-master-0_a18e2651-6455-42b2-abc2-b3b3d197da56 failed to provision volume with StorageClass "thin-csi": error generating accessibility requirements: no topology key found on CSINode ocp-storage-fxsc6-worker-0-fb977
4. Here is the node label and csinode info:
$ oc get node ocp-storage-fxsc6-worker-0-b246w --show-labels
NAME STATUS ROLES AGE VERSION LABELS
ocp-storage-fxsc6-worker-0-b246w Ready worker 8h v1.26.3+2727aff beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-storage-fxsc6-worker-0-b246w,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
$ oc get csinode ocp-storage-fxsc6-worker-0-b246w -ojson | jq .spec.drivers[].topologyKeys
null
5. Other logs: I only find something in csi-driver-controller-8597f567f8-4f8z6:
{"level":"info","time":"2023-04-17T10:30:13.352999527Z","caller":"k8sorchestrator/topology.go:326","msg":"failed to retrieve tags for category \"cns.vmware.topology-preferred-datastores\". Reason: GET https://ocp-storage.vmc.qe.devcluster.openshift.com:443/rest/com/vmware/cis/tagging/category/id:cns.vmware.topology-preferred-datastores: 404 Not Found","TraceId":"573c3fc8-e6cf-4594-8154-07bd514fcb46"}
In the vpd (vsphere-problem-detector) pod, the tag check passed:
I0417 11:05:02.711093 1 util.go:110] Looking for CC: workloads-02
I0417 11:05:02.766516 1 zones.go:168] ClusterComputeResource: ClusterComputeResource:domain-c5265 @ /OCP-DC/host/workloads-02
I0417 11:05:02.766622 1 zones.go:64] Validating tags for ClusterComputeResource:domain-c5265.
I0417 11:05:02.813568 1 zones.go:81] Processing attached tags
I0417 11:05:02.813678 1 zones.go:90] Found Region: region-A
I0417 11:05:02.813721 1 zones.go:96] Found Zone: zone-B
I0417 11:05:02.834718 1 util.go:110] Looking for CC: qe-cluster/workloads-03
I0417 11:05:02.844475 1 reflector.go:559] k8s.io/client-go@v0.26.1/tools/cache/reflector.go:169: Watch close - *v1.ConfigMap total 7 items received
I0417 11:05:02.890279 1 zones.go:168] ClusterComputeResource: ClusterComputeResource:domain-c9002 @ /OCP-DC/host/qe-cluster/workloads-03
I0417 11:05:02.890406 1 zones.go:64] Validating tags for ClusterComputeResource:domain-c9002.
I0417 11:05:02.946720 1 zones.go:81] Processing attached tags
I0417 11:05:02.946871 1 zones.go:96] Found Zone: zone-C
I0417 11:05:02.946917 1 zones.go:90] Found Region: region-A
I0417 11:05:02.946965 1 vsphere_check.go:242] CheckZoneTags passed
Actual results:
Provisioning failed.
Expected results:
Provisioning should succeed.
Additional info:
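For reference, topology keys can be listed for all nodes at once, which makes it easy to spot the nodes the driver has not labelled:
$ oc get csinode -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.drivers[*].topologyKeys}{"\n"}{end}'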
Please review the following PR: https://github.com/openshift/bond-cni/pull/52
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
If a custom API server certificate is added as per documentation[1], but the secret name is wrong and points to a non-existing secret, the following happens:
- The kube-apiserver config is rendered with some of the namedCertificates pointing to /etc/kubernetes/static-pod-certs/secrets/user-serving-cert-000/
- As the secret in the apiserver/cluster object is wrong, no user-serving-cert-000 secret is generated, so /etc/kubernetes/static-pod-certs/secrets/user-serving-cert-000/ does not exist (and may be automatically removed if manually created).
- The combination of the 2 points above causes the kube-apiserver to start crash-looping because its config points to non-existent certificates.
This is a cluster-kube-apiserver-operator bug, because the operator should validate that the specified secret exists and, if it doesn't, degrade and do nothing rather than render an inconsistent configuration.
Version-Release number of selected component (if applicable):
First found in 4.11.13, but also reproduced in the latest nightly build.
How reproducible:
Always
Steps to Reproduce:
1. Set up a named certificate pointing to a secret that doesn't exist, for example with the patch shown below. 2. 3.
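A minimal sketch of such a misconfiguration (the hostname and secret name are placeholders):
$ oc patch apiserver cluster --type=merge -p '{"spec":{"servingCerts":{"namedCertificates":[{"names":["api.example.xyz"],"servingCertificate":{"name":"secret-that-does-not-exist"}}]}}}'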
Actual results:
Inconsistent configuration that points to non-existing secret. Kube API server pod crash-loop.
Expected results:
The Cluster Kube API Server Operator should detect that the secret is wrong, do nothing, and only report itself as degraded with a meaningful message so the user can fix it. No Kube API server pod crash-looping.
Additional info:
Once the kube-apiserver is broken, even if the apiserver/cluster object is fixed, it is usually needed to apply a manual workaround in the crash-looping master. An example of workaround that works is[2], even though that KB article was written for another bug with different root cause. References: [1] - https://docs.openshift.com/container-platform/4.11/security/certificates/api-server.html#api-server-certificates [2] - https://access.redhat.com/solutions/4893641
The ability to schedule workloads on master nodes is currently exposed via the REST API as a boolean Cluster property "schedulable_masters". For the Kubernetes API, we should align with other OpenShift APIs and have a boolean property in the ACM Spec called mastersSchedulable.
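For comparison, the in-cluster OpenShift API this aligns with already exposes the setting as a boolean on the Scheduler config, e.g.:
$ oc get schedulers.config.openshift.io cluster -o jsonpath='{.spec.mastersSchedulable}'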
Description of problem:
The following tests fail often in 4.13 and 4.14 upstream CI jobs:
- [performance] Checking IRQBalance settings Verify GloballyDisableIrqLoadBalancing Spec field [test_id:36150] Verify that IRQ load balancing is enabled/disabled correctly
- [rfe_id:27368][performance] Pre boot tuning adjusted by tuned [test_id:35363][crit:high][vendor:cnf-qe@redhat.com][level:acceptance] stalld daemon is running on the host
- [rfe_id:27363][performance] CPU Management Verification of cpu manager functionality Verify CPU usage by stress PODs [test_id:27492] Guaranteed POD should work on isolated cpu
Example run: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.13-e2e-telco5g-cnftests/1669344976506458112/artifacts/e2e-telco5g-cnftests/telco5g-cnf-tests/artifacts/test_results.html
Version-Release number of selected component (if applicable):
4.14 4.13
How reproducible:
CI job
Steps to Reproduce:
Ci job
Actual results:
failures
Expected results:
pass
Additional info:
https://snapshots.raintank.io/dashboard/snapshot/6sZ1uBR5P1O1gknyxebPQPtEo7RVEu0C history and pass/fail ratio
Description of problem:
Update the VScode extension link to https://marketplace.visualstudio.com/items?itemName=redhat.vscode-openshift-connector
And change the description to
The OpenShift Serverless Functions support in the VSCode IDE extension enables developers to effortlessly create, build, run, invoke and deploy serverless functions on OpenShift, providing a seamless development experience within the familiar VSCode environment.
This is a clone of issue OCPBUGS-19019. The following is the description of the original issue:
—
Using metal-ipi with okd-scos, Ironic fails to provision nodes.
Description of problem:
I completed an OCP installation with 3 masters and 2 workers, but I was not able to find the mastersSchedulable parameter in any of the files in the manifest directory after running the command below.
$ openshift-install agent create cluster-manifests --log-level debug --dir kni
I used this installer: https://github.com/openshift/installer/releases/tag/agent-installer-v4.11.0-dev-preview-2
Version-Release number of selected component (if applicable):
How reproducible:
On every execution of the installer.
Steps to Reproduce:
1. download the installer 2. openshift-install agent create cluster-manifests --log-level debug --dir kni
Actual results:
There is no mastersSchedulable parameter
Expected results:
Some file (like cluster-scheduler-02-config.yml) should contain the mastersSchedulable parameter, for example as shown below.
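For illustration, its presence can be checked with a simple grep over the generated manifests directory, and a cluster-scheduler-02-config.yml produced by the standard (non-agent) installer looks roughly like this (abbreviated):
$ grep -r mastersSchedulable kni/
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  mastersSchedulable: false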
Additional info:
Description of the problem:
In BE 2.16.0, try to install a new cluster with ignored validations enabled ({"host-validation-ids": "[\"all\"]", "cluster-validation-ids": "[\"all\"]"}) and one host with too little disk space (18GB). The installation starts, but after 20 minutes of waiting the cluster is back in draft status without any event.
How reproducible:
100%
Steps to reproduce:
1. Create a new multi-node cluster and configure one of the hosts to have an 18GB disk (the minimum requirement is 20GB)
2. Enable ignore-validations by:
curl -X 'PUT' \
  'http://api.openshift.com/api/assisted-install/v2/clusters/eaffbd37-2a0b-42b2-a706-ad5b23ff17a3/ignored-validations' \
  --header "Authorization: Bearer $(ocm token)" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{ "ignored_host_validations": "[\"all\"]", "ignored_cluster_validations": "[\"all\"]" }'
3. Start the installation. The cluster is stuck in prepare-for-installation for 20 minutes and then moves to draft with no event about the reason.
Actual results:
Expected results:
This issue is valid for UI and API.
For UI
If a new cluster is being created and s390x is selected as the architecture, an error message pops up when the Next button is pressed (all other necessary values are filled in correctly):
"cannot use Minimal ISO because it's not compatible with the s390x architecture on version 4.13.0-rc.3-multi of OpenShift"
There is no workaround in the UI because the matching selection (full-iso or iPXE) can only be set in the addHosts dialog.
For API
The infra env object cannot be created if the image type is not set. The error message:
"cannot use Minimal ISO because it's not compatible with the s390x architecture on version 4.13.0-rc.3-multi of OpenShift"
is returned.
The workaround is to set image_type to "full-iso" during infra env creation (see the example below).
For the s390x architecture the default should always be full-iso.
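A sketch of the workaround via the API, in the same style as the curl call above (the cluster id, pull secret, and any other required fields are placeholders and abbreviated):
curl -X 'POST' \
  'http://api.openshift.com/api/assisted-install/v2/infra-envs' \
  --header "Authorization: Bearer $(ocm token)" \
  -H 'Content-Type: application/json' \
  -d '{ "name": "<infra-env-name>", "cluster_id": "<cluster-id>", "pull_secret": "<pull-secret>", "cpu_architecture": "s390x", "image_type": "full-iso" }'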
Please review the following PR: https://github.com/openshift/cloud-provider-nutanix/pull/12
The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
There is an error when creating the image: FATAL failed to fetch Agent Installer ISO: failed to generate asset "Agent Installer ISO": stat /home/core/.cache/agent/files_cache/libnmstate.so.1.3.3: no such file or directory
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-04-06-060829
How reproducible:
always
Steps to Reproduce:
1. Prepare the agent-config.yaml and install-config.yaml files
2. Run 'bin/openshift-install agent create image --log-level debug'
3. There is the following output with errors:
DEBUG extracting /usr/bin/agent-tui to /home/core/.cache/agent/files_cache, oc image extract --path /usr/bin/agent-tui:/home/core/.cache/agent/files_cache --confirm quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c11d31d47db4afb03e4a4c8c40e7933981a2e3a7ef9805a1413c441f492b869b
DEBUG Fetching image from OCP release (oc adm release info --image-for=agent-installer-node-agent --insecure=true registry.ci.openshift.org/ocp/release@sha256:83caa0a8f2633f6f724c4feb517576181d3f76b8b76438ff752204e8c7152bac)
DEBUG extracting /usr/lib64/libnmstate.so.1.3.3 to /home/core/.cache/agent/files_cache, oc image extract --path /usr/lib64/libnmstate.so.1.3.3:/home/core/.cache/agent/files_cache --confirm quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c11d31d47db4afb03e4a4c8c40e7933981a2e3a7ef9805a1413c441f492b869b
DEBUG File /usr/lib64/libnmstate.so.1.3.3 was not found, err stat /home/core/.cache/agent/files_cache/libnmstate.so.1.3.3: no such file or directory
ERROR failed to write asset (Agent Installer ISO) to disk: cannot generate ISO image due to configuration errors
FATAL failed to fetch Agent Installer ISO: failed to generate asset "Agent Installer ISO": stat /home/core/.cache/agent/files_cache/libnmstate.so.1.3.3: no such file or directory
Actual results:
The image generation fails.
Expected results:
The image should be generated successfully.
Additional info:
Description of problem:
When typing into the filter input field on the Quick Starts page, the console will crash.
Version-Release number of selected component (if applicable):
4.13.0-rc.7
How reproducible:
Always
Steps to Reproduce:
1. Go to the Quick Starts page 2. Type something into the filter input field 3.
Actual results:
Console will crash: TypeError Description: t.toLowerCase is not a functionComponent trace: at Sn (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:168364) at t.default (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:874032) at t.default (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/quick-start-chunk-274c58e3845ea0aa718b.min.js:1:202) at s (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:241397) at s (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:241397) at t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:21:67583) at T at t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:21:69628) at Suspense at i (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:450974) at section at m (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:720272) at div at div at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1528877) at div at div at c (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:545409) at d (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:774923) at div at d (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:458124) at l (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1170951) at https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:457833 at S (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:98:86864) at main at div at v (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:264066) at div at div at c (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:62024) at div at div at c (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:545409) at d (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:774923) at div at d (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:458124) at Un 
(https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:183620) at t.default (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:874032) at t.default (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/quick-start-chunk-274c58e3845ea0aa718b.min.js:1:1261) at s (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:241397) at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1605535) at ee (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1623254) at _t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:142374) at ee (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1623254) at ee (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1623254) at ee (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1623254) at i (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:829516) at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1599727) at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1599916) at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1597332) at te (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1623385) at https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1626517 at r (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:121910) at t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:21:67583) at t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:21:69628) at t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:21:64188) at re (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1626828) at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:803496) at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1074899) at s 
(https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:652518) at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:150:190871) at Suspense Stack trace: TypeError: t.toLowerCase is not a function at pt (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:136019) at Sn (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:168723) at na (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:58879) at za (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:68397) at Hs (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:112289) at xl (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:98327) at Cl (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:98255) at _l (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:98118) at pl (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:95105) at https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:44774
Expected results:
Console should work
Additional info:
Description of problem:
The console-operator's config file gets updated every couple of seconds, where only the `resourceVersion` field gets changed.
Version-Release number of selected component (if applicable):
4.14-ec-2
How reproducible:
Always
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
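For illustration only (this is not console-operator code, and the root cause here is not yet known), a minimal Go sketch of the usual fix pattern for this kind of churn: compare the meaningful payload and skip the update call when nothing but server-managed metadata such as resourceVersion differs.
~~~
// resync sketch: hypothetical helper, not taken from console-operator.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/equality"
)

// needsUpdate compares only the fields the operator actually manages (Data and
// BinaryData), so server-side churn in resourceVersion or managedFields never
// triggers a write by itself.
func needsUpdate(current, desired *corev1.ConfigMap) bool {
	return !equality.Semantic.DeepEqual(current.Data, desired.Data) ||
		!equality.Semantic.DeepEqual(current.BinaryData, desired.BinaryData)
}

func main() {
	current := &corev1.ConfigMap{Data: map[string]string{"console-config.yaml": "a: 1"}}
	desired := &corev1.ConfigMap{Data: map[string]string{"console-config.yaml": "a: 1"}}
	fmt.Println(needsUpdate(current, desired)) // false: no update call, no resourceVersion bump
}
~~~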
Description of problem:
Kubernetes and other associated dependencies need to be updated to protect against potential vulnerabilities.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
Getting below error while creating cluster in mon01 zone Joblink: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-ovn-ppc64le-powervs/1680759459892170752 Error: level=info msg=Cluster operator insights SCAAvailable is False with Forbidden: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 403: {"code":"ACCT-MGMT-11","href":"/api/accounts_mgmt/v1/errors/11","id":"11","kind":"Error","operation_id":"c3773b1e-8818-4bfc-9605-dbd9dbc0c03f","reason":"Account with ID 2DUeKzzTD9ngfsQ6YgkzdJn1jA4 denied access to perform create on Certificate with HTTP call POST /api/accounts_mgmt/v1/certificates"} level=info msg=Cluster operator network ManagementStateDegraded is False with : level=error msg=Cluster operator storage Degraded is True with PowerVSBlockCSIDriverOperatorCR_PowerVSBlockCSIDriverStaticResourcesController_SyncError: PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded: "rbac/main_attacher_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "openshift-csi-main-attacher-role" not found level=error msg=PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded: "rbac/main_provisioner_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "openshift-csi-main-provisioner-role" not found level=error msg=PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded: "rbac/volumesnapshot_reader_provisioner_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "openshift-csi-provisioner-volumesnapshot-reader-role" not found level=error msg=PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded: "rbac/main_resizer_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "openshift-csi-main-resizer-role" not found level=error msg=PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded: "rbac/storageclass_reader_resizer_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "openshift-csi-resizer-storageclass-reader-role" not found level=error msg=PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded:
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Steps to Reproduce:
1. 2. 3.
Expected results:
cluster creation should be successful
Additional info:
The cluster-kube-apiserver-operator CI has been constantly failing for the past week and more specifically the e2e-gcp-operator job because the test cluster ends in a state where a lot of requests start failing with "Unauthorized" errors.
This caused multiple operators to become degraded and tests to fail.
Looking at the failures and a must-gather we were able to capture inside of a test cluster, it turned out that the service account issuer could be the culprit here. Because of that we opened https://issues.redhat.com/browse/API-1549.
However, it turned out that disabling TestServiceAccountIssuer didn't resolve the issue and the cluster was still too unstable for the tests to pass.
In a separate attempt we also tried disabling TestBoundTokenSignerController and this time the tests were passing. However, the cluster was still very unstable during the e2e run and the kube-apiserver-operator went degraded a couple of times: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/1455/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-gcp-operator/1632871645171421184/artifacts/e2e-gcp-operator/gather-extra/artifacts/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-5cf9d4569-m2spq_kube-apiserver-operator.log.
On top of that instead of seeing Unauthorized errors, we are now seeing a lot of connection refused.
Description of problem:
The description for the BuildAdapter SDK extension is wrong.
Actual results:
BuildAdapter contributes an adapter to adapt element to data that can be used by Pod component
Expected results:
BuildAdapter contributes an adapter to adapt element to data that can be used by Build component
Additional info:
Description of problem:
Version-Release number of selected component (if applicable):
All versions?
At least on 4.12+
How reproducible:
Always
Steps to Reproduce:
This JSON works fine:
{ "apiVersion": "v1", "kind": "ConfigMap", "metadata": { "generateName": "a-configmap-" } }
But an array cannot be used to import multiple resources:
[ { "apiVersion": "v1", "kind": "ConfigMap", "metadata": { "generateName": "a-configmap-" } }, { "apiVersion": "v1", "kind": "ConfigMap", "metadata": { "generateName": "a-configmap-" } } ]
Fails with error: No "apiVersion" field found in YAML.
Nor can a Kubernetes List "resource" be used:
{ "apiVersion": "v1", "kind": "List", "items": [ { "apiVersion": "v1", "kind": "ConfigMap", "metadata": { "generateName": "a-configmap-" } }, { "apiVersion": "v1", "kind": "ConfigMap", "metadata": { "generateName": "a-configmap-" } } ] }
Fails with error: The server doesn't have a resource type "kind: List, apiVersion: v1".
Actual results:
Neither JSON structure can be imported.
Expected results:
Both JSON structures should work and create multiple resources.
If the JSON array contains just one item, the resource detail page should be opened; otherwise, an import result page should be shown, similar to when the user imports a YAML file with multiple resources.
Additional info:
Found this JSON structure for example in issue OCPBUGS-4646
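For illustration, a minimal Go sketch (assuming nothing about the Console's actual import code) of accepting all three JSON shapes from this report: a single object, a JSON array of objects, and a v1 List with .items.
~~~
package main

import (
	"encoding/json"
	"fmt"
)

// splitResources normalizes raw JSON into individual resource documents.
func splitResources(raw []byte) ([]json.RawMessage, error) {
	// Case 1: a top-level JSON array of resources.
	var arr []json.RawMessage
	if err := json.Unmarshal(raw, &arr); err == nil {
		return arr, nil
	}

	// Case 2: a v1 List object wrapping resources in .items.
	var list struct {
		Kind  string            `json:"kind"`
		Items []json.RawMessage `json:"items"`
	}
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, err
	}
	if list.Kind == "List" {
		return list.Items, nil
	}

	// Case 3: a single resource object, kept as-is.
	return []json.RawMessage{json.RawMessage(raw)}, nil
}

func main() {
	raw := []byte(`[{"apiVersion":"v1","kind":"ConfigMap"},{"apiVersion":"v1","kind":"ConfigMap"}]`)
	items, err := splitResources(raw)
	fmt.Println(len(items), err) // 2 <nil>
}
~~~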
Description of problem:
DNS Local endpoint preference is not working for TCP DNS requests for Openshift SDN. Reference code: https://github.com/openshift/sdn/blob/b58a257b896d774e0a092612be250fb9414af5ca/vendor/k8s.io/kubernetes/pkg/proxy/iptables/proxier.go#L999-L1012 This is where the DNS request is short-circuited to the local DNS endpoint if it exists. This is important because DNS local preference protects against another outstanding bug, in which daemonset pods go stale for a few second upon node shutdown (see https://issues.redhat.com/browse/OCPNODE-549 for fix for graceful node shutdown). This appears to be contributing to DNS issues in our internal CI clusters. https://lookerstudio.google.com/reporting/3a9d4e62-620a-47b9-a724-a5ebefc06658/page/MQwFD?s=kPTlddLa2AQ shows large amounts of "dns_tcp_lookup" failures, which I attribute to this bug. UDP DNS local preference is working fine in Openshift SDN. Both UDP and TCP local preference work fine in OVN. It's just TCP DNS Local preference that is not working Openshift SDN.
Version-Release number of selected component (if applicable):
4.13, 4.12, 4.11
How reproducible:
100%
Steps to Reproduce:
1. oc debug -n openshift-dns 2. dig +short +tcp +vc +noall +answer CH TXT hostname.bind # Retry multiple times, and you should always get the same local DNS pod.
Actual results:
[gspence@gspence origin]$ oc debug -n openshift-dns Starting pod/image-debug ... Pod IP: 10.128.2.10 If you don't see a command prompt, try pressing enter. sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind "dns-default-glgr8" sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind "dns-default-gzlhm" sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind "dns-default-dnbsp" sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind "dns-default-gzlhm"
Expected results:
[gspence@gspence origin]$ oc debug -n openshift-dns Starting pod/image-debug ... Pod IP: 10.128.2.10 If you don't see a command prompt, try pressing enter. sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind "dns-default-glgr8" sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind "dns-default-glgr8" sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind "dns-default-glgr8" sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind "dns-default-glgr8"
Additional info:
https://issues.redhat.com/browse/OCPBUGS-488 is the previous bug I opened for UDP DNS local preference not working. iptables-save from a 4.13 vanilla cluster bot AWS,SDN: https://drive.google.com/file/d/1jY8_f64nDWi5SYT45lFMthE0vhioYIfe/view?usp=sharing
As a user of the HyperShift CLI, I would like to be able to set the NodePool UpgradeType through a flag when either creating a new cluster or creating a new NodePool.
DoD:
There are a few cases that this validation doesn't cover
The validation will be enabled with MGMT-15112
Description of problem:
We need to update the govc version to support PR https://github.com/openshift/release/pull/42334. The command "govc vm.network.change -dc xxx -vm -net xxxxx" is only supported from govc version v0.30.4 onwards; without it, the VM cannot fetch an IP correctly.
Version-Release number of selected component (if applicable):
ocp 4.14
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
"govc: path 'ci-segment-151'" resolves to multiple networks if specific the -net with network path, will got "govc: network '/IBMCloud/host/vcs-mdcnc-workload-1/ci-segment-151' not found"
Expected results:
After the govc version update, govc vm.network.change can be used to resolve the unique network.
Additional info:
OCP 4.14.0-rc.0
advanced-cluster-management.v2.9.0-130
multicluster-engine.v2.4.0-154
After encountering https://issues.redhat.com/browse/OCPBUGS-18959
Attempted to forcefully delete the BMH by removing the finalizer.
Then deleted all the metal3 pods.
Attempted to re-create the bmh.
Result:
the bmh is stuck in
oc get bmh
NAME STATE CONSUMER ONLINE ERROR AGE
hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com registering true 15m
seeing this entry in the BMO log:
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"start","baremetalhost":{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"}}
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"hardwareData is ready to be deleted","baremetalhost":{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"}}
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"host ready to be powered off","baremetalhost":
,"provisioningState":"powering off before delete"}
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"provisioner.ironic","msg":"ensuring host is powered off (mode: hard)","host":"kni-qe-65~hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com"}{"level":"error","ts":"2023-09-13T16:15:57Z","msg":"Reconciler error","controller":"baremetalhost","controllerGroup":"metal3.io","controllerKind":"BareMetalHost","BareMetalHost":
{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"},"namespace":"kni-qe-65","name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","reconcileID":"167061cc-7ab4-4c4a-ae45-8c19dfc3ac22","error":"action \"powering off before delete\" failed: failed to power off before deleting node: Host not registered","errorVerbose":"Host not registered\nfailed to power off before deleting node\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).actionPowerOffBeforeDeleting\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:493\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).handlePoweringOffBeforeDelete\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:585\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:202\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598\naction \"powering off before delete\" 
failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:229\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}
Description of problem:
ovn-ipsec pods Crashes when IPSec NS extension/svc is enabled on any $ROLE nodes IPSec ext and svc were enabled for 2 WORKERS only and their corresponding ovn-ipsec pods are in CLBO [root@dell-per740-36 ipsec]# oc get pods NAME READY STATUS RESTARTS AGE dell-per740-14rhtsengpek2redhatcom-debug 1/1 Running 0 3m37s ovn-ipsec-bptr6 0/1 CrashLoopBackOff 26 (3m58s ago) 130m ovn-ipsec-bv88z 1/1 Running 0 3h5m ovn-ipsec-pre414-6pb25 1/1 Running 0 3h5m ovn-ipsec-pre414-b6vzh 1/1 Running 0 3h5m ovn-ipsec-pre414-jzwcm 1/1 Running 0 3h5m ovn-ipsec-pre414-vgwqx 1/1 Running 3 132m ovn-ipsec-pre414-xl4hb 1/1 Running 3 130m ovn-ipsec-qb2bj 1/1 Running 0 3h5m ovn-ipsec-r4dfw 1/1 Running 0 3h5m ovn-ipsec-xhdpw 0/1 CrashLoopBackOff 28 (116s ago) 132m ovnkube-control-plane-698c9845b8-4v58f 2/2 Running 0 3h5m ovnkube-control-plane-698c9845b8-nlgs8 2/2 Running 0 3h5m ovnkube-control-plane-698c9845b8-wfkd4 2/2 Running 0 3h5m ovnkube-node-l6sr5 8/8 Running 27 (66m ago) 130m ovnkube-node-mj8bs 8/8 Running 27 (75m ago) 132m ovnkube-node-p24x8 8/8 Running 0 178m ovnkube-node-rlpbh 8/8 Running 0 178m ovnkube-node-wdxbg 8/8 Running 0 178m [root@dell-per740-36 ipsec]#
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-09-12-024050
How reproducible:
Always
Steps to Reproduce:
1. Install OVN IPSec cluster (East-West)
2. Enable IPSec OS extension for North-South
3. Enable IPSec service for North-South
Actual results:
ovn-ipsec pods in CLBO state
Expected results:
All pods under ovn-kubernetes ns should be Running fine
Additional info:
One of the ovn-ipsec CLBO pods logs # oc logs ovn-ipsec-bptr6 Defaulted container "ovn-ipsec" out of: ovn-ipsec, ovn-keys (init) + rpm --dbpath=/usr/share/rpm -q libreswan libreswan-4.9-4.el9_2.x86_64 + counter=0 + '[' -f /etc/cni/net.d/10-ovn-kubernetes.conf ']' + echo 'ovnkube-node has configured node.' ovnkube-node has configured node. + ip x s flush + ip x p flush + ulimit -n 1024 + /usr/libexec/ipsec/addconn --config /etc/ipsec.conf --checkconfig + /usr/libexec/ipsec/_stackmanager start + /usr/sbin/ipsec --checknss + /usr/libexec/ipsec/pluto --leak-detective --config /etc/ipsec.conf --logfile /var/log/openvswitch/libreswan.log FATAL ERROR: /usr/libexec/ipsec/pluto: lock file "/run/pluto/pluto.pid" already exists leak: string logger, item size: 48 leak: string logger prefix, item size: 27 leak detective found 2 leaks, total size 75 journalctl -u ipsec here: https://privatebin.corp.redhat.com/?216142833d016b3c#2Es8ACSyM3VWvwi85vTaYtSx8X3952ahxCvSHeY61UtT
The issue:
An interesting issue came up on #forum-ui-extensibility. There was an attempt to use extensions to nest a details page under a details page that contained a horizontal nav. This caused an issue with rendering the page content when a sub link was clicked – which caused confusion.
The why:
The reason this happened was the resource details page had a tab that contained a resource list page. This resource list page showed a number of items of CRs that when clicked would try to append their name onto the URL. This confused the navigation, thinking that this path must be another tab, so no tabs were selected and no content was visible. The goal was to reuse this longer path name as a details page of its own with its own horizontal nav. This issue is a conceptual misunderstanding of the way our list & details pages work in OpenShift Console.
List Pages are sometimes found via direct navigation links. List pages are almost all shown on the Search page, allowing a user to navigate to both existing nav items and other non-primary resources.
Details Pages are individual items found in the List Pages (a row). These are stand alone pages that show details of a singular CR and optionally can have tabs that list other resources – but they always transition to a fresh Details page instead of compounding on the currently visible one.
The ask:
If we document this in a fashion that helps Plugin developers share the same UX as the rest of the Console, we will have a more unified approach to UX within the Console and across any installed Plugins.
==> Description of problem:
"Import from git" functionality with a local Bitbucket instance does not work, due to repository validation that requires to repository to be hosted on Bitbucket Cloud. [1][2]
==> Version-Release number of selected component (if applicable):
Tested in OCP 4.10
==> How reproducible: 100%
==> Steps to Reproduce:
1. Go to: Developer View > Add+ > From Git
2. Fill the "Git Repo URL" field with the BitBucket repo URL (i.e. http://<bitbucket_url>/scm/<project>/<repository>.git)
3. Select BitBucket from the "Git type" dropdowns button
==> Actual results:
"URL is valid but cannot be reached. If this is a private repository, enter a source Secret in advanced Git options"
==> Expected results:
This functionality should also work with self-hosted Bitbucket (Bitbucket Server)
==> Additional info:
To retrieve slug information from hosted BitBucket we can query: http://<bitbucket_url>/rest/api/1.0/projects/<project>/repos/<repository>
An example:
~~~
curl -ks http://bitbucket-server-bitbucket.apps.gmeghnag.lab.cluster/rest/api/1.0/projects/test/repos/test-repo | jq
{
"slug": "test-repo",
"id": 1,
"name": "test-repo",
"hierarchyId": "28fc5c8782050b43e223",
"scmId": "git",
"state": "AVAILABLE",
"statusMessage": "Available",
"forkable": true,
"project": {
"key": "TEST",
"id": 1,
"name": "test",
"public": false,
"type": "NORMAL",
"links": {
"self": [
]
}
},
"public": true,
"archived": false,
"links": {
"clone": [
,
{ "href": "ssh://git@bitbucket-server-bitbucket.apps.gmeghnag.lab.cluster:7999/test/test-repo.git", "name": "ssh" } ],
"self": [
]
}
}
~~~
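For illustration, a minimal Go sketch (not the Console's actual validation code; the base URL, project, and repository are placeholders) that queries the Bitbucket Server REST endpoint shown above to confirm the repository exists and to read its slug.
~~~
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type bitbucketRepo struct {
	Slug  string `json:"slug"`
	Name  string `json:"name"`
	State string `json:"state"`
}

// fetchRepo calls the Bitbucket Server (self-hosted) 1.0 REST API.
func fetchRepo(baseURL, project, repo string) (*bitbucketRepo, error) {
	url := fmt.Sprintf("%s/rest/api/1.0/projects/%s/repos/%s", baseURL, project, repo)
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d from %s", resp.StatusCode, url)
	}
	var r bitbucketRepo
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return nil, err
	}
	return &r, nil
}

func main() {
	r, err := fetchRepo("http://bitbucket.example.local", "test", "test-repo")
	if err != nil {
		fmt.Println("repository validation failed:", err)
		return
	}
	fmt.Println("repository reachable, slug:", r.Slug)
}
~~~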
Description of problem:
The must gather should contain additional debug information such as the current configuration and firmware settings of any Bluefields / Mellanox device when using SRIOV
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
When the management cluster has ICSP resources, the pull reference of the Kube APIServer is replaced with a pull ref from the management cluster ICSPs resulting in a pull failure.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Create a cluster with release registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-08-28-154013 on a management cluster that has ICSPs 2. Watch the kube-apiserver pods.
Actual results:
kube-apiserver pods are initially deployed with a pull ref from the release payload and they start, but then the deployment is updated with a pull ref from an ICSP mapping and the deployment fails to roll out.
Expected results:
kube-apiserver pods roll out successfully.
Additional info:
Description of problem:
The network-tools image stream is missing in the cluster samples. It is needed for CI tests.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
When creating a deployment with `oc new-app` and using `--import-mode=PreserveOriginal`, container ports that are present in the Dockerfile do not get propagated to the deployment `spec.containers[i].ports[i].containerPort`.
On further inspection, this is because the config object that gets passed from the image to the deployment does not contain these details. The image reference in this case is a manifest-listed image, which does not contain the Docker metadata; instead, these details need to be derived from the child manifest.
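As an illustration of that last point, here is a hedged Go sketch using the OCI image-spec types rather than oc's own internals; fetchConfig stands in for whatever registry client actually retrieves the child image config.
~~~
// Package imageports: sketch of reading exposed ports from the child manifest
// of a manifest-listed image; not taken from oc.
package imageports

import (
	"fmt"
	"runtime"

	v1 "github.com/opencontainers/image-spec/specs-go/v1"
)

// PortsForPlatform picks the child manifest matching the local OS/arch and
// returns the exposed ports recorded in that child's image config, e.g. "8080/tcp".
func PortsForPlatform(index v1.Index, fetchConfig func(digest string) (v1.Image, error)) ([]string, error) {
	for _, desc := range index.Manifests {
		if desc.Platform == nil {
			continue
		}
		if desc.Platform.OS == runtime.GOOS && desc.Platform.Architecture == runtime.GOARCH {
			cfg, err := fetchConfig(desc.Digest.String())
			if err != nil {
				return nil, err
			}
			ports := make([]string, 0, len(cfg.Config.ExposedPorts))
			for p := range cfg.Config.ExposedPorts {
				ports = append(ports, p)
			}
			return ports, nil
		}
	}
	return nil, fmt.Errorf("no child manifest for %s/%s", runtime.GOOS, runtime.GOARCH)
}
~~~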
test=[sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Timeout:30m][apigroup:machine.openshift.io] [Suite:openshift/conformance/serial]
Appears to be perma-failing on gcp serial jobs.
We're at the edge of our visible data, but it looks like this may have happened around July 7
Description of problem:
revert "force cert rotation every couple days for development" in 4.13 Below is the steps to verify this bug: # oc adm release info --commits registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-06-25-081133|grep -i cluster-kube-apiserver-operator cluster-kube-apiserver-operator https://github.com/openshift/cluster-kube-apiserver-operator 7764681777edfa3126981a0a1d390a6060a840a3 # git log --date local --pretty="%h %an %cd - %s" 776468 |grep -i "#1307" 08973b820 openshift-ci[bot] Thu Jun 23 22:40:08 2022 - Merge pull request #1307 from tkashem/revert-cert-rotation # oc get clusterversions.config.openshift.io NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.11.0-0.nightly-2022-06-25-081133 True False 64m Cluster version is 4.11.0-0.nightly-2022-06-25-081133 $ cat scripts/check_secret_expiry.sh FILE="$1" if [ ! -f "$1" ]; then echo "must provide \$1" && exit 0 fi export IFS=$'\n' for i in `cat "$FILE"` do if `echo "$i" | grep "^#" > /dev/null`; then continue fi NS=`echo $i | cut -d ' ' -f 1` SECRET=`echo $i | cut -d ' ' -f 2` rm -f tls.crt; oc extract secret/$SECRET -n $NS --confirm > /dev/null echo "Check cert dates of $SECRET in project $NS:" openssl x509 -noout --dates -in tls.crt; echo done $ cat certs.txt openshift-kube-controller-manager-operator csr-signer-signer openshift-kube-controller-manager-operator csr-signer openshift-kube-controller-manager kube-controller-manager-client-cert-key openshift-kube-apiserver-operator aggregator-client-signer openshift-kube-apiserver aggregator-client openshift-kube-apiserver external-loadbalancer-serving-certkey openshift-kube-apiserver internal-loadbalancer-serving-certkey openshift-kube-apiserver service-network-serving-certkey openshift-config-managed kube-controller-manager-client-cert-key openshift-config-managed kube-scheduler-client-cert-key openshift-kube-scheduler kube-scheduler-client-cert-key Checking the Certs, they are with one day expiry times, this is as expected. 
# ./check_secret_expiry.sh certs.txt Check cert dates of csr-signer-signer in project openshift-kube-controller-manager-operator: notBefore=Jun 27 04:41:38 2022 GMT notAfter=Jun 28 04:41:38 2022 GMT Check cert dates of csr-signer in project openshift-kube-controller-manager-operator: notBefore=Jun 27 04:52:21 2022 GMT notAfter=Jun 28 04:41:38 2022 GMT Check cert dates of kube-controller-manager-client-cert-key in project openshift-kube-controller-manager: notBefore=Jun 27 04:52:26 2022 GMT notAfter=Jul 27 04:52:27 2022 GMT Check cert dates of aggregator-client-signer in project openshift-kube-apiserver-operator: notBefore=Jun 27 04:41:37 2022 GMT notAfter=Jun 28 04:41:37 2022 GMT Check cert dates of aggregator-client in project openshift-kube-apiserver: notBefore=Jun 27 04:52:26 2022 GMT notAfter=Jun 28 04:41:37 2022 GMT Check cert dates of external-loadbalancer-serving-certkey in project openshift-kube-apiserver: notBefore=Jun 27 04:52:26 2022 GMT notAfter=Jul 27 04:52:27 2022 GMT Check cert dates of internal-loadbalancer-serving-certkey in project openshift-kube-apiserver: notBefore=Jun 27 04:52:49 2022 GMT notAfter=Jul 27 04:52:50 2022 GMT Check cert dates of service-network-serving-certkey in project openshift-kube-apiserver: notBefore=Jun 27 04:52:28 2022 GMT notAfter=Jul 27 04:52:29 2022 GMT Check cert dates of kube-controller-manager-client-cert-key in project openshift-config-managed: notBefore=Jun 27 04:52:26 2022 GMT notAfter=Jul 27 04:52:27 2022 GMT Check cert dates of kube-scheduler-client-cert-key in project openshift-config-managed: notBefore=Jun 27 04:52:47 2022 GMT notAfter=Jul 27 04:52:48 2022 GMT Check cert dates of kube-scheduler-client-cert-key in project openshift-kube-scheduler: notBefore=Jun 27 04:52:47 2022 GMT notAfter=Jul 27 04:52:48 2022 GMT # # cat check_secret_expiry_within.sh #!/usr/bin/env bash # usage: ./check_secret_expiry_within.sh 1day # or 15min, 2days, 2day, 2month, 1year WITHIN=${1:-24hours} echo "Checking validity within $WITHIN ..." oc get secret --insecure-skip-tls-verify -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | . != null and fromdateiso8601<='$( date --date="+$WITHIN" +%s )') | "\(.metadata.annotations."auth.openshift.io/certificate-not-before") \(.metadata.annotations."auth.openshift.io/certificate-not-after") \(.metadata.namespace)\t\(.metadata.name)"' # ./check_secret_expiry_within.sh 1day Checking validity within 1day ... 2022-06-27T04:41:37Z 2022-06-28T04:41:37Z openshift-kube-apiserver-operator aggregator-client-signer 2022-06-27T04:52:26Z 2022-06-28T04:41:37Z openshift-kube-apiserver aggregator-client 2022-06-27T04:52:21Z 2022-06-28T04:41:38Z openshift-kube-controller-manager-operator csr-signer 2022-06-27T04:41:38Z 2022-06-28T04:41:38Z openshift-kube-controller-manager-operator csr-signer-signer
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of the problem:
In RHEL 8, the arping command (from iputils-s20180629) only returns 1 when used for duplicate address detection. In all other modes it returns 0 on success; 2 or -1 on error.
In RHEL 9, the arping command (from iputils 20210202) also returns 1 in other modes, essentially at random. (There is some kind of theory behind it, but even after multiple fixes to the logic it does not remotely work in any consistent way.)
How reproducible:
60-100% for individual arping commands
100% installation failure
Steps to reproduce:
Actual results:
arping returns 1
journal on the discovery ISO shows:
Jul 19 04:35:38 master-0 next_step_runne[3624]: time="19-07-2023 04:35:38" level=error msg="Error while processing 'arping' command" file="ipv4_arping_checker.go:28" error="exit status 1"
all hosts are marked invalid and install fails.
Expected results:
ideally arping returns 0
failing that, we should treat both 0 and 1 as success as previous versions of arping effectively did.
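A minimal Go sketch of the fallback behaviour suggested above (not the agent's actual ipv4_arping_checker; the interface and address in main are placeholders): run arping and treat exit statuses 0 and 1 as success.
~~~
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runArping treats exit status 1 as success, since iputils 20210202 can return
// 1 even when the command worked, matching how older arping versions behaved.
func runArping(args ...string) ([]byte, error) {
	out, err := exec.Command("arping", args...).CombinedOutput()
	if err == nil {
		return out, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) && exitErr.ExitCode() == 1 {
		return out, nil
	}
	return out, err
}

func main() {
	out, err := runArping("-c", "2", "-I", "eth0", "192.0.2.1")
	if err != nil {
		fmt.Println("arping failed:", err)
		return
	}
	fmt.Printf("%s", out)
}
~~~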
Sanitize OWNERS/OWNER_ALIASES:
1) OWNERS must have:
component: "Storage / Kubernetes External Components"
2) OWNER_ALIASES must have all team members of Storage team.
Refer to the CIS RedHat OpenShift Container Platform Benchmark PDF: https://drive.google.com/file/d/12o6O-M2lqz__BgmtBrfeJu1GA2SJ352c/view
1.1.7 Ensure that the etcd pod specification file permissions are set to 600 or more restrictive (Manual)
======================================================================================================
As per CIS v1.3 PDF permissions should be 600 with the following statement:
"The pod specification file is created on control plane nodes at /etc/kubernetes/manifests/etcd-member.yaml with permissions 644. Verify that the permissions are 600 or more restrictive."
But when I ran the following command it was showing 644 permissions
for i in $(oc get pods -n openshift-etcd -l app=etcd -o name | grep etcd); do
  echo "check pod $i"
  oc rsh -n openshift-etcd $i \
    stat -c %a /etc/kubernetes/manifests/etcd-pod.yaml
done
Context:
We currently convey cloud creds issues in ValidOIDCConfiguration and ValidAWSIdentityProvider conditions.
The HO relies on those https://github.com/openshift/hypershift/blob/9e4127055dd7be9cfe4fc8427c39cee27a86efcd/hypershift-operator/controllers/hostedcluster/internal/platform/aws/aws.go#L293
to decide whether forceful deletion should be applied, potentially intentionally leaving resources behind in the cloud (e.g. use case: OIDC creds were broken out of band).
The CPO relies on those to wait for deletion of guest cluster resources https://github.com/openshift/hypershift/blob/8596f7f131169a19c6a67dc6ce078c50467de648/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L284-L299
DoD:
When any of the cases above results in the "move kube deletion forward skipping cloud resource deletion" path we should send a metric so consumers / SREs have a sense and can use it to notify customers in conjunction with https://issues.redhat.com/browse/SDA-8613
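A sketch of one possible shape for that metric, with a made-up name and labels (not HyperShift's actual code): a counter incremented from the reconcile path that decides to skip cloud resource cleanup.
~~~
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical metric name and labels, for illustration only.
var skippedCloudCleanup = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "hypershift_cluster_cloud_cleanup_skipped_total",
		Help: "Deletions that moved forward without cleaning up cloud resources.",
	},
	[]string{"hosted_cluster", "reason"},
)

func init() {
	prometheus.MustRegister(skippedCloudCleanup)
}

// recordSkippedCleanup would be called where the HO/CPO decide to force deletion,
// e.g. when ValidOIDCConfiguration or ValidAWSIdentityProvider is False.
func recordSkippedCleanup(hostedCluster, reason string) {
	skippedCloudCleanup.WithLabelValues(hostedCluster, reason).Inc()
}

func main() {
	recordSkippedCleanup("example-hc", "ValidAWSIdentityProviderFalse")
	fmt.Println("metric incremented")
}
~~~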
Description of the problem:
No limitation for Additional certificates UI field
How reproducible:
100%
Steps to reproduce:
1. create a cluster
2. On add host select 'Configure cluster-wide trusted certificates'
3. On Additional certificates, paste a big string
4. Generate Discovery ISO
Actual results:
The UI sends it to the BE
Expected results:
There should be a size limit on the certificate field
This fix contains the following changes coming from updated version of kubernetes up to v1.27.4:
Changelog:
v1.27.4: https://github.com/kubernetes/kubernetes/blob/release-1.27/CHANGELOG/CHANGELOG-1.27.md#changelog-since-v1273
Description of problem:
I created a cluster with _workerLatencyProfile: LowUpdateSlowReaction_, then edited the latency profile to MediumUpdateAverageReaction using the linked documentation and the test case document below. Once I switched, I waited for KubeControllerManager and KubeAPIServer to stop progressing/complete and noticed that nodeStatusUpdateFrequency under /etc/kubernetes/kubelet.conf does not change as expected
https://docs.google.com/document/d/19dPIE4WFxVc3ldu-hNoXiOkjBCQrHC6I7wfyaUyTDqw/edit#heading=h.kf4qxogy9r6
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-31-181848
How reproducible:
100%
Steps to Reproduce:
1. Create cluster with LowUpdateSlowReaction manifest: Example: https://docs.google.com/document/d/19dPIE4WFxVc3ldu-hNoXiOkjBCQrHC6I7wfyaUyTDqw/edit#heading=h.22najgyaj9lh 2. Validate values of low update profile components $ oc debug node/<worker-node-name> $ chroot /host $ sh-4.4# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency "nodeStatusUpdateFrequency": "1m0s", $ oc get KubeControllerManager -o yaml | grep -A 1 node-monitor node-monitor-grace-period: - 5m0s $ oc get KubeAPIServer -o yaml | grep -A 1 default- default-not-ready-toleration-seconds: - "60" Default-unreachable-toleration-seconds: - "60" 3. *oc edit nodes.config/cluster* spec: workerLatencyProfile: MediumUpdateAverageReaction 4. Wait for components to complete using oc get KubeControllerManager -o yaml | grep -i workerlatency -A 5 -B 5 and oc get KubeAPIServer -o yaml | grep -i workerlatency -A 5 -B 5 5. Validate medium component values, hitting error here
Actual results:
% oc get KubeControllerManager -o yaml | grep -A 1 node-monitor node-monitor-grace-period: - 2m0s prubenda@prubenda1-mac lrc % oc get KubeAPIServer -o yaml | grep -A 1 default- default-not-ready-toleration-seconds: - "60" default-unreachable-toleration-seconds: - "60" sh-5.1# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency "nodeStatusUpdateFrequency": "1m0s",
Expected results:
$ oc debug node/<worker-node-name> $ chroot /host $ sh-4.4# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency "nodeStatusUpdateFrequency": "20s", $ oc get KubeControllerManager -o yaml | grep -A 1 node-monitor node-monitor-grace-period: - 2m0s $ oc get KubeAPIServer -o yaml | grep -A 1 default- default-not-ready-toleration-seconds: - "60" default-unreachable-toleration-seconds: - "60"
Additional info:
The documentation states that workers will go disabled while the change is being applied, but I never saw that occur
Description of problem:
Due to rpm-ostree regression (OKD-63) MCO was copying /var/lib/kubelet/config.json into /run/ostree/auth.json on FCOS and SCOS. This breaks Assisted Installer flow, which starts with Live ISO and doesn't have /var/lib/kubelet/config.json
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Context:
As a SRE / cluster service / dev I'd like to have the ability to identify trends on the duration of granular components that belong to HC/NodePools and that might affect our SLOs, e.g etcd, infra, ignition, nodes.
DoD:
Add metrics to visualise components duration of transitions.
Start with a few and agree on the approach.
Follow up.
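One possible starting point, sketched in Go with illustrative (not real) metric names: a histogram observed once per transition, at the moment the controller sees a condition flip, rather than recomputed on every resync.
~~~
// Package transitionmetrics: sketch only, not HyperShift's real metrics package.
package transitionmetrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var conditionTransitionSeconds = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "hypershift_condition_transition_seconds", // hypothetical name
		Help:    "Seconds from resource creation until a condition first became True.",
		Buckets: prometheus.ExponentialBuckets(30, 2, 10), // 30s up to roughly 4h
	},
	[]string{"resource", "condition"},
)

func init() {
	prometheus.MustRegister(conditionTransitionSeconds)
}

// ObserveTransition is called by a controller when it detects that a condition
// has just flipped to True, so each transition is recorded exactly once.
func ObserveTransition(resource, condition string, createdAt, transitionedAt time.Time) {
	conditionTransitionSeconds.
		WithLabelValues(resource, condition).
		Observe(transitionedAt.Sub(createdAt).Seconds())
}
~~~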
Add a page to our documentation to describe what information needs to be gathered in the case of a failure/bug.
Document how to use the `hypershift dump cluster` command.
We are investigating issues with storage usage in production. Reverting until we have a root cause
Description of problem:
In an install where users bring their own networks, they also bring their own NSGs. However, the installer still creates an NSG. In Azure environments using rule [1] below, users are prohibited from installing a cluster, because the apiserver_in rule is set to 0.0.0.0 [2]. Allowing users to define this before install would let them set up this connectivity without the open inbound access. [1] - Rule: Network Security Groups shall not allow rule with 0.0.0.0/Any Source/Destination IP Addresses - Custom Deny [2] - https://github.com/openshift/installer/blob/master/data/data/azure/vnet/nsg.tf#L31
E2E tests fail because the OpenShift Pipelines operator could not be found.
Description of problem:
Pipelines as Code has been GA for some time, so we should remove the Tech Preview badge from the PAC pages.
Version-Release number of selected component (if applicable):
4.13
Description of problem:
No timezone info in installer logs
Version-Release number of selected component (if applicable):
4.x
How reproducible:
100%
Steps to Reproduce:
1. openshift-install wait-for install-complete --dir=./foo 2. 3.
Actual results:
INFO Waiting up to 1h0m0s (until 4:52PM) for the cluster at https://api.ocp.example.local:6443 to initialize...
Expected results:
INFO Waiting up to 1h0m0s (until 4:52PM UTC) for the cluster at https://api.ocp.example.local:6443 to initialize...
Additional info:
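For illustration only (not the installer's actual logging code), a small Go snippet showing the difference between the current and requested formats:
~~~
package main

import (
	"fmt"
	"time"
)

func main() {
	timeout := time.Hour
	deadline := time.Now().Add(timeout)

	// Current style: "until 4:52PM" with no timezone.
	fmt.Printf("Waiting up to %s (until %s) for the cluster to initialize...\n",
		timeout, deadline.Format(time.Kitchen))

	// Requested style: "until 4:52PM UTC" with the timezone abbreviation included.
	fmt.Printf("Waiting up to %s (until %s) for the cluster to initialize...\n",
		timeout, deadline.Format("3:04PM MST"))
}
~~~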
Description of problem:
We should disable the netlink mode of the netclass collector in Node Exporter. The netlink mode of the netclass collector was introduced into Node Exporter in 4.13. When using netlink mode, several metrics become unavailable, so disabling it avoids confusing users who upgrade the OCP cluster to a new version and find several NIC metrics missing.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Using default config of CMO, Node Exporter's netclass collector is running in netlink mode. The argument `--collector.netclass.netlink` is present in the `node-exporter` container in `node-exporter` daemonset.
Expected results:
Using default config of CMO, Node Exporter's netclass collector is running in classic mode. The argument `--collector.netclass.netlink` is absent in the `node-exporter` container in `node-exporter` daemonset.
Additional info:
This is a new test being added in 1.26; we'll be getting it after https://github.com/openshift/origin/pull/27694 merges
The OLM descriptors README references an "action" descriptor that was never implemented. This needs to be removed to eliminate confusion.
Description of problem:
I have to create this OCPBUG in order to backport a test to the 4.14 branch.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/prometheus-operator/pull/222
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
The following test is permafailing in Prow CI: [tuningcni] sysctl allowlist update [It] should start a pod with custom sysctl only after adding sysctl to allowlist https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-kni-cnf-features-deploy-master-e2e-gcp-ovn-periodic/1640987392103944192 [tuningcni] 9915/go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:26 9916 sysctl allowlist update 9917 /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:141 9918 should start a pod with custom sysctl only after adding sysctl to allowlist 9919 /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:156 9920 > Enter [BeforeEach] [tuningcni] - /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/pkg/execute/ginkgo.go:9 @ 03/29/23 10:08:49.855 9921 < Exit [BeforeEach] [tuningcni] - /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/pkg/execute/ginkgo.go:9 @ 03/29/23 10:08:49.855 (0s) 9922 > Enter [BeforeEach] sysctl allowlist update - /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:144 @ 03/29/23 10:08:49.855 9923 < Exit [BeforeEach] sysctl allowlist update - /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:144 @ 03/29/23 10:08:49.896 (41ms) 9924 > Enter [It] should start a pod with custom sysctl only after adding sysctl to allowlist - /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:156 @ 03/29/23 10:08:49.896 9925 [FAILED] Unexpected error: 9926 <*errors.errorString | 0xc00044eec0>: { 9927 s: "timed out waiting for the condition", 9928 } 9929 timed out waiting for the condition 9930 occurred9931 In [It] at: /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:186 @ 03/29/23 10:09:53.377
Version-Release number of selected component (if applicable):
master (4.14)
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Test fails
Expected results:
Test passes
Additional info:
PR https://github.com/openshift-kni/cnf-features-deploy/pull/1445 adds some useful information to the reported archive.
The installer offers a graph command to output its internal dependency graph. It could be useful to have a similar command, i.e. agent graph, to output the agent-specific dependency graph
Description of problem:
When importing a Serverless Service from a git repository, the topology shows an Open URL decorator even when the "Add Route" checkbox was unselected (it is selected by default).
The created kn Route makes the Service available within the cluster and the created URL looks like this: http://nodeinfo-private.serverless-test.svc.cluster.local
So the Service is NOT accidentally exposed. It's "just" that we link an internal route that will not be accessible to the user.
This might happen also for Serverless functions import flow and the import container image import flow.
Version-Release number of selected component (if applicable):
Tested older versions and could see this at least on 4.10+
How reproducible:
Always
Steps to Reproduce:
Actual results:
The topology shows the new kn Service with a Open URL decorator on the top right corner.
The button is clickable, but the target page cannot be opened (as expected).
Expected results:
The topology should not show an Open URL decorator for "private" kn Routes.
The topology sidebar shows similar information; we should maybe replace the link there as well with a text + copy button?
A fix should be tested as well with Serverless functions as container images!
Additional info:
When the user unselects the "Add route" option an additional label is added to the kn Service. This label could also be added and removed later. When this label is specified the Open URL decorator should not be shown:
metadata: labels: networking.knative.dev/visibility: cluster-local
See also:
Description of problem:
SSH keys not configured on the worker nodes
Version-Release number of selected component (if applicable):
4.14.0-0.ci-2023-07-14-014011
How reproducible:
so far 100%
Steps to Reproduce:
1. Deploy baremetal cluster using IPI flow 2. 3.
Actual results:
Deployment succeeds but SSH keys not configured on the worker nodes
Expected results:
SSH keys configured on the worker nodes
Additional info:
SSH keys configured on the control-plane nodes
ssh core@master-0-0 'cat .ssh/authorized_keys.d/ignition' ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDm9hb6iTZJypEmzg4IZ767ze60UGhBWnjPXhovWVB7uKputdLzZhmlo36ifkXr/DTk8NGm47r6kXmz9NAF0pDHa5jX6yJFnhS4z5NY/mzsUX41gwiqBKYHgdp/KE1ylE8mbNon5ZpaaGvb876myjjPjPwWsD8hvXZirA5Q8TfDb/Pvgy1dhVH/uN05Ip1vVsp+bFGMPUJVWVUy/Eby5xW6OJv+FBOQq4nu6tslDZlHYXX2TSGrlW4x0i/oQMpKu/Y8ygAdjWqmAy6UBcho1nNWy15cp0jI5Fhjze171vSWZLAqJY+eFcL2kt/09RnY+MXyY/tIf+qNMyBE2Qltigah
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: creationTimestamp: "2023-07-14T12:13:00Z" generation: 1 labels: machineconfiguration.openshift.io/role: worker name: 99-worker-ssh resourceVersion: "2242" uid: 0ef02005-509e-4fc9-91ee-fc0afe27d5e6 spec: config: ignition: version: 3.2.0 passwd: users: - name: core sshAuthorizedKeys: - | ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDm9hb6iTZJypEmzg4IZ767ze60UGhBWnjPXhovWVB7uKputdLzZhmlo36ifkXr/DTk8NGm47r6kXmz9NAF0pDHa5jX6yJFnhS4z5NY/mzsUX41gwiqBKYHgdp/KE1ylE8mbNon5ZpaaGvb876myjjPjPwWsD8hvXZirA5Q8TfDb/Pvgy1dhVH/uN05Ip1vVsp+bFGMPUJVWVUy/Eby5xW6OJv+FBOQq4nu6tslDZlHYXX2TSGrlW4x0i/oQMpKu/Y8ygAdjWqmAy6UBcho1nNWy15cp0jI5Fhjze171vSWZLAqJY+eFcL2kt/09RnY+MXyY/tIf+qNMyBE2Qltigah extensions: null fips: false kernelArguments: null kernelType: "" osImageURL: ""
Description of problem:
After further discussion about https://issues.redhat.com/browse/RFE-3383 we have concluded that it needs to be addressed in 4.12, since OVNK will be the default there. I'm opening this so we can backport the fix. The fix is simply to alter the logic around enabling nodeip-configuration to handle the vSphere-unique case where the platform type is "vsphere" and the VIP field is not populated (see the sketch at the end of this item).
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
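The sketch referenced in the description, in Go, with simplified inputs and platform names (this is not the actual nodeip-configuration enablement code):
~~~
package main

import "fmt"

// enableNodeIPConfiguration is a simplified stand-in for the enablement logic.
// The apiVIP argument is kept to show the case this bug is about: on vSphere the
// service must now be enabled even when no VIP is populated.
func enableNodeIPConfiguration(platformType, apiVIP string) bool {
	switch platformType {
	case "BareMetal", "OpenStack", "Ovirt", "Nutanix":
		// On-prem platforms with VIPs already run nodeip-configuration.
		return true
	case "VSphere":
		// Previously only the VIP case was covered; with the fix the service is
		// enabled whether or not apiVIP is set.
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(enableNodeIPConfiguration("VSphere", "")) // true after the fix
}
~~~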
Description of the problem:
4.14 jobs relying on LSO are failing because we should use the version N-1 for LSO.
Something similar to https://github.com/openshift/assisted-service/pull/4753 should be merged.
Actual results:
Job fail with:
++ make deploy_assisted_operator test_kube_api_parallel Error from server (NotFound): namespaces "assisted-spoke-cluster" not found error: the server doesn't have a resource type "clusterimageset" namespace "assisted-installer" deleted error: the server doesn't have a resource type "agentserviceconfigs" error: the server doesn't have a resource type "localvolume" Error from server (NotFound): catalogsources.operators.coreos.com "assisted-service-catalog" not found
Expected results:
Job should be a success
Description of problem:
Changes to platform fields e.g. aws instance type doesn't trigger a rolling upgrade
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Create a hostedCluster with nodepool on AWS 2. Change the instance type field on the nodepool spec.platfrom.aws
Actual results:
Machines are not restarted and the instance type didn't change
Expected results:
Machines are recreated with the new instance type
Additional info:
This is a result of the recent changes to CAPI, which introduced in-place propagation of labels and annotations. Solution: the MachineTemplate name should not be constant and should change with each spec change, so that spec.infraRef in the MachineDeployment is updated and a rolling upgrade is triggered.
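A minimal Go sketch of that solution (field names are illustrative, not HyperShift's API): derive the MachineTemplate name from a hash of the NodePool platform spec, so any spec change yields a new template name, a new spec.infraRef, and therefore a rolling upgrade.
~~~
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// awsNodePoolPlatform is a stand-in for the real platform spec type.
type awsNodePoolPlatform struct {
	InstanceType string `json:"instanceType"`
	RootVolumeGB int    `json:"rootVolumeGB"`
}

// machineTemplateName is stable for an unchanged spec and changes whenever the
// spec changes, which is what triggers the MachineDeployment rollout.
func machineTemplateName(nodePool string, platform awsNodePoolPlatform) string {
	raw, _ := json.Marshal(platform)
	sum := sha256.Sum256(raw)
	return fmt.Sprintf("%s-%x", nodePool, sum[:4])
}

func main() {
	fmt.Println(machineTemplateName("workers", awsNodePoolPlatform{InstanceType: "m5.large", RootVolumeGB: 120}))
	fmt.Println(machineTemplateName("workers", awsNodePoolPlatform{InstanceType: "m5.xlarge", RootVolumeGB: 120}))
}
~~~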
In order to avoid possible issues with SDN during migration from SDN to OVNK, do not use port 9106 for ovnkube-control-plane metrics, since it's already used by SDN. Use a port that is not used by SDN, such as 9108.
Description of the problem:
Creating a cluster with ingress VIPs and user managed network will return an error
{ "lastProbeTime": "2023-03-01T18:50:41Z", "lastTransitionTime": "2023-03-01T18:50:41Z", "message": "The Spec could not be synced due to an input error: API VIP cannot be set with User Managed Networking", "reason": "InputError", "status": "False", "type": "SpecSynced" }
but setting ingress VIPs with user managed networking set to false, and then editing only user managed networking to true, will not result in any error. Will the cluster be using user managed networking in this case?
How reproducible:
Steps to reproduce:
1. apply
apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
name: acimulinode
namespace: mfilanov
spec:
apiVIP: 1.2.3.8
apiVIPs:
- 1.2.3.8
clusterDeploymentRef:
name: multinode
imageSetRef:
name: img4.12.5-x86-64-appsub
ingressVIP: 1.2.3.10
platformType: BareMetal
networking:
clusterNetwork:
- cidr: 10.128.0.0/14
hostPrefix: 23
serviceNetwork:
- 172.30.0.0/16
userManagedNetworking: false
provisionRequirements:
controlPlaneAgents: 3
compute:
- hyperthreading: Enabled
name: worker
controlPlane:
hyperthreading: Enabled
name: master
2. check conditions
kubectl get aci -n mfilanov -o json | jq .items[].status.conditions[] { "lastProbeTime": "2023-03-01T18:52:08Z", "lastTransitionTime": "2023-03-01T18:52:08Z", "message": "SyncOK", "reason": "SyncOK", "status": "True", "type": "SpecSynced" }
3. edit user managed network and apply again
apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
name: acimulinode
namespace: mfilanov
spec:
apiVIP: 1.2.3.8
apiVIPs:
- 1.2.3.8
clusterDeploymentRef:
name: multinode
imageSetRef:
name: img4.12.5-x86-64-appsub
ingressVIP: 1.2.3.10
platformType: BareMetal
networking:
clusterNetwork:
- cidr: 10.128.0.0/14
hostPrefix: 23
serviceNetwork:
- 172.30.0.0/16
userManagedNetworking: true
provisionRequirements:
controlPlaneAgents: 3
compute:
- hyperthreading: Enabled
name: worker
controlPlane:
hyperthreading: Enabled
name: master
Actual results:
kubectl get aci -n mfilanov -o json | jq .items[].status.conditions[] { "lastProbeTime": "2023-03-01T18:52:08Z", "lastTransitionTime": "2023-03-01T18:52:08Z", "message": "SyncOK", "reason": "SyncOK", "status": "True", "type": "SpecSynced" }
Expected results:
We should probably get an error because the ingress VIPs are already set.
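A minimal Go sketch (not assisted-service's real validation) of the check this report expects to run on updates as well as on creation: user-managed networking is incompatible with API/ingress VIPs regardless of which field was edited last.
~~~
package main

import (
	"errors"
	"fmt"
)

type networkingSpec struct {
	UserManagedNetworking bool
	APIVIPs               []string
	IngressVIPs           []string
}

func validateNetworking(spec networkingSpec) error {
	if spec.UserManagedNetworking && (len(spec.APIVIPs) > 0 || len(spec.IngressVIPs) > 0) {
		return errors.New("API and ingress VIPs cannot be set with user-managed networking")
	}
	return nil
}

func main() {
	// Mirrors the reproduction: VIPs already set, then userManagedNetworking edited to true.
	spec := networkingSpec{
		UserManagedNetworking: true,
		APIVIPs:               []string{"1.2.3.8"},
		IngressVIPs:           []string{"1.2.3.10"},
	}
	fmt.Println(validateNetworking(spec)) // expected: an error, not SyncOK
}
~~~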
Description of problem:
While trying to update build01 from 4.13.rc2->4.13.rc3, the MCO degraded upon trying to upgrade the first master node. The error being: E0414 15:42:29.597388 2323546 writer.go:200] Marking Degraded due to: exit status 1 Which I mapped to this line: https://github.com/openshift/machine-config-operator/blob/release-4.13/pkg/daemon/update.go#L1551 I think this error can be improved since it is a bit confusing, but that's not the main problem. We noticed that the actual issue was that there is an existing "/home/core/.ssh" directory, that seemed to have been created by 4.13.rc2 (but could have been earlier), that belonged to the root user, as such when we attempted to create the folder via runuser core by hand, it failed with permission denied (and since we return the exec status, I think it just returned status 1 and not this error message). I am currently not sure if we introduced something that caused this issue. There was an ssh (only on master pool) in that build01 cluster for 600 days already, so it must have worked in the past? Workaround is to delete the .ssh folder and let the MCD recreate it
Version-Release number of selected component (if applicable):
4.13.rc3
How reproducible:
Uncertain, but it shouldn't be very high, otherwise we would have run into this in CI much more, I think.
Steps to Reproduce:
1. create some 4.12 cluster with sshkey 2. upgrade to 4.13.rc2 3. upgrade to 4.13.rc3
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/oc/pull/1408
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
We are sending logs to a path like `/api/assisted-install/v2/clusters/05811ea0-33ff-461d-8898-7aed48224218/logs?logs_type=node-boot&host_id=f6baac5b-65a4-5838-bba7-6a240f4ea9d3` indefinitely, as long as a host reboots.
When rebooting, it doesn't matter whether logs were already sent in the past; the host will still send logs.
Example:
The above cluster installed successfully March 3rd 2023 @ 22:20:06
Please review the following PR: https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/235
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
After kube was bumped in cluster-kube-apiserver-operator, the alert needs to use the next Kubernetes version in its PromQL query
Description of problem:
When the user's pull secret contains a JSON null in the "auth" or "email" keys, assisted service crashes when we attempt to create the cluster: May 31 21:06:27 example.dev.local service[3389]: time="2023-05-31T09:06:27Z" level=error msg="Failed to registered cluster example with id 3648b06e-4745-4542-9421-78ae2e249c0d" func="github. com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).RegisterClusterInternal.func1" file="/src/internal/bminventory/inventory.go:448" cluster_id=3648b06e-4745-4542-9421- 78ae2e249c0d go-id=162 pkg=Inventory request_id=1252f666-cf5c-4aae-9be7-7b7a579b5bf6 May 31 21:06:27 example.dev.local service[3389]: 2023/05/31 09:06:27 http: panic serving 10.116.24.118:46262: interface conversion: interface {} is nil, not string May 31 21:06:27 example.dev.local service[3389]: goroutine 162 [running]: May 31 21:06:27 example.dev.local service[3389]: net/http.(*conn).serve.func1() May 31 21:06:27 example.dev.local service[3389]: /usr/lib/golang/src/net/http/server.go:1850 +0xbf May 31 21:06:27 example.dev.local service[3389]: panic({0x25d0000, 0xc00148d7d0}) May 31 21:06:27 example.dev.local service[3389]: /usr/lib/golang/src/runtime/panic.go:890 +0x262 May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/internal/cluster/validations.ParsePullSecret({0xc001ed0780, 0x1c6}) May 31 21:06:27 example.dev.local service[3389]: /src/internal/cluster/validations/validations.go:106 +0x718 May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/internal/cluster/validations.(*registryPullSecretValidator).ValidatePullSecret(0xc0005880c0, {0xc001ed0780?, 0x7?}, {0x29916da, 0x5}) May 31 21:06:27 example.dev.local service[3389]: /src/internal/cluster/validations/validations.go:160 +0x54 May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).ValidatePullSecret(...) 
May 31 21:06:27 example.dev.local service[3389]: /src/internal/bminventory/inventory.go:279 May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).RegisterClusterInternal(0xc00112f880, {0x2fd3e20, 0xc00148cd50}, 0x0, {0xc0007c0400, 0xc0008d69a0}) May 31 21:06:27 example.dev.local service[3389]: /src/internal/bminventory/inventory.go:564 +0x16d0 May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).V2RegisterCluster(0x2fd3e20?, {0x2fd3e20?, 0xc00148cd50?}, {0xc0007c0400?, 0xc0008d69a0?}) May 31 21:06:27 example.dev.local service[3389]: /src/internal/bminventory/inventory_v2_handlers.go:42 +0x39 May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/restapi.HandlerAPI.func59({0xc0007c0400?, 0xc0008d69a0?}, {0x2390b20?, 0xc0014e0240?}) May 31 21:06:27 example.dev.local service[3389]: /src/restapi/configure_assisted_install.go:639 +0xaf May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/restapi/operations/installer.V2RegisterClusterHandlerFunc.Handle(0xc000a9d068?, {0xc0007c0400?, 0xc0008d69a0?}, {0x2390b20?, 0xc0014e0240?}) May 31 21:06:27 example.dev.local service[3389]: /src/restapi/operations/installer/v2_register_cluster.go:19 +0x3d May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/restapi/operations/installer.(*V2RegisterCluster).ServeHTTP(0xc000571470, {0x2fc7140, 0xc00034c040}, 0xc0007c0400) May 31 21:06:27 example.dev.local service[3389]: /src/restapi/operations/installer/v2_register_cluster.go:66 +0x298 May 31 21:06:27 example.dev.local service[3389]: github.com/go-openapi/runtime/middleware.NewOperationExecutor.func1({0x2fc7140, 0xc00034c040}, 0xc0007c0400) May 31 21:06:27 example.dev.local service[3389]: /src/vendor/github.com/go-openapi/runtime/middleware/operation.go:28 +0x59
Version-Release number of selected component (if applicable):
4.12.17
How reproducible:
Probably 100%
Steps to Reproduce:
1. Add to the pull secret in install-config.yaml an auth like: "example.com": { "auth": null, "email": null } 2. Generate the agent ISO as usual using "openshift-install agent create image" 3. Boot the ISO on the cluster hosts.
Actual results:
The create-cluster-and-infraenv.service fails to complete. In its log it reports: Failed to register cluster with assisted-service: Post \"http://10.1.1.2:8090/api/assisted-install/v2/clusters\": EOF
Expected results:
Cluster is installed.
Additional info:
This is particularly difficult to debug because users don't generally give us their pull secrets. The pull secret file in the agent-gather bundle has individual fields redacted, so it is a better guide than the install-config where the whole thing may be redacted.
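For context, here is a minimal Go sketch of the kind of defensive decoding that would avoid the panic, assuming a generic map-based decode similar to what the stack trace points at in validations.ParsePullSecret; the function and field names below are illustrative, not the assisted-service code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseAuths decodes a pull secret and tolerates JSON null (or non-string)
// values in the "auth" field instead of panicking on a failed type
// assertion. Names are illustrative.
func parseAuths(pullSecret string) (map[string]string, error) {
	var doc struct {
		Auths map[string]map[string]interface{} `json:"auths"`
	}
	if err := json.Unmarshal([]byte(pullSecret), &doc); err != nil {
		return nil, fmt.Errorf("invalid pull secret JSON: %w", err)
	}
	auths := map[string]string{}
	for registry, entry := range doc.Auths {
		raw, present := entry["auth"]
		auth, ok := raw.(string) // a comma-ok assertion never panics on nil
		if !present || !ok || auth == "" {
			return nil, fmt.Errorf("pull secret for %q has a missing or null auth value", registry)
		}
		auths[registry] = auth
	}
	return auths, nil
}

func main() {
	_, err := parseAuths(`{"auths":{"example.com":{"auth":null,"email":null}}}`)
	fmt.Println(err) // surfaces as a validation error rather than a panic
}
```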
DoD:
Let the HO export a metric with its own version so that, as an SRE, I can easily understand which version is running where by looking at a Grafana dashboard.
Context:
As we start receiving metrics consistently in OCM environments and we are creating SLO dashboards that can consume data from any data source (prod/stage/CI), we also want to revisit how we are sending metrics and make sure we are doing it in the most effective way. We currently have some wonky data coming through in prod.
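A minimal sketch of what such a version metric could look like, using an "info"-style gauge from prometheus/client_golang; the metric name hypershift_operator_info and the version string are assumptions for illustration, not the operator's actual code.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// hypershiftOperatorInfo is an "info"-style gauge: the value is always 1 and
// the interesting data lives in the labels, so a Grafana panel can simply
// group by the version label. The metric name is hypothetical.
var hypershiftOperatorInfo = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "hypershift_operator_info",
		Help: "Constant metric carrying the operator version as a label.",
	},
	[]string{"version"},
)

func main() {
	prometheus.MustRegister(hypershiftOperatorInfo)
	// The version string would normally be injected at build time via ldflags.
	hypershiftOperatorInfo.WithLabelValues("v0.1.0-example").Set(1)

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```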
DoD:
At the moment we have a high-frequency reconciliation loop in which we constantly review the overall state of the world by looping over all clusters.
We should review this approach and, where possible for each specific metric, record metrics/events directly in the controllers/reconcile loop as the change happens, only once rather than repeatedly in a loop.
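As a sketch of the suggested approach, and assuming the controllers use prometheus/client_golang, a metric could be set directly from the reconcile path instead of from a periodic loop; all names below are illustrative.

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// clusterAvailable is updated from the reconcile path itself, once per state
// transition, instead of being recomputed for every cluster in a periodic
// "state of the world" loop. Metric and label names are illustrative.
var clusterAvailable = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "hypershift_cluster_available",
		Help: "1 when the hosted cluster is available, 0 otherwise.",
	},
	[]string{"name", "namespace"},
)

// recordAvailability would be called at the end of the controller's
// Reconcile function for the cluster it just processed, so the metric is
// updated exactly when the controller already holds the fresh state.
func recordAvailability(name, namespace string, available bool) {
	v := 0.0
	if available {
		v = 1.0
	}
	clusterAvailable.WithLabelValues(name, namespace).Set(v)
}

func main() {
	prometheus.MustRegister(clusterAvailable)
	recordAvailability("example", "clusters", true)
	fmt.Println("recorded availability for example/clusters")
}
```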
Description of problem:
While mirroring to the filesystem, if a 429 error is received from the registry, the layer is incorrectly flagged as having been mirrored and is therefore not picked up by subsequent mirror re-run requests. This gives the impression that the second mirror-to-filesystem attempt is successful, but it causes problems when mirroring from the filesystem to the target registry (due to missing files).
Version-Release number of selected component (if applicable):
oc version Client Version: 4.8.42 Server Version: 4.8.14 Kubernetes Version: v1.21.1+a620f50
How reproducible:
Whenever a 429 occurs while mirroring to the filesystem.
Steps to Reproduce:
1. Run the mirror-to-filesystem command:
   oc image mirror -f mirror-to-filesystem.txt --filter-by-os '.*' -a $REGISTRY_AUTH_FILE --insecure --skip-multiple-scopes --max-per-registry=1 --continue-on-error=true --dir "$LOCAL_DIR_PATH"
   Output:
   info: Mirroring completed in 2h19m24.14s (25.75MB/s)
   error: one or more errors occurred
   E.g. error: unable to push <registry>/namespace/<image-name>: failed to retrieve blob <image-digest>: error parsing HTTP 429 response body: unexpected end of JSON input: ""
2. Re-run the mirror-to-filesystem command:
   oc image mirror -f mirror-to-filesystem.txt --filter-by-os '.*' -a $REGISTRY_AUTH_FILE --insecure --skip-multiple-scopes --max-per-registry=1 --continue-on-error=true --dir "$LOCAL_DIR_PATH"
   Output:
   info: Mirroring completed in 480ms (0B/s)
3. Run the mirror-from-filesystem command:
   oc image mirror -f mirror-from-filesystem.txt -a $REGISTRY_AUTH_FILE --from-dir "$LOCAL_DIR_PATH" --filter-by-os '.*' --insecure --skip-multiple-scopes --max-per-registry=1 --continue-on-error=true
   Output:
   info: Mirroring completed in 53m5.21s (67.61MB/s)
   error: one or more errors occurred
   E.g. error: unable to push file://local/namespace/<image-name>: failed to retrieve blob <image-digest>: open /root/local/namespace/<image-name>/blobs/<image-digest>: no such file or directory
Actual results:
1) Mirror to filesystem, first attempt:
   info: Mirroring completed in 2h19m24.14s (25.75MB/s)
   error: one or more errors occurred
   E.g. error: unable to push <registry>/namespace/<image-name>: failed to retrieve blob <image-digest>: error parsing HTTP 429 response body: unexpected end of JSON input: ""
2) Mirror to filesystem, second attempt:
   info: Mirroring completed in 480ms (0B/s)
3) Mirror from filesystem to target registry:
   info: Mirroring completed in 53m5.21s (67.61MB/s)
   error: one or more errors occurred
   E.g. error: unable to push file://local/namespace/<image-name>: failed to retrieve blob <image-digest>: open /root/local/namespace/<image-name>/blobs/<image-digest>: no such file or directory
Expected results:
Both the mirror from the source images to the filesystem and the mirror from the filesystem to the target registry should complete successfully.
Additional info:
Description of the problem:
Currently the `pre-network-manager-config.service` that we use to create static network configurations from the non minimal discovery ISO may run after NetworkManager, and therefore the configurations that it generates may be ignored.
How reproducible:
Not always reproducible, it is time sensitive. Has been observed when there is a large number of static network configurations. See OCPBUGS-16219 for details and steps to reproduce.
Please review the following PR: https://github.com/openshift/console-operator/pull/737
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
4.14 indexes have been bootstrapped and published on the registry. I was told they have to be added to https://github.com/operator-framework/operator-marketplace/blob/master/defaults/03_community_operators.yaml until they can be used in OCP clusters.
Version-Release number of selected component (if applicable):
OCP 4.14
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
4.14 indexes were bootstrapped in CLOUDDST-17591
Description of problem:
Observation from the CIS v1.4 PDF, 1.1.9: Ensure that the Container Network Interface file permissions are set to 600 or more restrictive. "Container Network Interface provides various networking options for overlay networking. You should consult their documentation and restrict their respective file permissions to maintain the integrity of those files. Those files should be writable by only the administrators on the system."
To conform with the CIS benchmarks, the /var/run/multus/cni/net.d/*.conf files on nodes should be updated to 600.
$ for i in $(oc get pods -n openshift-multus -l app=multus -oname); do oc exec -n openshift-multus $i -- /bin/bash -c "stat -c \"%a %n\" /host/var/run/multus/cni/net.d/*.conf"; done
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-20-215234
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
The file permissions of /var/run/multus/cni/net.d/*.conf on nodes is 644.
Expected results:
The file permissions of /var/run/multus/cni/net.d/*.conf on nodes should be updated to 600
Additional info:
Description of problem:
OCM-o does not support obtaining verbosity through OpenShiftControllerManager.operatorLogLevel object
Version-Release number of selected component (if applicable):
How reproducible:
Modify OpenShiftControllerManager.operatorLogLevel, and the OCM-o operator will not display the corresponding logs.
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Please review the following PR: https://github.com/openshift/kube-state-metrics/pull/91
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
Description of problem:
As a cluster-admin, users can see the Pipelines section while using the `import from git` feature in the Developer perspective of the web console. However, if a user logs in as a normal user or a project admin, they are not able to see the Pipelines section.
Version-Release number of selected component (if applicable):
Tested in OCP v4.12.18 and v4.12.20
How reproducible:
Always
Steps to Reproduce:
Prerequisite: Install the Red Hat OpenShift Pipelines operator
1. Log in as a kube-admin user from the web console
2. Go to the Developer view
3. Click on +Add
4. Under Git Repository, open the Import from Git page
5. Enter a Git repo URL (example: https://github.com/spring-projects/spring-petclinic)
6. Check if there are 3 sections: General, Pipelines, Advanced options
7. Then log in as a project admin user
8. Perform steps 2 to 6 again
Actual results:
The Pipelines section is not visible when logged in as a project admin; only the General and Advanced options sections are visible in Import from Git. However, the Pipelines section is visible as a cluster-admin.
Expected results:
The Pipelines section should be visible when logged in as a project admin, along with the General and Advanced options sections in Import from Git.
Additional info:
I checked by creating separate rolebindings and clusterrolebindings to assign access to pipeline resources like below:
~~~
$ oc create clusterrole pipelinerole1 --verb=create,get,list,patch,delete --resource=tektonpipelines,openshiftpipelinesascodes
$ oc create clusterrole pipelinerole2 --verb=create,get,list,patch,delete --resource=repositories,pipelineruns,pipelines
$ oc adm policy add-cluster-role-to-user pipelinerole1 user1
$ oc adm policy add-role-to-user pipelinerole2 user1
~~~
However, even after assigning these rolebindings/clusterrolebindings to the users, the users are still not able to see the Pipelines section.
Description of problem:
oc explain tests have to be enabled to ensure openapi/v3 is working properly. The tests have been temporarily disabled in order to unblock the oc kube bump (https://github.com/openshift/oc/pull/1420). The following efforts need to be done/merged to make openapi/v3 work:
- [DONE] oauth-apiserver kube bump: https://github.com/openshift/oauth-apiserver/pull/89
- [DONE] merge kubectl fix backport https://github.com/kubernetes/kubernetes/pull/118930 and bump kube dependency in oc to include this fix (https://github.com/openshift/oc/pull/1515)
- [DONE] merge https://github.com/kubernetes/kubernetes/pull/118881 and carry this PR in our kube-apiserver to stop oc explain being flaky (https://github.com/openshift/kubernetes/pull/1629)
- [DONE] merge https://github.com/kubernetes/kubernetes/pull/118879 and carry this PR in our kube-apiserver to enable apiservices (https://github.com/openshift/kubernetes/pull/1630)
- [DONE] make openapi/v3 work for our special groups https://github.com/openshift/kubernetes/pull/1654 (https://github.com/openshift/kubernetes/pull/1617#issuecomment-1609864043, slack discussion: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1687882255536949?thread_ts=1687822265.954799&cid=CC3CZCQHM)
- [DONE] enable back oc explain tests: https://github.com/openshift/origin/pull/28155 and bring in new tests: https://github.com/openshift/origin/pull/28129
- [OPTIONAL] bring in additional upstream kubectl/oc explain tests: https://github.com/kubernetes/kubernetes/pull/118885
- [OPTIONAL] backport https://github.com/kubernetes/kubernetes/pull/119839 and https://github.com/kubernetes/kubernetes/pull/119841 (backport of https://github.com/kubernetes/kubernetes/pull/118881 and https://github.com/kubernetes/kubernetes/pull/118879)
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Description of problem:
The OCP upgrade blocks because the cluster operator csi-snapshot-controller fails to start its deployment with a fatal "read-only filesystem" message.
Version-Release number of selected component (if applicable):
Red Hat OpenShift 4.11 rhacs-operator.v3.72.1
How reproducible:
At least once in user's cluster while upgrading
Steps to Reproduce:
1. Have OCP 4.11 installed
2. Install ACS on top of the OCP cluster
3. Upgrade OCP to the next z-stream version
Actual results:
Upgrade gets blocked: waiting on csi-snapshot-controller
Expected results:
Upgrade should succeed
Additional info:
The stackrox SCCs (stackrox-admission-control, stackrox-collector and stackrox-sensor) have `readOnlyRootFilesystem` set to `true`. If an SCC is not explicitly defined/requested, other Pods might receive one of these SCCs, which makes their deployment fail with a `read-only filesystem` message.
Description of problem:
CCPMSO uses a copy of the manifests from openshift/api. However, these appear out-of-sync with respect to the vendored version of openshift/api
Description of problem:
The cluster-api pod can't create events due to RBAC. We may miss some useful events because of this.
E0503 07:20:44.925786 1 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"ad1-workers-f5f568855-vnzmn.175b911e43aa3f41", GenerateName:"", Namespace:"ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Machine", Namespace:"ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1", Name:"ad1-workers-f5f568855-vnzmn", UID:"2b40a694-d36d-4b13-9afc-0b5daeecc509", APIVersion:"cluster.x-k8s.io/v1beta1", ResourceVersion:"144260357", FieldPath:""}, Reason:"DetectedUnhealthy", Message:"Machine ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1/ad1-workers/ad1-workers-f5f568855-vnzmn/ has unhealthy node ", Source:v1.EventSource{Component:"machinehealthcheck-controller", Host:""}, FirstTimestamp:time.Date(2023, time.May, 3, 7, 20, 44, 923289409, time.Local), LastTimestamp:time.Date(2023, time.May, 3, 7, 20, 44, 923289409, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:serviceaccount:ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1:cluster-api" cannot create resource "events" in API group "" in the namespace "ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1"' (will not retry!)
Version-Release number of selected component (if applicable):
4.12
How reproducible:
Always
Steps to Reproduce:
1. Create a hosted cluster
2. Check the cluster-api pod for some kind of error (e.g. slow node startup)
Actual results:
Error
Expected results:
Event generated
Additional info:
ClusterRole hypershift-cluster-api is created here https://github.com/openshift/hypershift/blob/e7eb32f259b2a01e5bbdddf2fe963b82b331180f/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go#L2720
We should add create/patch/update permissions for events there.
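A sketch of the kind of rule that would be added, built with the rbacv1 types; the ClusterRole name matches the one mentioned above, but the rest is illustrative rather than the actual HyperShift code.

```go
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

// capiClusterRole sketches the ClusterRole with an extra rule that lets the
// cluster-api service account emit Events in its namespace.
func capiClusterRole() *rbacv1.ClusterRole {
	return &rbacv1.ClusterRole{
		TypeMeta:   metav1.TypeMeta{APIVersion: "rbac.authorization.k8s.io/v1", Kind: "ClusterRole"},
		ObjectMeta: metav1.ObjectMeta{Name: "hypershift-cluster-api"},
		Rules: []rbacv1.PolicyRule{
			{
				APIGroups: []string{""}, // core API group
				Resources: []string{"events"},
				Verbs:     []string{"create", "patch", "update"},
			},
			// ...the existing rules for machines, machinesets, etc. would remain.
		},
	}
}

func main() {
	out, _ := yaml.Marshal(capiClusterRole())
	fmt.Println(string(out))
}
```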
Description of problem:
IPI installation failed in AWS; CreateVpcEndpoint is not supported in the C2S region.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
IPI installation in AWS:
1. terraform apply
2. When using an aws_vpc_endpoint resource with the AWS terraform provider >= 2.53.0 in the C2S regions (us-iso*), an error is thrown stating UnsupportedOperation.
Actual results:
Unable to install OCP 4.X in AWS C2S(top-secret) region
Expected results:
IPI installation succeeds in the AWS C2S region.
Additional info:
Upstream bug: https://github.com/hashicorp/terraform-provider-aws/issues/27048 ("[Bug]: C2S CreateVpcEndpoint UnsupportedOperation: The operation is not supported in this region!")
Description of problem:
After adding additional CPU and memory to the OpenShift Container Platform 4 control-plane nodes, it was noticed that a new MachineConfig was rolled out, causing all nodes to reboot unexpectedly. Interestingly, no new MachineConfig was rendered; instead a slightly older MachineConfig was picked and applied to all nodes after the change on the control-plane nodes was performed. The only visible change found in the MachineConfig was that nodeStatusUpdateFrequency was updated from 10s to 0s, even though nodeStatusUpdateFrequency is not specified or configured in any MachineConfig or KubeletConfig. https://issues.redhat.com/browse/OCPBUGS-6723 was found, but given that the affected cluster is running 4.11.35 it's difficult to understand what happened, as that problem was/is generally suspected to be solved.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.11.35
How reproducible:
Unknown
Steps to Reproduce:
1. OpenShift Container Platform 4 on AWS 2. Updating OpenShift Container Platform 4 - Control-Plane Node(s) to add more CPU and Memory 3. Check whether a potential MachineConfig update is being applied
Actual results:
A MachineConfig update is rolled out to all nodes after adding CPU and memory to the control-plane nodes, because nodeStatusUpdateFrequency is updated, which is unexpected and it is not clear why it happens.
Expected results:
Either no new MachineConfig should be rolled out after such a change, or a newly rendered MachineConfig should be rolled out with information about what changed and why the change was applied.
Additional info:
This is a clone of issue OCPBUGS-18832. The following is the description of the original issue:
—
Description of problem:
The console does not allow customizing the abbreviation that appears on the resource icon badge. This causes an issue for the FAR operator with the CRD FenceAgentRemediationTemplate: the badge icon shows FART. The CRD includes a custom short name, but the console ignores it.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Create the CRD (included link to github)
2. Navigate to Home -> Search
3. Enter "far" into the Resources filter
Actual results:
The badge FART shows in the dropdown
Expected results:
The badge should show fartemplate - the content of the short name
Additional info:
Description of problem:
After installing a disconnected private cluster, SSH to the master/bootstrap nodes from the bastion on the VPC failed.
Version-Release number of selected component (if applicable):
Pre-merge build https://github.com/openshift/installer/pull/6836 registry.build05.ci.openshift.org/ci-ln-5g4sj02/release:latest Tag: 4.13.0-0.ci.test-2023-02-27-033047-ci-ln-5g4sj02-latest
How reproducible:
always
Steps to Reproduce:
1. Create the bastion instance maxu-ibmj-p1-int-svc
2. Create a VPC on the bastion host
3. Install a private disconnected cluster on the bastion host with a mirror registry
4. SSH to the bastion
5. SSH to the master/bootstrap nodes from the bastion
Actual results:
[core@maxu-ibmj-p1-int-svc ~]$ ssh -i ~/openshift-qe.pem core@10.241.0.5 -v
OpenSSH_8.8p1, OpenSSL 3.0.5 5 Jul 2022
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug1: configuration requests final Match pass
debug1: re-parsing configuration
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug1: Connecting to 10.241.0.5 [10.241.0.5] port 22.
debug1: connect to address 10.241.0.5 port 22: Connection timed out
ssh: connect to host 10.241.0.5 port 22: Connection timed out
Expected results:
SSH succeeds.
Additional info:
$ ibmcloud is sg-rules r014-5a6c16f4-8a4c-4c02-ab2d-626c14f72a77 --vpc maxu-ibmj-p1-vpc
Listing rules of security group r014-5a6c16f4-8a4c-4c02-ab2d-626c14f72a77 under account OpenShift-QE as user ServiceId-dff277a9-b608-410a-ad24-c544e59e3778...
ID                                          Direction   IP version   Protocol                      Remote
r014-6739d68f-6827-41f4-b51a-5da742c353b2   outbound    ipv4         all                           0.0.0.0/0
r014-06d44c15-d3fd-4a14-96c4-13e96aa6769c   inbound     ipv4         all                           shakiness-perfectly-rundown-take
r014-25b86956-5370-4925-adaf-89dfca9fb44b   inbound     ipv4         tcp Ports:Min=22,Max=22       0.0.0.0/0
r014-e18f0f5e-c4e5-44a5-b180-7a84aa59fa97   inbound     ipv4         tcp Ports:Min=3128,Max=3129   0.0.0.0/0
r014-7e79c4b7-d0bb-4fab-9f5d-d03f6b427d89   inbound     ipv4         icmp Type=8,Code=0            0.0.0.0/0
r014-03f23b04-c67a-463d-9754-895b8e474e75   inbound     ipv4         tcp Ports:Min=5000,Max=5000   0.0.0.0/0
r014-8febe8c8-c937-42b6-b352-8ae471749321   inbound     ipv4         tcp Ports:Min=6001,Max=6002   0.0.0.0/0
We should also garbage collect failed-to-register events, and possibly other orphaned events.
Due to enabling the upstream node-logs viewer feature we have to temporarily disable this test, since the plan to switch to the upstream version requires the following steps in order:
1. Modify current patches to match upstream change (being done as part of 1.27 bump)
2. Modify oc to work with both old and new API (being done in parallel with 1.27 bump, will be linked below).
3. Land k8s 1.27.
4. Modify machine-config-operator to enable enableSystemLogQuery config option (can land only after k8s 1.27, will be linked below).
5. Bring the test back.
Our telemetry test using remote write is increasingly flaky. The recurring error is:
TestTelemeterRemoteWrite telemeter_test.go:103: timed out waiting for the condition: error validating response body "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"container\":\"kube-rbac-proxy\",\"endpoint\":\"metrics\",\"job\":\"prometheus-k8s\",\"namespace\":\"openshift-monitoring\",\"remote_name\":\"2bdd72\",\"service\":\"prometheus-k8s\",\"url\":\"https://infogw.api.openshift.com/metrics/v1/receive\"},\"value\":[1684889572.197,\"20.125925925925927\"]}]}}" for query "max without(pod,instance) (rate(prometheus_remote_storage_samples_failed_total{job=\"prometheus-k8s\",url=~\"https://infogw.api.openshift.com.+\"}[5m]))": expecting Prometheus remote write to see no failed samples but got 20.125926
Any failed samples will cause this test to fail. This is perhaps too strict a requirement. We could consider it good enough if some samples are sent successfully. The current version tests telemeter behavior on top of CMO behavior.
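As a sketch of how the assertion could be relaxed (an assumption about a possible fix, not the test's actual code), the check could compare failed samples against total samples and only fail above a small ratio.

```go
package main

import "fmt"

// tolerateTransientFailures illustrates a looser success criterion for the
// remote-write test: instead of requiring zero failed samples, accept the
// run as long as the failure ratio stays under a small threshold. The
// threshold is made up for illustration.
func tolerateTransientFailures(failed, total float64) error {
	const maxFailureRatio = 0.01 // allow up to 1% of samples to fail
	if total == 0 {
		return fmt.Errorf("no samples sent at all")
	}
	if ratio := failed / total; ratio > maxFailureRatio {
		return fmt.Errorf("remote write failure ratio %.4f exceeds %.2f", ratio, maxFailureRatio)
	}
	return nil
}

func main() {
	fmt.Println(tolerateTransientFailures(20, 100000)) // <nil>: tolerated
	fmt.Println(tolerateTransientFailures(20, 100))    // error: too many failures
}
```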
Description of problem:
When running the installer on OSP with:
[...]
controlPlane:
  name: master
  platform: {}
  replicas: 3
[...]
in the install-config.yaml, it panics:
DEBUG OpenShift Installer 4.14.0-0.nightly-2023-07-20-215234
DEBUG Built from commit 1e9209ac80ed2cb4ba5663f519e51161a1d8858a
DEBUG Fetching Metadata...
DEBUG Loading Metadata...
DEBUG Loading Cluster ID...
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Networking...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x3956f6d]

goroutine 1 [running]:
github.com/openshift/installer/pkg/types/conversion.convertOpenStack(0xc000464dc0)
  /go/src/github.com/openshift/installer/pkg/types/conversion/installconfig.go:172 +0x1cd
github.com/openshift/installer/pkg/types/conversion.ConvertInstallConfig(0xc000464dc0)
  /go/src/github.com/openshift/installer/pkg/types/conversion/installconfig.go:47 +0x2af
github.com/openshift/installer/pkg/asset/installconfig.(*AssetBase).LoadFromFile(0xc000a18180, {0x20f8c650?, 0xc000696b40?})
  /go/src/github.com/openshift/installer/pkg/asset/installconfig/installconfigbase.go:64 +0x32b
github.com/openshift/installer/pkg/asset/installconfig.(*InstallConfig).Load(0xc000a18180, {0x20f8c650?, 0xc000696b40?})
  /go/src/github.com/openshift/installer/pkg/asset/installconfig/installconfig.go:118 +0x2e
github.com/openshift/installer/pkg/asset/store.(*storeImpl).load(0xc0008f3f20, {0x20f95950, 0xc0002f9a40}, {0xc000af060c, 0x4})
  /go/src/github.com/openshift/installer/pkg/asset/store/store.go:263 +0x35f
github.com/openshift/installer/pkg/asset/store.(*storeImpl).load(0xc0008f3f20, {0x20f95920, 0xc00040cf60}, {0x819d89a, 0x2})
  /go/src/github.com/openshift/installer/pkg/asset/store/store.go:246 +0x256
github.com/openshift/installer/pkg/asset/store.(*storeImpl).load(0xc0008f3f20, {0x7fed58b9ec98, 0x25ba8530}, {0x0, 0x0})
  /go/src/github.com/openshift/installer/pkg/asset/store/store.go:246 +0x256
github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc0008f3f20, {0x7fed58b9ec98, 0x25ba8530}, {0x0, 0x0})
  /go/src/github.com/openshift/installer/pkg/asset/store/store.go:200 +0x1a9
github.com/openshift/installer/pkg/asset/store.(*storeImpl).Fetch(0x7ffd6b4992ff?, {0x7fed58b9ec98, 0x25ba8530}, {0x25b8ea80, 0x8, 0x8})
  /go/src/github.com/openshift/installer/pkg/asset/store/store.go:76 +0x48
main.runTargetCmd.func1({0x7ffd6b4992ff, 0x6})
  /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:260 +0x126
main.runTargetCmd.func2(0x25b96920?, {0xc0002f8100?, 0x4?, 0x4?})
  /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:290 +0xe7
github.com/spf13/cobra.(*Command).execute(0x25b96920, {0xc0002f80c0, 0x4, 0x4})
  /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:920 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc000a0c000)
  /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:1040 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
  /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:968
main.installerMain()
  /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:61 +0x2b0
main.main()
  /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:38 +0xff
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-20-215234
How reproducible:
Always
Steps to Reproduce:
1. Create the install-config.yaml with an empty controlPlane.platform 2. Run the installer
Actual results:
Panic
Expected results:
A controlled error message if the platform section is strictly necessary; otherwise, a successful installation.
Additional info:
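A minimal sketch of the kind of nil guard that would avoid the panic in convertOpenStack when controlPlane.platform is empty; the types below are simplified stand-ins, not the installer's actual structs.

```go
package main

import "fmt"

// Simplified stand-ins for the installer's install-config types.
type OpenStackMachinePool struct{ Zones []string }

type MachinePoolPlatform struct{ OpenStack *OpenStackMachinePool }

type MachinePool struct {
	Name     string
	Platform MachinePoolPlatform
}

// convertControlPlane shows the guard: when the platform section is empty
// ("platform: {}" in install-config.yaml), skip the OpenStack-specific
// conversion instead of dereferencing a nil pointer.
func convertControlPlane(cp *MachinePool) error {
	if cp == nil || cp.Platform.OpenStack == nil {
		// Nothing OpenStack-specific to convert; not an error.
		return nil
	}
	fmt.Println("converting zones:", cp.Platform.OpenStack.Zones)
	return nil
}

func main() {
	cp := &MachinePool{Name: "master"}    // equivalent of platform: {}
	fmt.Println(convertControlPlane(cp)) // <nil>, no panic
}
```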
Description of problem:
When using the command `oc-mirror --config config-oci-target.yaml docker://localhost:5000 --use-oci-feature --dest-use-http --dest-skip-tls`, the command exits with code 0 but prints a log message like: unable to parse reference oci://mno/redhat-operator-index:v4.12: lstat /mno: no such file or directory.
Version-Release number of selected component (if applicable):
oc-mirror version Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.13.0-202303011628.p0.g2e3885b.assembly.stream-2e3885b", GitCommit:"2e3885b469ee7d895f25833b04fd609955a2a9f6", GitTreeState:"clean", BuildDate:"2023-03-01T16:49:12Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
How reproducible:
always
Steps to Reproduce:
1. With an imagesetconfig like:
cat config-oci-target.yaml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /home/ocmirrortest/0302/60597
mirror:
  operators:
  - catalog: oci:///home/ocmirrortest/noo/redhat-operator-index
    targetCatalog: mno/redhat-operator-index
    targetTag: v4.12
    packages:
    - name: aws-load-balancer-operator
2. Run `oc-mirror --config config-oci-target.yaml docker://localhost:5000 --use-oci-feature --dest-use-http --dest-skip-tls`
Actual results:
The command exits with code 0 but prints logs like:
sha256:95c45fae0ca9e9bee0fa2c13652634e726d8133e4e3009b363fcae6814b3461d localhost:5000/albo/aws-load-balancer-rhel8-operator:95c45f
sha256:ab38b37c14f7f0897e09a18eca4a232a6c102b76e9283e401baed832852290b5 localhost:5000/albo/aws-load-balancer-rhel8-operator:ab38b3
info: Mirroring completed in 43.87s (28.5MB/s)
Rendering catalog image "localhost:5000/mno/redhat-operator-index:v4.12" with file-based catalog
Writing image mapping to oc-mirror-workspace/results-1677743154/mapping.txt
Writing CatalogSource manifests to oc-mirror-workspace/results-1677743154
Writing ICSP manifests to oc-mirror-workspace/results-1677743154
unable to parse reference oci://mno/redhat-operator-index:v4.12: lstat /mno: no such file or directory
Expected results:
No such log message should be printed.
Description of problem:
While troubleshooting a problem, oc incorrectly recommended the deprecated command "oc adm registry" in its output text.
Version-Release number of selected component (if applicable):
$ oc version Client Version: 4.12.0-202302280915.p0.gb05f7d4.assembly.stream-b05f7d4 Kustomize Version: v4.5.7 Server Version: 4.12.6 Kubernetes Version: v1.25.4+18eadca Though this is likely broken in all previous version of openshift4
How reproducible:
Only during error conditions where this error message is printed.
Steps to Reproduce:
1. Have a cluster without proper storage configured for the registry
2. Try to build something
3. "oc status --suggest" prints a message with the deprecated "oc adm registry" command
Actual results:
$ oc status --suggest
In project pvctest on server https://api.pelauter-bm01.lab.home.arpa:6443
https://my-test-pvctest.apps.pelauter-bm01.lab.home.arpa (redirects) to pod port 8080-tcp (svc/my-test)
  deployment/my-test deploys istag/my-test:latest <-
    bc/my-test source builds https://github.com/sclorg/django-ex.git on openshift/python:3.9-ubi8
    build #1 new for 3 hours (can't push to image)
  deployment #1 running for 3 hours - 0/1 pods

Errors:
* bc/my-test is pushing to istag/my-test:latest, but the administrator has not configured the integrated container image registry.
  try: oc adm registry -h

^ "oc adm registry" is deprecated in OpenShift 4; this should guide the user to the registry operator instead.
Expected results:
A pointer to the proper feature to manage the registry, like the openshift registry operator.
Additional info:
I know my cluster is not set up correctly, but oc should still not give me incorrect information. If this version of oc is expected to also work against ocp3 clusters, the fix should take this into account, where that command is still valid.
Description of problem:
CCO watches too many things.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Run CCO in a cluster with a large amount of data in ConfigMaps or Secrets or Namespaces. 2. Watch memory usage scale linearly with the size of both. 3.
Actual results:
Memory usage scales linearly with the size of all ConfigMaps, Secrets and Namespaces on the cluster.
Expected results:
Memory usage scales linearly with the data CCO actually needs to function.
Additional info:
Description of problem:
The external link icon in the `resource added` toast notification is not linked and cannot be clicked to open the app URL.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Steps to Reproduce:
1. Use the +Add page and import from git
2. After creating the app, a toast notification will appear
3. Click the external link icon
Actual results:
External link icon is not part of the link but has a pointer cursor and a hover effect. Clicking this icon does nothing.
Expected results:
External link icon should be part of the link and clickable.
Additional info:
We set image links in CMO's jsonnet code, as these can sometimes be used to populate labels and it is generally considered good documentation practice.
In a cluster these links are replaced by CVO.
prometheus-adapter is now a k8s project and has moved locations accordingly from directxman12/k8s-prometheus-adapter to kubernetes-sigs/prometheus-adapter. This should be reflected in our image links, set at https://github.com/openshift/cluster-monitoring-operator/blob/35a063722c7e3b68d57aed18dc81f0dbdfbfc004/jsonnet/main.jsonnet#L66.
Description of the problem:
In Staging, with BE 2.20.1, trying to turn the "Integrate with platform" switch on results in:
Failed to update the cluster
only x86-64 CPU architecture is supported on Nutanix clusters
How reproducible:
100%
Steps to reproduce:
1. Create new cluster with OCP multi version
2. Discover NTNX hosts and turn integrate with platform on
3.
Actual results:
Expected results:
Description of problem:
Reported in https://github.com/openshift/cluster-ingress-operator/issues/911
When you open a new issue, it still directs you to Bugzilla, and then doesn't work.
It can be changed here: https://github.com/openshift/cluster-ingress-operator/blob/master/.github/ISSUE_TEMPLATE/config.yml
, but to what?
The correct Jira link is
https://issues.redhat.com/secure/CreateIssueDetails!init.jspa?pid=12332330&issuetype=1&components=12367900&priority=10300&customfield_12316142=26752
But can the public use this mechanism? Yes - https://redhat-internal.slack.com/archives/CB90SDCAK/p1682527645965899
Version-Release number of selected component (if applicable):
n/a
How reproducible:
May be in other repos too.
Steps to Reproduce:
1. Open an Issue in the repo - click on New Issue
2. Follow the directions and click on the link to open Bugzilla
3. Get a message that this doesn't work anymore
Actual results:
You get instructions that don't work to open a bug from an Issue.
Expected results:
You get instructions to just open an Issue, or get correct instructions on how to open a bug using Jira.
Additional info:
We need to enable the vSphere CSI driver to use the UseCSINodeID feature, so that it is at feature parity with upstream.
Description of problem:
Create a private Shared VPC cluster on AWS; the Ingress operator is degraded due to the following error:
2023-06-14T09:55:50.240Z INFO operator.dns_controller controller/controller.go:118 reconciling {"request": {"name":"default-wildcard","namespace":"openshift-ingress-operator"}}
2023-06-14T09:55:50.363Z ERROR operator.dns_controller dns/controller.go:354 failed to publish DNS record to zone {"record": {"dnsName":"*.apps.ci-op-2x6lics3-849ce.qe.devcluster.openshift.com.","targets":["internal-ac656ce4d29f64da289152053f50c908-1642793317.us-east-1.elb.amazonaws.com"],"recordType":"CNAME","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"Z0698684SM2RRJSYHP43"}, "error": "failed to get hosted zone for load balancer target \"internal-ac656ce4d29f64da289152053f50c908-1642793317.us-east-1.elb.amazonaws.com\": couldn't find hosted zone ID of ELB internal-ac656ce4d29f64da289152053f50c908-1642793317.us-east-1.elb.amazonaws.com"}
ingress operator:
ingress   False   True   True   37m   The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DNSReady=False (FailedZones: The record failed to provision in some zones: [{Z0698684SM2RRJSYHP43 map[]}])
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-13-223353
How reproducible:
always
Steps to Reproduce:
1. Create a private Shared VPC cluster on AWS using STS
Actual results:
ingress operator degraded
Expected results:
cluster is healthy
Additional info:
A public cluster has no such issue.
Description of problem:
Older images are pulled even when using minVersion in ImageSetConfiguration.
Version-Release number of selected component (if applicable):
oc mirror version
Client Version: version.Info
How reproducible:
Always
Steps to Reproduce:
1. get attached ImageSetConfiguration
2. run 'oc mirror --config=./image-set.yaml docker://<yourRegistry> --continue-on-error'
Actual results:
The output contains a lot of 'unable to retrieve source image' errors for images that are older than what is defined in minVersion. Those images are known to be missing; the goal was to use minVersion to filter out those older images and get rid of the errors, but it is not working.
Expected results:
Those older images should not be included
Additional info:
image-set.yaml is attached
Full output of 'oc mirror' attached
There are more images failing but as an example:
error: unable to retrieve source image registry.redhat.io/openshift-service-mesh/pilot-rhel8 manifest sha256:f7c468b5a35bfce54e53b4d8d00438f33a0861549697d14445eae52d8ead9a68: for image pulls. Use the equivalent V2 schema 2 manifest digest instead. For more information see https://access.redhat.com/articles/6138332
This image is from version 1.0.11 but minVersion: '2.2.1-0' so it should not be included.
Here is how I checked that image:
podman inspect registry-proxy.engineering.redhat.com/rh-osbs/openshift-service-mesh-pilot-rhel8@sha256:f7c468b5a35bfce54e53b4d8d00438f33a0861549697d14445eae52d8ead9a68 | grep version
    "istio_version": "1.1.17",
    "version": "1.0.11"
    "istio_version": "1.1.17",
    "version": "1.0.11"
This is a clone of issue OCPBUGS-8512. The following is the description of the original issue:
—
Description of problem:
WebhookConfiguration caBundle injection is incorrect when some webhooks are already configured with a caBundle. The behavior seems to be that only the first n webhooks in the `.webhooks` array have the caBundle injected, where n is the number of webhooks that do not have a caBundle set.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create a validatingwebhookconfigurations or mutatingwebhookconfigurations with the `service.beta.openshift.io/inject-cabundle: "true"` annotation.
2. oc edit validatingwebhookconfigurations (or oc edit mutatingwebhookconfigurations)
3. Add a new webhook to the end of the list `.webhooks`. It will not have caBundle set manually, as service-ca should inject it.
4. Observe the new webhook does not get the caBundle injected.
Note: it is important in step 3 that the new webhook is added to the end of the list.
Actual results:
Only the first n webhooks have caBundle injected where n is the number of webhooks without caBundle set.
Expected results:
All webhooks have caBundle injected when they do not have it set.
Additional info:
Open PR here: https://github.com/openshift/service-ca-operator/pull/207. The issue seems to be a mistake with Go's for-range syntax, where the loop position "i" is used instead of the stored index of the webhook that should be updated. tl;dr: the code should use the value held in the array of indices, not the position within that array.
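A small self-contained Go example of the pitfall described above, with the webhook types simplified for illustration (the real service-ca-operator types differ); it reproduces the symptom where only the first n webhooks get a caBundle.

```go
package main

import "fmt"

type webhook struct {
	Name     string
	CABundle []byte
}

func main() {
	hooks := []webhook{
		{Name: "a", CABundle: []byte("preset")},
		{Name: "b"},
		{Name: "c"},
	}
	ca := []byte("injected")

	// Indices of the webhooks that still need a caBundle (here: 1 and 2).
	var toInject []int
	for i := range hooks {
		if len(hooks[i].CABundle) == 0 {
			toInject = append(toInject, i)
		}
	}

	// Buggy pattern: using the loop position i instead of the stored index,
	// so only the first len(toInject) webhooks in the slice get updated.
	for i := range toInject {
		hooks[i].CABundle = ca // wrong element when the indices don't start at 0
	}

	// Correct pattern: use the value held in the slice of indices.
	for _, idx := range toInject {
		hooks[idx].CABundle = ca
	}

	for _, wh := range hooks {
		fmt.Printf("%s: %q\n", wh.Name, wh.CABundle)
	}
}
```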
Description of problem:
monitoringPlugin tolerations not working
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
apply monitoringPlugin tolerations to cm `cluster-monitoring-config`
example:
...
monitoringPlugin:
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
Actual results:
The cm is applied, but the tolerations do not take effect on the deployment.
Expected results:
The tolerations should be applied to the deployment/pod.
Additional info:
The same applies to nodeSelector and topologySpreadConstraints.
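A sketch of what honoring these settings could look like on the operator side: copying the user-supplied tolerations and nodeSelector onto the plugin Deployment's pod template. The config type and function below are assumptions for illustration, not CMO's actual code.

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// MonitoringPluginConfig mirrors the shape of the cluster-monitoring-config
// stanza shown above (illustrative, not the operator's real type).
type MonitoringPluginConfig struct {
	Tolerations  []corev1.Toleration
	NodeSelector map[string]string
}

// applyPluginConfig copies the user-supplied scheduling settings onto the
// plugin Deployment's pod template, which is the step the bug report says is
// currently missing.
func applyPluginConfig(d *appsv1.Deployment, cfg MonitoringPluginConfig) {
	if len(cfg.Tolerations) > 0 {
		d.Spec.Template.Spec.Tolerations = cfg.Tolerations
	}
	if len(cfg.NodeSelector) > 0 {
		d.Spec.Template.Spec.NodeSelector = cfg.NodeSelector
	}
}

func main() {
	d := &appsv1.Deployment{}
	applyPluginConfig(d, MonitoringPluginConfig{
		Tolerations: []corev1.Toleration{{
			Key:      "key1",
			Operator: corev1.TolerationOpEqual,
			Value:    "value1",
			Effect:   corev1.TaintEffectNoSchedule,
		}},
	})
	fmt.Println(d.Spec.Template.Spec.Tolerations)
}
```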
Description of problem:
The prometheus-operator pod has the "app.kubernetes.io/version: 0.63.0" annotation while it's based on 0.65.1.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Always
Steps to Reproduce:
1. Check app.kubernetes.io/version annotations for prometheus-operator pod. 2. 3.
Actual results:
0.63.0
Expected results:
0.65.1
Additional info:
This is a clone of issue OCPBUGS-19715. The following is the description of the original issue:
—
Description of problem:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Due to the EOL of RHV in OCP, we'll need to disable oVirt as an installation option in the installer.
Note: The first step is disabling it. Removing all related code from the installer will be done in a later release.
Please review the following PR: https://github.com/openshift/cluster-ingress-operator/pull/898
The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.
Differences in upstream and downstream builds impact the fidelity of your CI signal.
If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.
Closing this issue without addressing the difference will cause the issue to
be reopened automatically.
The dev workflow for OCP operators wanting to use feature gates is
1) change openshift/api
2) bump openshift/api in cluster-config-operator (CCO)
3) bump openshift/api in your operator and add logic for the feature gate
Currently, hypershift requires its own bump of openshift/api in order to set the proper feature gates, and this is not preferred. It is preferred that the single place where an API bump is required is cluster-config-operator.
Hypershift should use CCO `render` command to generate the FeatureGate CR
Description of problem:
If we add a ConfigMap to a BuildConfig as a build input, the ConfigMap data is not present at the destinationDir on the build pod.
Version-Release number of selected component (if applicable):
How reproducible:
Follow the steps below to reproduce.
Steps to Reproduce:
1. Create a ConfigMap to pass as a build input:
apiVersion: v1
data:
  settings.xml: |+
    xxx
    yyy
kind: ConfigMap
metadata:
  name: build-test
  namespace: test
2. Create a BuildConfig like below:
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  labels:
    app: custom-build
  name: custom-build
spec:
  source:
    configMaps:
    - configMap:
        name: build-test
      destinationDir: /tmp
    type: None
  output:
    to:
      kind: ImageStreamTag
      name: custom-build:latest
  postCommit: {}
  runPolicy: Serial
  strategy:
    customStrategy:
      from:
        kind: "DockerImage"
        name: "registry.redhat.io/rhel8/s2i-base"
3. Start a new build: oc start-build custom-build
4. As per the documentation[a], the ConfigMap data should be present on the build pod at "/var/run/secrets/openshift.io/build" if we don't explicitly set "destinationDir". In the example above, "destinationDir" is set to "/tmp", so the "settings.xml" file from the ConfigMap should be present in the "/tmp" directory of the build pod.
[a] https://docs.openshift.com/container-platform/4.12/cicd/builds/creating-build-inputs.html#builds-custom-strategy_creating-build-inputs
Actual results:
The ConfigMap data is not present at the destinationDir or in the default location "/var/run/secrets/openshift.io/build".
Expected results:
Configmap data should be present on the destinationDir of the builder pod.
Additional info:
Description of problem:
As a user, when I select the All Projects option from the Project dropdown on the Dev perspective Pipelines pages, the selected option shows as undefined.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Steps to Reproduce:
1. Navigate to Pipelines page in the Dev perspective 2. Select the All projects option from the Projects dropdown
Actual results:
The selected option shows as undefined and the all-Projects list is not shown.
Expected results:
The selected option should be All Projects, and it should open the all-Projects list page.
Additional info:
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were not completed when this image was assembled
Follow up for https://issues.redhat.com/browse/HOSTEDCP-975
This is to allow multiple tables in a single view with filtering
Description of problem:
The IBM VPC Block CSI Driver fails to provision volumes in a proxy cluster. If I understand correctly, it seems the proxy is not injected: in our definition (https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/controller.yaml) we inject the proxy into a container named csi-driver (config.openshift.io/inject-proxy: csi-driver, config.openshift.io/inject-proxy-cabundle: csi-driver), but the container name is actually iks-vpc-block-driver in https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/controller.yaml#L153. I checked that the proxy is not defined in the controller pod or in the driver container ENV.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-08-11-055332
How reproducible:
Always
Steps to Reproduce:
1. Create an IBM cluster with proxy settings
2. Create a PVC/pod with the IBM VPC CSI Driver
Actual results:
It fails to provision the volume.
Expected results:
Volume provisioning works on the proxy cluster.
Additional info:
Description of problem:
When using the command `oc-mirror list operators --catalog=registry.redhat.io/redhat/certified-operator-index:v4.12 -v 9`, the response code is initially 200 OK; the command then hangs for a while and eventually gets a 401 response code.
Version-Release number of selected component (if applicable):
How reproducible:
sometimes
Steps to Reproduce:
Using the advanced cluster management package as an example. 1. oc-mirror list operators --catalog=registry.redhat.io/redhat/certified-operator-index:v4.12 -v 9
Actual results: After hanging for a while, a 401 code is returned; it seems that when the request times out and oc-mirror retries, it forgets to read the credentials again.
level=debug msg=fetch response received digest=sha256:a67257cfe913ad09242bf98c44f2330ec7e8261ca3a8db3431cb88158c3d4837 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=714959 response.header.connection=keep-alive response.header.content-length=80847073 response.header.content-type=binary/octet-stream response.header.date=Mon, 06 Feb 2023 06:52:06 GMT response.header.etag="a428fafd37ee58f4bdeae1a7ff7235b5-1" response.header.last-modified=Fri, 16 Sep 2022 17:54:09 GMT response.header.server=AmazonS3 response.header.via=1.1 010c0731b9775a983eceaec0f5fa6a2e.cloudfront.net (CloudFront) response.header.x-amz-cf-id=rEfKWnJdasWIKnjWhYyqFn9eHY8v_3Y9WwSRnnkMTkPayHlBxWX1EQ== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=GfqTTjWbdqB0sreyjv3fyo1k6LQ9kZKC response.header.x-cache=Hit from cloudfront response.status=200 OK size=80847073 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:a67257cfe913ad09242bf98c44f2330ec7e8261ca3a8db3431cb88158c3d4837 level=debug msg=fetch response received digest=sha256:d242c7b4380d3c9db3ac75680c35f5c23639a388ad9313f263d13af39a9c8b8b mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=595868 response.header.connection=keep-alive response.header.content-length=98028196 response.header.content-type=binary/octet-stream response.header.date=Tue, 07 Feb 2023 15:56:56 GMT response.header.etag="f702c84459b479088565e4048a890617-1" response.header.last-modified=Wed, 18 Jan 2023 06:55:12 GMT response.header.server=AmazonS3 response.header.via=1.1 7f5e0d3b9ea85d0d75063a66c0ebc840.cloudfront.net (CloudFront) response.header.x-amz-cf-id=Tw9cjJjYCy8idBiQ1PvljDkhAoEDEzuDCNnX6xJub4hGeh8V0CIP_A== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=nt7yY.YmjWF0pfAhzh_fH2xI_563GnPz response.header.x-cache=Hit from cloudfront response.status=200 OK size=98028196 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:d242c7b4380d3c9db3ac75680c35f5c23639a388ad9313f263d13af39a9c8b8b level=debug msg=fetch response received digest=sha256:664a8226a152ea0f1078a417f2ec72d3a8f9971e8a374859b486b60049af9f18 mediatype=application/vnd.docker.container.image.v1+json response.header.accept-ranges=bytes response.header.age=17430 response.header.connection=keep-alive response.header.content-length=24828 response.header.content-type=binary/octet-stream response.header.date=Tue, 14 Feb 2023 08:37:35 GMT response.header.etag="57eb6fdca8ce82a837bdc2cebadc3c7b-1" response.header.last-modified=Mon, 13 Feb 2023 16:11:57 GMT response.header.server=AmazonS3 response.header.via=1.1 0c96ded7ff282d2dbcf47c918b6bb500.cloudfront.net (CloudFront) response.header.x-amz-cf-id=w9zLDWvPJ__xbTpI8ba5r9DRsFXbvZ9rSx5iksG7lFAjWIthuokOsA== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-version-id=Enw8mLebn4.ShSajtLqdo4riTDHnVEFZ response.header.x-cache=Hit from cloudfront response.status=200 OK size=24828 
url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:664a8226a152ea0f1078a417f2ec72d3a8f9971e8a374859b486b60049af9f18 level=debug msg=fetch response received digest=sha256:130c9d0ca92e54f59b68c4debc5b463674ff9555be1f319f81ca2f23e22de16f mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=829779 response.header.connection=keep-alive response.header.content-length=26039246 response.header.content-type=binary/octet-stream response.header.date=Sat, 04 Feb 2023 22:58:25 GMT response.header.etag="a08688b701b31515c6861c69e4d87ebd-1" response.header.last-modified=Tue, 06 Dec 2022 20:50:51 GMT response.header.server=AmazonS3 response.header.via=1.1 000f4a2f631bace380a0afa747a82482.cloudfront.net (CloudFront) response.header.x-amz-cf-id=S-h31zheAEOhOs6uH52Rpq0ZnoRRdd5VfaqVbZWXzAX-Zym-0XtuKA== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=BQOjon.COXTTON_j20wZbWWoDEmGy1__ response.header.x-cache=Hit from cloudfront response.status=200 OK size=26039246 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:130c9d0ca92e54f59b68c4debc5b463674ff9555be1f319f81ca2f23e22de16f level=debug msg=do request digest=sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip request.header.accept=application/vnd.docker.image.rootfs.diff.tar.gzip, */* request.header.range=bytes=13417268- request.header.user-agent=opm/alpha request.method=GET size=91700480 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9 level=debug msg=fetch response received digest=sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.cache-control=max-age=0, no-cache, no-store response.header.connection=keep-alive response.header.content-length=99 response.header.content-type=application/json response.header.date=Tue, 14 Feb 2023 13:34:06 GMT response.header.docker-distribution-api-version=registry/2.0 response.header.expires=Tue, 14 Feb 2023 13:34:06 GMT response.header.pragma=no-cache response.header.registry-proxy-request-id=0d7ea55f-e96d-4311-885a-125b32c8e965 response.header.www-authenticate=Bearer realm="https://registry.redhat.io/auth/realms/rhcc/protocol/redhat-docker-v2/auth",service="docker-registry",scope="repository:redhat/certified-operator-index:pull" response.status=401 Unauthorized size=91700480 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9.
Expected results:
oc-mirror should always read the credentials when retrying the request.
Description of problem:
Using openshift-install v4.13.0, no issue messages are displayed on the console. Looking at /etc/issue.d/, the issue files are written, they are just not displayed by agetty.
# cat /etc/issue.d/70_agent-services.issue \e{cyan}Waiting for services:\e{reset} [\e{cyan}start\e{reset}] Service that starts cluster installation
Version-Release number of selected component (if applicable):
4.13
How reproducible:
100%
Steps to Reproduce:
1. Build the agent image using openshift-install v4.13.0
2. Mount the ISO and boot a machine
3. Wait for a while until issues are created in /etc/issue.d/
Actual results:
No messages are displayed to console
Expected results:
All messages should be displayed
Additional info:
https://redhat-internal.slack.com/archives/C02SPBZ4GPR/p1686646256441329
When changing platform fields (e.g. the AWS instance type) we trigger a rolling upgrade; however, nothing is signalled in the NodePool state, which results in bad UX.
NodePools should signal a rolling upgrade caused by platform changes, as sketched below.
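A sketch of one way to surface this, using apimachinery's condition helpers to set a NodePool status condition when a platform change drives a rollout; the condition type and reason strings are illustrative assumptions, not HyperShift's actual API.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setRollingUpgradeCondition records that a platform change (for example a
// new AWS instance type) triggered a rolling upgrade, so the NodePool status
// reflects what is happening.
func setRollingUpgradeCondition(conditions *[]metav1.Condition, generation int64, rolling bool) {
	status := metav1.ConditionFalse
	reason := "AsExpected"
	msg := "No platform-driven rolling upgrade in progress"
	if rolling {
		status = metav1.ConditionTrue
		reason = "PlatformChange"
		msg = "Platform fields changed; nodes are being replaced"
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:               "UpdatingPlatform",
		Status:             status,
		Reason:             reason,
		Message:            msg,
		ObservedGeneration: generation,
	})
}

func main() {
	var conds []metav1.Condition
	setRollingUpgradeCondition(&conds, 3, true)
	fmt.Printf("%+v\n", conds[0])
}
```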
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Description of problem:
The agent installer integration test fails because of a change in the base ISO's kargs.json, which uses fedora-coreos instead of rhcos. As the integration test uses strict checks via the `cmp` function, it fails because "coreos.liveiso=fedora-coreos-38.20230609.3.0" is absent from the expected result of the integration test.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Get latest code from master branch 2. Run ./hack/go-integration-test.sh
Actual results:
INFO[2023-09-01T02:23:01Z] --- FAIL: TestAgentIntegration (369.83s)
    --- FAIL: TestAgentIntegration/agent_pxe_configurations (0.00s)
        --- FAIL: TestAgentIntegration/agent_pxe_configurations/sno (49.93s)
        testscript.go:520: # Verify a default configuration for the SNO topology (49.805s)
            > exec openshift-install agent create pxe-files --dir $WORK
            [stderr]
            level=warning msg=CPUPartitioning: is ignored
            level=info msg=Configuration has 1 master replicas and 0 worker replicas
            level=info msg=The rendezvous host IP (node0 IP) is 192.168.111.20
            level=info msg=Extracting base ISO from release payload
            level=info msg=Verifying cached file
            level=info msg=Using cached Base ISO /.cache/agent/image_cache/coreos-x86_64.iso
            level=info msg=Consuming Install Config from target directory
            level=info msg=Consuming Agent Config from target directory
            level=info msg=Created iPXE script agent.x86_64.ipxe in $WORK/pxe directory
            level=info msg=PXE-files created in: $WORK/pxe
            level=info msg=Kernel parameters for PXE boot: coreos.liveiso=fedora-coreos-38.20230609.3.0 ignition.firstboot ignition.platform.id=metal
            > stderr 'Created iPXE script agent.x86_64.ipxe'
            > exists $WORK/pxe/agent.x86_64-initrd.img
            > exists $WORK/pxe/agent.x86_64-rootfs.img
            > exists $WORK/pxe/agent.x86_64-vmlinuz
            > exists $WORK/auth/kubeconfig
            > exists $WORK/auth/kubeadmin-password
            > cmp $WORK/pxe/agent.x86_64.ipxe $WORK/expected/agent.x86_64.ipxe
            diff $WORK/pxe/agent.x86_64.ipxe $WORK/expected/agent.x86_64.ipxe
            --- $WORK/pxe/agent.x86_64.ipxe
            +++ $WORK/expected/agent.x86_64.ipxe
            @@ -1,4 +1,4 @@
            #!ipxe
            initrd --name initrd http://user-specified-pxe-infra.com/agent.x86_64-initrd.img
            -kernel http://user-specified-pxe-infra.com/agent.x86_64-vmlinuz initrd=initrd coreos.live.rootfs_url=http://user-specified-pxe-infra.com/agent.x86_64-rootfs.img coreos.liveiso=fedora-coreos-38.20230609.3.0 ignition.firstboot ignition.platform.id=metal
            +kernel http://user-specified-pxe-infra.com/agent.x86_64-vmlinuz initrd=initrd coreos.live.rootfs_url=http://user-specified-pxe-infra.com/agent.x86_64-rootfs.img ignition.firstboot ignition.platform.id=metal
            boot

FAIL: testdata/agent/pxe/configurations/sno.txt:13: $WORK/pxe/agent.x86_64.ipxe and $WORK/expected/agent.x86_64.ipxe differ
Expected results:
Test should always pass
Additional info:
Description of problem:
The accessTokenInactivityTimeout configured under tokenConfig in HostedCluster doesn't have any effect:
1. The value is not updated in the oauth-openshift configmap.
2. The hosted cluster allows the user to set an accessTokenInactivityTimeout value < 300s, whereas in a master cluster the value should be > 300s.
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Install a fresh 4.13 hypershift cluster
2. Configure accessTokenInactivityTimeout as below:
$ oc edit hc -n clusters
...
spec:
  configuration:
    oauth:
      identityProviders:
      ...
      tokenConfig:
        accessTokenInactivityTimeout: 100s
...
3. Check the hcp:
$ oc get hcp -oyaml
...
tokenConfig:
  accessTokenInactivityTimeout: 1m40s
...
4. Login to the guest cluster with testuser-1 and get the token:
$ oc login https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443 -u testuser-1 -p xxxxxxx
$ TOKEN=`oc whoami -t`
$ oc login --token="$TOKEN"
WARNING: Using insecure TLS client config. Setting this option is not supported!
Logged into "https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443" as "testuser-1" using the token provided.
You don't have any projects. You can try to create a new project, by running oc new-project <projectname>
Actual results:
1. The hostedcluster allows the user to set a value < 300s for accessTokenInactivityTimeout, which is not possible on a master cluster.
2. The value is not updated in the oauth-openshift configmap:
   $ oc get cm oauth-openshift -oyaml -n clusters-hypershift-ci-25785
   ...
   tokenConfig:
     accessTokenMaxAgeSeconds: 86400
     authorizeTokenMaxAgeSeconds: 300
   ...
3. Login doesn't fail even if the user is not active for more than the set accessTokenInactivityTimeout seconds.
Expected results:
Login should fail if the user has been inactive for longer than the configured accessTokenInactivityTimeout.
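For reference, a compliant configuration would keep the timeout at or above the 300s floor; a minimal sketch of the same HostedCluster stanza shown in the reproduction steps, with an illustrative value:

  spec:
    configuration:
      oauth:
        tokenConfig:
          accessTokenInactivityTimeout: 600s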
Kube 1.26 introduced the warning level TopologyAwareHintsDisabled event. TopologyAwareHintsDisabled is fired by the EndpointSliceController whenever reconciling a service that has activated topology aware hints via the service.kubernetes.io/topology-aware-hints annotation, but there is not enough information in the existing cluster resources (typically nodes) to apply the topology aware hints.
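For illustration, opting a Service in to topology aware hints looks roughly like the sketch below. The annotation name and value are the standard Kube 1.26 ones; the selector and ports are placeholders, not the real dns-default definition:

  apiVersion: v1
  kind: Service
  metadata:
    name: dns-default
    namespace: openshift-dns
    annotations:
      service.kubernetes.io/topology-aware-hints: auto
  spec:
    selector:
      example: placeholder      # placeholder selector, not the shipped one
    ports:
    - port: 53
      protocol: UDP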
When re-basing OpenShift onto Kube 1.26, our CI builds are failing (except on AWS), because these events are firing "pathologically", for example:
: [sig-arch] events should not repeat pathologically
events happened too frequently event happened 83 times, something is wrong: ns/openshift-dns service/dns-default - reason/TopologyAwareHintsDisabled Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4 result=reject
AWS nodes seem to have the proper values in the nodes. GCP has the values also, but they are not "right" for the purposes of the EndpointSliceController:
event happened 38 times, something is wrong: ns/openshift-dns service/dns-default - reason/TopologyAwareHintsDisabled Unable to allocate minimum required endpoints to each zone without exceeding overload threshold (5 endpoints, 3 zones), addressType: IPv4 result=reject
https://github.com/openshift/origin/pull/27666 will mask this problem (make it stop erroring in CI) but changes still need to be made in the product so end users are not subjected to these events.
Now links to:
test=[sig-arch] events should not repeat pathologically for namespace openshift-dns
Description of problem:
The DNS egress router must run as privileged. Given that it is just an haproxy, this doesn't make much sense. If I am not wrong, the biggest reason it needs privileged is the `chroot` option inherited from the default file (https://github.com/openshift/images/blob/master/egress/dns-proxy/egress-dns-proxy.sh#L44). That option doesn't make much sense when we are already inside a container (which is why ingress controllers don't use it, for example). So it may be worth exploring whether this option can be removed and the DNS egress router can run without requiring privileged mode, perhaps with just CAP_NET_BIND_SERVICE.
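A hedged sketch of where this could land if the chroot option were dropped: the container would no longer be privileged and would only add the capability needed to bind low ports. The container name and image reference below are illustrative, not the shipped egress-dns-proxy template:

  containers:
  - name: egress-dns-proxy
    image: <egress-dns-proxy-image>      # illustrative placeholder
    securityContext:
      privileged: false
      capabilities:
        drop: ["ALL"]
        add: ["NET_BIND_SERVICE"]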
Version-Release number of selected component (if applicable):
4.12.0
How reproducible:
Always
Steps to Reproduce:
1. Forget to set privileged mode in the container 2. 3.
Actual results:
Pod cannot start due to chroot setting. I need to run the container as privileged, which lowers security too much.
Expected results:
Run the container without being privileged, maybe adding CAP_NET_BIND_SERVICE.
Additional info:
Description of problem:
migrator pod in `openshift-kube-storage-version-migrator` project stuck in Pending state
Version-Release number of selected component (if applicable):
4.12
How reproducible:
100%
Steps to Reproduce:
1. Add a default cluster-wide node selector with a label that does not match any node label:
   $ oc edit scheduler cluster
   apiVersion: config.openshift.io/v1
   kind: Scheduler
   metadata:
     name: cluster
   ...
   spec:
     defaultNodeSelector: node-role.kubernetes.io/role=app
     mastersSchedulable: false
2. Delete the migrator pod running in openshift-kube-storage-version-migrator:
   $ oc delete pod migrator-6b78665974-zqd47 -n openshift-kube-storage-version-migrator
3. Check whether the migrator pod comes up in a Running state or not:
   $ oc get pods -n openshift-kube-storage-version-migrator
   NAME                        READY   STATUS    RESTARTS   AGE
   migrator-6b78665974-j4jwp   0/1     Pending   0          2m41s
Actual results:
The pod goes into the pending state because it tries to get scheduled on the node having label `node-role.kubernetes.io/role=app`.
Expected results:
The pod should come up in a Running state; it should not be affected by the cluster-wide node selector.
Additional info:
Setting the annotation `openshift.io/node-selector=` into the `openshift-kube-storage-version-migrator` project and then deleting the pending migrator pod helps in bringing the pod up.
The expectation with this bug is that the project `openshift-kube-storage-version-migrator` should have the annotation `openshift.io/node-selector=`, so that the pod running on this project will not get affected by the wrong cluster-wide node-selector configuration.
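In manifest form, the expectation amounts to the following annotation on the namespace (a minimal sketch; only the annotation matters here):

  apiVersion: v1
  kind: Namespace
  metadata:
    name: openshift-kube-storage-version-migrator
    annotations:
      openshift.io/node-selector: ""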
Description of problem:
Various jobs are failing in e2e-gcp-operator due to the LoadBalancer-type Service not going "ready", which means it is most likely not getting an IP address. Tests affected so far:
- TestUnmanagedDNSToManagedDNSInternalIngressController
- TestScopeChange
- TestInternalLoadBalancerGlobalAccessGCP
- TestInternalLoadBalancer
- TestAllowedSourceRanges
For example, in TestInternalLoadBalancer, the Load Balancer never comes back ready:
   operator_test.go:1454: Expected conditions: map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True]
   Current conditions: map[Admitted:True Available:False DNSManaged:True DNSReady:False Degraded:True DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True DeploymentRollingOut:False EvaluationConditionsDetected:False LoadBalancerManaged:True LoadBalancerProgressing:False LoadBalancerReady:False Progressing:False Upgradeable:True]
Note DNSReady:False and LoadBalancerReady:False.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
10% of the time
Steps to Reproduce:
1. Run e2e-gcp-operator many times until you see one of these failures
Actual results:
Test Failure
Expected results:
Tests should not fail
Additional info:
Search.CI Links:
TestScopeChange
TestInternalLoadBalancerGlobalAccessGCP & TestInternalLoadBalancer
This does not seem related to https://issues.redhat.com/browse/OCPBUGS-6013. The DNS E2E tests actually pass this same condition check.
Description of problem:
When we merged https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/229, it changed the way failure domains were injected for Azure so that additional fields could be accounted for. However, the CPMS failure domains have Azure zones as a string (which they should be), and the machine v1beta1 spec has them as a string pointer. This means that the CPMS now detects the difference between a nil zone and an empty string, even though every other piece of code in OpenShift treats them the same.
We should update the machine v1beta1 type to remove the pointer. This will be a no-op in terms of the data stored in etcd, since the type is unstructured anyway. It will then require updates to the MAPZ, CPMS, MAO and installer repositories to update their generation.
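To make the mismatch concrete, a hedged sketch of the two serialized forms that are treated as equivalent everywhere except in the CPMS comparison (the surrounding providerSpec layout is illustrative, not copied from a real Azure machine):

  # form A: zone omitted (decodes to a nil *string)
  providerSpec:
    value: {}          # no zone key at all
  # form B: zone present but empty (decodes to a non-nil *string pointing at "")
  providerSpec:
    value:
      zone: ""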
Version-Release number of selected component (if applicable):
4.14 nightlies from the merge of 229 onwards
How reproducible:
This only affects regions in Azure where there are no zones; currently in CI it's affecting about 20% of events.
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The node debug console is not available on all nodes when deploying hypershift on kubevirt using the 'hypershift create cluster kubevirt' default root-volume-size (16 GB).
(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc version
Client Version: 4.12.0-0.nightly-2023-04-01-095001
Kustomize Version: v4.5.7
Server Version: 4.12.8
Kubernetes Version: v1.25.7+eab9cc9
happens all the time.
(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc debug node/hyper-1-kd7sm
Temporary namespace openshift-debug-5cctb is created for debugging node...
Starting pod/hyper-1-kd7sm-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
Temporary namespace openshift-debug-5cctb was removed.
Error from server (BadRequest): container "container-00" in pod "hyper-1-kd7sm-debug" is not available
(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$
(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc debug node/hyper-1-rkkkm
Temporary namespace openshift-debug-v6xr8 is created for debugging node...
Starting pod/hyper-1-rkkkm-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.128.2.76
If you don't see a command prompt, try pressing enter.
sh-4.4#
1. In the output of the following, note that the node reports DiskPressure:
(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc describe node hyper-1-kd7sm
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Sun, 23 Apr 2023 17:27:02 +0300 Sun, 02 Apr 2023 19:45:20 +0300 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Sun, 23 Apr 2023 17:27:02 +0300 Sat, 15 Apr 2023 00:10:46 +0300 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Sun, 23 Apr 2023 17:27:02 +0300 Sun, 02 Apr 2023 19:45:20 +0300 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sun, 23 Apr 2023 17:27:02 +0300 Sun, 02 Apr 2023 19:47:53 +0300 KubeletReady kubelet is posting ready status
2. Deploying with a non-default value for --root-volume-size=64 works fine (see the command sketch after this list).
3. [root@ocp-edge44 ~]# oc get catalogsource -n openshift-marketplace
NAME DISPLAY TYPE PUBLISHER AGE
certified-operators Certified Operators grpc Red Hat 27h
community-operators Community Operators grpc Red Hat 27h
mce-custom-registry 2.2.4-DOWNANDBACK-2023-04-20-19-04-35 grpc Red Hat 26h
redhat-marketplace Red Hat Marketplace grpc Red Hat 27h
redhat-operators Red Hat Operators grpc Red Hat 27h
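For item 2, the workaround amounts to passing the flag explicitly at creation time; a hedged sketch of the command, where the other flags are placeholders for whatever the deployment normally uses, not a complete working invocation:

  hypershift create cluster kubevirt \
    --name hyper-1 \
    --node-pool-replicas 3 \
    --pull-secret /path/to/pull-secret.json \
    --root-volume-size 64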
As IBM running HCs, I want to upgrade an existing 4.12 HC suffering from https://issues.redhat.com/browse/OCPBUGS-13639 towards 4.13 and let the private link endpoint use the right security group.
There are automated/documented steps for the HC to end up with the endpoint pointing to the right SG.
A possible semi-automated path would be to manually delete and detach the endpoint from the service, so the next reconciliation loop resets the status: https://github.com/openshift/hypershift/blob/7d24b30c6f79be052404bf23ede7783342f0d0e5/control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go#L410-L444
The one after that would then recreate the endpoint with the right security group: https://github.com/openshift/hypershift/blob/7d24b30c6f79be052404bf23ede7783342f0d0e5/control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go#L470-L525
Note this would produce connectivity downtime while reconciliation happens.
Alternatively we could codify a path to update the endpoint SG when we detect a discrepancy with the hypershift SG.
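If the manual path is chosen, it could be driven with the standard AWS CLI, roughly as below; the endpoint ID is a placeholder, and as noted above connectivity is interrupted until the controller recreates the endpoint:

  # locate the interface endpoint currently attached to the HC's private link service
  aws ec2 describe-vpc-endpoints --filters Name=vpc-endpoint-type,Values=Interface
  # delete it so the next awsprivatelink reconcile resets status and recreates it with the right SG
  aws ec2 delete-vpc-endpoints --vpc-endpoint-ids vpce-0123456789abcdef0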
Description of problem:
The Samples tab is not visible when the Sample Deployment is created, whereas the Snippets tab is visible when `snippet: true` is added to the Sample Deployment. Check the attached file for exact details.
Version-Release number of selected component (if applicable):
4.11.x
How reproducible:
Always
Steps to Reproduce:
1. On CLI, create the Sample Deployment.
2. On the web console, create a Deployment.
3. The Deployment will be created with the details mentioned in the Sample Deployment.
4. The Samples tab must be visible in the YAML view on the web console.
5. Screenshots are attached for reference.
Actual results:
When a Sample Deployment is created with `kind: ConsoleYAMLSample` and `snippet: true`, the Snippets tab shows up. When a Sample Deployment is created with the same details but without `snippet: true`, the "Samples" tab does not show up.
Expected results:
When a Sample Deployment is created with the `kind: ConsoleYAMLSample` and NO `snippet:true`, the "Samples" tab must show up.
Additional info:
When a Sample Deployment is created with `kind: ConsoleYAMLSample`, the "Samples" tab shows up in OCP cluster version 4.10.x; however, it doesn't show up in OCP cluster version 4.11.x. NOTE: The attached file has all the required details.
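For context, a minimal ConsoleYAMLSample of the kind described above with no snippet field set, which is the case where the Samples tab fails to appear on 4.11.x (names and values are illustrative):

  apiVersion: console.openshift.io/v1
  kind: ConsoleYAMLSample
  metadata:
    name: example-deployment-sample
  spec:
    targetResource:
      apiVersion: apps/v1
      kind: Deployment
    title: Example Deployment
    description: A sample Deployment shown in the YAML view.
    yaml: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: example
      spec:
        replicas: 1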
Description of problem:
OLMv0 over-uses listers and consumes too much memory. Also, $GOMEMLIMIT is not used and the runtime overcommits on RSS. See the following doc for more detail: https://docs.google.com/document/d/11J7lv1HtEq_c3l6fLTWfsom8v1-7guuG4DziNQDU6cY/edit#heading=h.ttj9tfltxgzt
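As a rough illustration of the GOMEMLIMIT half of this, the Go runtime reads a soft memory limit from an environment variable, so wiring it into the deployment could look like the sketch below; the container name and values are placeholders, not the actual OLM manifests:

  containers:
  - name: olm-operator            # placeholder
    resources:
      limits:
        memory: 200Mi
    env:
    - name: GOMEMLIMIT
      value: "180MiB"             # kept slightly below the container memory limit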
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Currently the 'dump cluster' command requires public access to the guest cluster to dump its contents. It should be possible for it to access the guest cluster via the kube-apiserver service on the mgmt cluster. This would enable it for private clusters as well.
Currently we save to the filesystem every installer binary we have ever needed. When users use many different versions, the pod reaches its storage limit, since each binary is ~500 MB.
We should add a TTL to the installer cache and remove binaries that are no longer used.
We need to validate that we are able to recover a hosted cluster's etcd (backed by storage such as LVM or HPP) when an underlying management cluster node disappears.
In this scenario, we need to understand what happens when an etcd instance fails and the underlying PVC is permanently gone. Will the etcd operator be able to detect this and recover, or will the etcd cluster in question remain in a degraded state indefinitely? Those are the types of questions that need answers, which will help guide the next steps for supporting local storage for etcd.
In the interest of shipping 4.13, we landed a snapshot of nmstate code with some logic for NIC name pinning.
In https://github.com/nmstate/nmstate/commit/03c7b03bd4c9b0067d3811dbbf72635201519356 a few changes were made.
TODO elaborate in this issue what bugs are fixed
This issue is tracking the merge of https://github.com/openshift/machine-config-operator/pull/3685 which was also aiming to ensure 4.14 is compatible.
I recently noticed that the cluster-autoscaler pod in the hosted control plane namespace is going through continuous restarts. Upon investigating the issue, I found that the liveness and readiness probes are failing on this pod.
Also, checking the logs of this pod further points to missing RBAC for the cluster-autoscaler in this case. Please see the log trace for reference.
E0215 14:52:59.936182 1 reflector.go:140] k8s.io/client-go/dynamic/dynamicinformer/informer.go:91: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: agentmachinetemplates.capi-provider.agent-install.openshift.io is forbidden: User "system:serviceaccount:clusters-hcp01:cluster-autoscaler" cannot list resource "agentmachinetemplates" in API group "capi-provider.agent-install.openshift.io" in the namespace "clusters-hcp01"
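The forbidden error suggests the fix is to grant the autoscaler's service account read access to that resource in the hosted control plane namespace; a hedged sketch of what such RBAC could look like (the object names are illustrative, and the real grant belongs in the HyperShift operator's generated manifests):

  apiVersion: rbac.authorization.k8s.io/v1
  kind: Role
  metadata:
    name: cluster-autoscaler-agentmachinetemplates   # illustrative name
    namespace: clusters-hcp01
  rules:
  - apiGroups: ["capi-provider.agent-install.openshift.io"]
    resources: ["agentmachinetemplates"]
    verbs: ["get", "list", "watch"]
  ---
  apiVersion: rbac.authorization.k8s.io/v1
  kind: RoleBinding
  metadata:
    name: cluster-autoscaler-agentmachinetemplates
    namespace: clusters-hcp01
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: Role
    name: cluster-autoscaler-agentmachinetemplates
  subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: clusters-hcp01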
Description of problem:
Business Automation operands fail to load in the uninstall operator modal, with the alert message "Cannot load Operands. There was an error loading operands for this operator. Operands will need to be deleted manually...". The "Delete all operand instances for this operator__checkbox" is not shown, so the test fails. https://search.ci.openshift.org/?search=Testing+uninstall+of+Business+Automation+Operator&maxAge=168h&context=1&type=junit&name=pull-ci-openshift-console-master-e2e-gcp-console&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Need to follow up HOSTEDCP-1065 with an e2e to test ControlPlaneRelease functionality:
Test should:
`ec2:ReleaseAddress` is documented as a required permission for the NodePool management policy: https://github.com/openshift/hypershift/blob/main/api/v1beta1/hostedcluster_types.go#L1285
This is too permissive and the permission will at least need a condition to scope it. However, it may not be used by the NodePool controller at all. In that case, this permission should be removed.
Done Criteria:
DoD:
Either enforce immutability in the API via CEL, or add first-class support for mutability, i.e. enable node rollout when the field changes.
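If the CEL route is taken, the usual pattern is a transition rule on the field in the CRD schema; a minimal sketch, with a placeholder field name standing in for whichever field this story covers:

  # inside the CRD's openAPIV3Schema
  properties:
    exampleField:               # placeholder field name
      type: string
      x-kubernetes-validations:
      - rule: "self == oldSelf"
        message: "exampleField is immutable"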
This is a clone of issue OCPBUGS-19052. The following is the description of the original issue:
—
Description of problem:
With OCPBUGS-18274 we had to update the etcdctl binary. Unfortunately, the script does not attempt to update the binary if it's already found in the path: https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/etcd/etcd-common-tools#L16-L24
This causes confusion, as the binary might not be the latest that we're shipping with etcd. Pulling the binary shouldn't be a big deal: etcd is running locally anyway, and the local image should already be cached just fine. We should always replace the binary.
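A hedged sketch of the proposed behaviour, not the actual etcd-common-tools content: drop the "already installed" short-circuit and always copy the binary out of the locally cached etcd image (the image variable and in-image path are assumptions here):

  # instead of: command -v etcdctl >/dev/null && echo "etcdctl is already installed" && return
  id=$(podman create "${ETCD_IMAGE}")      # ETCD_IMAGE assumed to be resolved earlier in the script
  podman cp "${id}:/usr/bin/etcdctl" /usr/local/bin/etcdctl
  podman rm -f "${id}" >/dev/null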
Version-Release number of selected component (if applicable):
any currently supported release
How reproducible:
always
Steps to Reproduce:
1. Run cluster-backup.sh to download the binary
2. Update the etcd image (take a different version or so)
3. Run cluster-backup.sh again
Actual results:
cluster-backup.sh will simply print "etcdctl is already installed"
Expected results:
etcdctl should always be pulled
Additional info:
I have a console extension (https://github.com/gnunn1/dev-console-plugin) that simply adds the Topology and Add+ views to the Admin perspective but otherwise should expose no modules. However, if I try to build this extension without any exposedModules, the webpack assembly fails with the stack trace below.
As a workaround, I'm leaving in the example module from the template and just not adding it to the OpenShift menu.
$ yarn run build main
yarn run v1.22.19
$ yarn clean && NODE_ENV=production yarn ts-node node_modules/.bin/webpack
$ rm -rf dist
$ ts-node -O '{"module":"commonjs"}' node_modules/.bin/webpack
[webpack-cli] HookWebpackError: Called Compilation.updateAsset for not existing filename plugin-entry.js
    at makeWebpackError (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/HookWebpackError.js:48:9)
    at /home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:3058:12
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:41:1)
    at fn (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:479:17)
    at _next0 (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:39:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:52:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:13:1)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
-- inner error --
Error: Called Compilation.updateAsset for not existing filename plugin-entry.js
    at Compilation.updateAsset (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:4298:10)
    at /home/gnunn/Development/openshift/dev-console-plugin/node_modules/src/webpack/ConsoleAssetPlugin.ts:82:23
    at fn (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:477:10)
    at _next0 (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:39:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:52:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:13:1)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
caused by plugins in Compilation.hooks.processAssets
Error: Called Compilation.updateAsset for not existing filename plugin-entry.js
    at Compilation.updateAsset (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:4298:10)
    at /home/gnunn/Development/openshift/dev-console-plugin/node_modules/src/webpack/ConsoleAssetPlugin.ts:82:23
    at fn (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:477:10)
    at _next0 (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:39:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:52:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:13:1)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
error Command failed with exit code 2.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 2.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
We enabled balance similar node groups via https://issues.redhat.com/browse/OCPBUGS-15769
We should include a validation for this behaviour in our e2e autoscaler testing.
We can probably reuse what we do in the Machine API test: https://github.com/openshift/cluster-api-actuator-pkg/blob/77764237f2e6160d95990dc905b8e87662bc4d16/pkg/autoscaler/autoscaler.go#L437
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.