This project is mirrored from https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10.git.
- Apr 08, 2025
-
-
Julio Faracco authored
Signed-off-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/591 JIRA: https://issues.redhat.com/browse/RHEL-82297 Testing done: basic host tests, selftests, kvm-unit-tests, LM. This is part 1 of the rebase for kvm/arm64. Omitted due to dependency issues: 59419f10045b KVM: arm64: Eagerly switch ZCR_EL{1,2} f9dd00de1e53 KVM: arm64: Mark some header functions as inline 9b66195063c5 KVM: arm64: Refactor exit handlers ee14db31a9c8 KVM: arm64: Refactor CPTR trap deactivation 407a99c4654e KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN 459f059be702 KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN 8eca7f6d5100 KVM: arm64: Remove host FPSIMD saving for non-protected KVM 6f91d31d47c5 KVM: arm64: Drop pkvm_mem_transition for host/hyp donations 7cbf7c37718e KVM: arm64: Drop pkvm_mem_transition for host/hyp sharing 7a0688832f58 KVM: arm64: Drop pkvm_mem_transition for FF-A fd22af17a458 KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1 aac64ad36955 KVM: arm64: Use kvm_vcpu_has_feature() directly for struct kvm 41d6028e28bd KVM: arm64: Convert the SVE guest vcpu flag to a vm flag e5ecedcd7cc2 arm64/sysreg: Get rid of CPACR_ELx SysregFields 7052e808c446 arm64/sysreg: Get rid of the TCR2_EL1x SysregFields a5c870d0939b KVM: arm64: Drop useless struct s2_mmu in __kvm_at_s1e2() 570d666c11af KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspace 85c7869e30b7 KVM: arm64: Use __kvm_faultin_pfn() to handle memory aborts 28991c91d577 KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock cccefb0a0d3b KVM: Drop unused "hva" pointer from __gfn_to_pfn_memslot() e2d2ca71ac03 KVM: Drop @atomic param from gfn=>pfn and hva=>pfn APIs Omitted-fix: 31208bad3937 arm64/fpsimd: Remove unused declaration fpsimd_kvm_prepare() Signed-off-by:
Sebastian Ott <sebott@redhat.com> Approved-by:
Eric Auger <eric.auger@redhat.com> Approved-by:
Cornelia Huck <cohuck@redhat.com> Approved-by:
Gavin Shan <gshan@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Approved-by:
Shaoqin Huang <shahuang@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/651 JIRA: https://issues.redhat.com/browse/RHEL-76477 Enable CONFIG_VDPA_USER on RHEL-10 Upstream Status: RHEL only tested by me Signed-off-by:
Cindy Lu <lulu@redhat.com> Approved-by:
Eugenio Pérez <eperezma@redhat.com> Approved-by:
Jason Wang <jasowang@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
- Apr 02, 2025
-
-
Julio Faracco authored
Signed-off-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/638 ``` This series updates the kernel's driver core 'sysfs' subsystem with changes and preparations for more sysfs api cleanups that can come through all driver trees after this is out. With the above in place, also include the PCI tree's changes to Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM ACPI hotplug driver. JIRA: https://issues.redhat.com/browse/RHEL-85241 Signed-off-by:
Myron Stowe <mstowe@redhat.com> ``` Approved-by:
Rafael Aquini <raquini@redhat.com> Approved-by:
David Arcari <darcari@redhat.com> Approved-by:
Jocelyn Falempe <jfalempe@redhat.com> Approved-by:
John W. Linville <linville@redhat.com> Approved-by:
Mika Penttilä <mpenttil@redhat.com> Approved-by:
Corinna Vinschen <vinschen@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Approved-by:
Kamal Heib <kheib@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/411 JIRA: https://issues.redhat.com/browse/RHEL-24185 Update /proc/schedstat with fixes and improved information from upstream. AMD requested these and they don't carry a large risk. Signed-off-by:
Phil Auld <pauld@redhat.com> Approved-by:
Juri Lelli <juri.lelli@redhat.com> Approved-by:
Waiman Long <longman@redhat.com> Approved-by:
Rafael Aquini <raquini@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/634 JIRA: https://issues.redhat.com/browse/RHEL-82993 Upstream status: RHEL only We use special handling for BPF selftests variants. Add support for the cpuv4 variant. Signed-off-by:
Viktor Malik <vmalik@redhat.com> Approved-by:
Jan Stancek <jstancek@redhat.com> Approved-by:
Jerome Marchand <jmarchan@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/629 JIRA: https://issues.redhat.com/browse/RHEL-84541 Upstream Status: net.git commit 546d98393abc commit 546d98393abcf2f841e61163d95ed21fde346cc1 Author: Leon Romanovsky <leon@kernel.org> Date: Mon Feb 3 14:59:23 2025 +0200 bonding: delete always true device check XFRM API makes sure that xs->xso.dev is valid in all XFRM offload callbacks. There is no need to check it again. Signed-off-by:
Leon Romanovsky <leonro@nvidia.com> Acked-by:
Paolo Abeni <pabeni@redhat.com> Reviewed-by:
Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/0b2f8f5f09701bb43bbd83b94bfe5cb506b57adc.1738587150.git.leon@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Hangbin Liu <haliu@redhat.com> Approved-by:
Florian Westphal <fwestpha@redhat.com> Approved-by:
Xin Long <lxin@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/594 JIRA: https://issues.redhat.com/browse/RHEL-82698 CVE: CVE-2025-21837 Signed-off-by:
Jeff Moyer <jmoyer@redhat.com> Approved-by:
Ewan D. Milne <emilne@redhat.com> Approved-by:
Brian Foster <bfoster@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/495 JIRA: https://issues.redhat.com/browse/RHEL-81723 * 2afd96a4a0b1d62c7a44227e535b073926d73368 ALSA: hda/tas2781: Update tas2781 hda SPI driver * 325735e83d7d0016e7b61069df2570e910898466 ALSA: hda/tas2781: Fix index issue in tas2781 hda SPI driver Signed-off-by:
CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com> --- Created 2025-02-28 13:13 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue) Approved-by:
Jaroslav Kysela <jkysela@redhat.com> Approved-by:
Tony Camuso <tcamuso@redhat.com> Approved-by:
John W. Linville <linville@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/462 JIRA: https://issues.redhat.com/browse/RHEL-80968 This is a backport of the patch series "Add support for AArch64 AMUv1-based average freq" (https://lore.kernel.org/all/20250131162439.3843071-1-beata.michalska@arm.com/) and its dependencies. The first patch in this MR is upstream in v6.14 (since rc3). The rest are now in Linus's master branch and slated for 6.15. Signed-off-by:
Jennifer Berringer <jberring@redhat.com> Approved-by:
David Arcari <darcari@redhat.com> Approved-by:
Mark Langsdorf <mlangsdo@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/619 Revert "cxl/acpi: Fix load failures due to single window creation failure" This commit was pinpointed as the cause of CXL being unavailable on certain Samsung machines. JIRA: https://issues.redhat.com/browse/RHEL-82540 Resolves: RHEL-82540 Signed-off-by:
John W. Linville <linville@redhat.com> Approved-by:
Myron Stowe <mstowe@redhat.com> Approved-by:
Lenny Szubowicz <lszubowi@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/457 JIRA: https://issues.redhat.com/browse/RHEL-80216 A small series of livepatch fixes for rhel-10.1 Signed-off-by:
Ryan Sullivan <rysulliv@redhat.com> Approved-by:
Joe Lawrence <joe.lawrence@redhat.com> Approved-by:
Rado Vrbovsky <rvrbovsk@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/648 ``` PCI updates from upstream v6.14: "Enumeration: - Make pci_stop_dev() and pci_destroy_dev() safe so concurrent callers can't stop a device multiple times, even as we migrate from the global pci_rescan_remove_lock to finer-grained locking (Keith Busch) - Improve pci_walk_bus() implementation by making it recursive and moving locking up to avoid need for a 'locked' parameter (Keith Busch) - Unexport pci_walk_bus_locked(), which is only used internally by the PCI core (Keith Busch) - Detect some Thunderbolt chips that are built-in and hence 'trustworthy' by a heuristic since the 'ExternalFacingPort' and 'usb4-host-interface' ACPI properties are not quite enough (Esther Shimanovich) Resource management: - Use PCI bus addresses (not CPU addresses) in 'ranges' properties when building dynamic DT nodes so systems where PCI and CPU addresses differ work correctly (Andrea della Porta) - Tidy resource sizing and assignment with helpers to reduce redundancy (Ilpo Järvinen) - Improve pdev_sort_resources() 'bogus alignment' warning to be more specific (Ilpo Järvinen) Driver binding: - Convert driver .remove_new() callbacks to .remove() again to finish the conversion from returning 'int' to being 'void' (Sergio Paracuellos) - Export pcim_request_all_regions(), a managed interface to request all BARs (Philipp Stanner) - Replace pcim_iomap_regions_request_all() with pcim_request_all_regions(), and pcim_iomap_table()[n] with pcim_iomap(n), in the following drivers: ahci, crypto qat, crypto octeontx2, intel_th, iwlwifi, ntb idt, serial rp2, ALSA korg1212 (Philipp Stanner) - Remove the now unused pcim_iomap_regions_request_all() (Philipp Stanner) - Export pcim_iounmap_region(), a managed interface to unmap and release a PCI BAR (Philipp Stanner) - Replace pcim_iomap_regions(mask) with pcim_iomap_region(n), and pcim_iounmap_regions(mask) with pcim_iounmap_region(n), in the following 
drivers: fpga dfl-pci, block mtip32xx, gpio-merrifield, cavium (Philipp Stanner) Error handling: - Add sysfs 'reset_subordinate' to reset the entire hierarchy below a bridge; previously Secondary Bus Reset could only be used when there was a single device below a bridge (Keith Busch) - Warn if we reset a running device where the driver didn't register pci_error_handlers notification callbacks (Keith Busch) ASPM: - Disable ASPM L1 before touching L1 PM Substates to follow the spec closer and avoid a CPU load timeout on some platforms (Ajay Agarwal) - Set devices below Intel VMD to D0 before enabling ASPM L1 Substates as required per spec for all L1 Substates changes (Jian-Hong Pan) Power management: - Enable starfive controller runtime PM before probing host bridge (Mayank Rana) - Enable runtime power management for host bridges (Krishna chaitanya chundru) Power control: - Use of_platform_device_create() instead of of_platform_populate() to create pwrctl platform devices so we can control it based on the child nodes (Manivannan Sadhasivam) - Create pwrctrl platform devices only if there's a relevant power supply property (Manivannan Sadhasivam) - Add device link from the pwrctl supplier to the PCI dev to ensure pwrctl drivers are probed before the PCI dev driver; this avoids a race where pwrctl could change device power state while the PCI driver was active (Manivannan Sadhasivam) - Find pwrctl device for removal with of_find_device_by_node() instead of searching all children of the parent (Manivannan Sadhasivam) - Rename 'pwrctl' to 'pwrctrl' to match new bandwidth controller ('bwctrl') and hotplug files (Bjorn Helgaas) Bandwidth control: - Add read/modify/write locking for Link Control 2, which is used to manage Link speed (Ilpo Järvinen) - Extract Link Bandwidth Management Status check into pcie_lbms_seen(), where it can be shared between the bandwidth controller and quirks that use it to help retrain failed links (Ilpo Järvinen) - Re-add Link Bandwidth 
notification support with updates to address the reasons it was previously reverted (Alexandru Gagniuc, Ilpo Järvinen) - Add pcie_set_target_speed() and related functionality so drivers can manage PCIe Link speed based on thermal or other constraints (Ilpo Järvinen) - Add a thermal cooling driver to throttle PCIe Links via the existing thermal management framework (Ilpo Järvinen) - Add a userspace selftest for the PCIe bandwidth controller (Ilpo Järvinen) PCI device hotplug: - Add hotplug controller driver for Marvell OCTEON multi-function device where function 0 has a management console interface to enable/disable and provision various personalities for the other functions (Shijith Thotton) - Retain a reference to the pci_bus for the lifetime of a pci_slot to avoid a use-after-free when the thunderbolt driver resets USB4 host routers on boot, causing hotplug remove/add of downstream docks or other devices (Lukas Wunner) - Remove unused cpcihp struct cpci_hp_controller_ops.hardware_test (Guilherme Giacomo Simoes) - Remove unused cpqphp struct ctrl_dbg.ctrl (Christophe JAILLET) - Use pci_bus_read_dev_vendor_id() instead of hand-coded presence detection in cpqphp (Ilpo Järvinen) - Simplify cpqphp enumeration, which is already simple-minded and doesn't handle devices below hot-added bridges (Ilpo Järvinen) Virtualization: - Add ACS quirk for Wangxun FF5xxx NICs, which don't advertise an ACS capability but do isolate functions as though PCI_ACS_RR and PCI_ACS_CR were set, so the functions can be in independent IOMMU groups (Mengyuan Lou) TLP Processing Hints (TPH): - Add and document TLP Processing Hints (TPH) support so drivers can enable and disable TPH and the kernel can save/restore TPH configuration (Wei Huang) - Add TPH Steering Tag support so drivers can retrieve Steering Tag values associated with specific CPUs via an ACPI _DSM to improve performance by directing DMA writes closer to their consumers (Wei Huang) Data Object Exchange (DOE): - Wait up to 1 second 
for DOE Busy bit to clear before writing a request to the mailbox to avoid failures if the mailbox is still busy from a previous transfer (Gregory Price) Endpoint framework: - Skip attempts to allocate from endpoint controller memory window if the requested size is larger than the window (Damien Le Moal) - Add and document pci_epc_mem_map() and pci_epc_mem_unmap() to handle controller-specific size and alignment constraints, and add test cases to the endpoint test driver (Damien Le Moal) - Implement dwc pci_epc_ops.align_addr() so pci_epc_mem_map() can observe DWC-specific alignment requirements (Damien Le Moal) - Synchronously cancel command handler work in endpoint test before cleaning up DMA and BARs (Damien Le Moal) - Respect endpoint page size in dw_pcie_ep_align_addr() (Niklas Cassel) - Use dw_pcie_ep_align_addr() in dw_pcie_ep_raise_msi_irq() and dw_pcie_ep_raise_msix_irq() instead of open coding the equivalent (Niklas Cassel) - Avoid NULL dereference if Modem Host Interface Endpoint lacks 'mmio' DT property (Zhongqiu Han) - Release PCI domain ID of Endpoint controller parent (not controller itself) and before unregistering the controller, to avoid use-after-free (Zijun Hu) - Clear secondary (not primary) EPC in pci_epc_remove_epf() when removing the secondary controller associated with an NTB (Zijun Hu) Cadence PCIe controller driver: - Lower severity of 'phy-names' message (Bartosz Wawrzyniak) Freescale i.MX6 PCIe controller driver: - Fix suspend/resume support on i.MX6QDL, which has a hardware erratum that prevents use of L2 (Stefan Eichenberger) Intel VMD host bridge driver: - Add 0xb60b and 0xb06f Device IDs for client SKUs (Nirmal Patel) MediaTek PCIe Gen3 controller driver: - Update mediatek-gen3 DT binding to require the exact number of clocks for each SoC (Fei Shao) - Add support for DT 'max-link-speed' and 'num-lanes' properties to restrict the link speed and width (AngeloGioacchino Del Regno) Microchip PolarFire PCIe controller driver: - Add DT
and driver support for using either of the two PolarFire Root Ports (Conor Dooley) NVIDIA Tegra194 PCIe controller driver: - Move endpoint controller cleanups that depend on refclk from the host to the notifier that tells us the host has deasserted PERST#, when refclk should be valid (Manivannan Sadhasivam) Qualcomm PCIe controller driver: - Add qcom SAR2130P DT binding with an additional clock (Dmitry Baryshkov) - Enable MSI interrupts if 'global' IRQ is supported, since a previous commit unintentionally masked them (Manivannan Sadhasivam) - Move endpoint controller cleanups that depend on refclk from the host to the notifier that tells us the host has deasserted PERST#, when refclk should be valid (Manivannan Sadhasivam) - Add DT binding and driver support for IPQ9574, with Synopsys IP v5.80a and Qcom IP 1.27.0 (devi priya) - Move the OPP "operating-points-v2" table from the qcom,pcie-sm8450.yaml DT binding to qcom,pcie-common.yaml, where it can be used by other Qcom platforms (Qiang Yu) - Add 'global' SPI interrupt for events like link-up, link-down to qcom,pcie-x1e80100 DT binding so we can start enumeration when the link comes up (Qiang Yu) - Disable ASPM L0s for qcom,pcie-x1e80100 since the PHY is not tuned to support this (Qiang Yu) - Add ops_1_21_0 for SC8280X family SoC, which doesn't use the 'iommu-map' DT property and doesn't need BDF-to-SID translation (Qiang Yu) Rockchip PCIe controller driver: - Define ROCKCHIP_PCIE_AT_SIZE_ALIGN to replace magic 256 endpoint .align value (Damien Le Moal) - When unmapping an endpoint window, compute the region index instead of searching for it, and verify that the address was mapped (Damien Le Moal) - When mapping an endpoint window, verify that the address hasn't been mapped already (Damien Le Moal) - Implement pci_epc_ops.align_addr() for rockchip-ep (Damien Le Moal) - Fix MSI IRQ data mapping to observe the alignment constraint, which fixes intermittent page faults in memcpy_toio() and memcpy_fromio() (Damien Le 
Moal) - Rename rockchip_pcie_parse_ep_dt() to rockchip_pcie_ep_get_resources() for consistency with similar DT interfaces (Damien Le Moal) - Skip the unnecessary link train in rockchip_pcie_ep_probe() and do it only in the endpoint start operation (Damien Le Moal) - Implement pci_epc_ops.stop_link() to disable link training and controller configuration (Damien Le Moal) - Attempt link training at 5 GT/s when both partners support it (Damien Le Moal) - Add a handler for PERST# signal so we can detect host-initiated resets and start link training after PERST# is deasserted (Damien Le Moal) Synopsys DesignWare PCIe controller driver: - Clear outbound address on unmap so dw_pcie_find_index() won't match an ATU index that was already unmapped (Damien Le Moal) - Use of_property_present() instead of of_property_read_bool() when testing for presence of non-boolean DT properties (Rob Herring) - Advertise 1MB size if endpoint supports Resizable BARs, which was inadvertently lost in v6.11 (Niklas Cassel) TI J721E PCIe driver: - Add PCIe support for J722S SoC (Siddharth Vadapalli) - Delay PCIE_T_PVPERL_MS (100 ms), not just PCIE_T_PERST_CLK_US (100 us), before deasserting PERST# to ensure power and refclk are stable (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Set the 'ti,keystone-pcie' mode so v3.65a devices work in Root Complex mode (Kishon Vijay Abraham I) - Try to avoid unrecoverable SError for attempts to issue config transactions when the link is down; this is racy but the best we can do (Kishon Vijay Abraham I) Miscellaneous: - Reorganize kerneldoc parameter names to match order in function signature (Julia Lawall) - Fix sysfs reset_method_store() memory leak (Todd Kjos) - Simplify pci_create_slot() (Ilpo Järvinen) - Fix incorrect printf format specifiers in pcitest (Luo Yifan)" Related post v6.14 (v6.15) Fixes 4c8c0ffd41d1 PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by 2a93192d2058 misc: pci_endpoint_test: Fix pci_endpoint_test_bars_read_bar 
86c2345aff3f tools/Makefile: Remove pci target Merge tag 'pci-v6.14-fixes-3' of git://git.kernel.org/../git/pci/pci https://lkml.org/lkml/2025/2/14/1616 commit 78a632a2086c5d5468b0e088a97b26e47c569567 Merge: 3f2ca7b8b33d 81f64e925c29 2 files changed, 5 insertions(+), 3 deletions(-) Merge tag 'pci-v6.14-fixes-2' of git://git.kernel.org/../git/pci/pci https://lkml.org/lkml/2025/2/6/1764 commit bb066fe812d6fb3a9d01c073d9f1e2fd5a63403b Merge: 5b734b49de8e 6f64b83d9fe9 2 files changed, 1 insertion(+), 4 deletions(-) Merge tag 'pci-v6.14-fixes-1' of git://git.kernel.org/pub/../git/pci/pci https://lkml.org/lkml/2025/1/31/779 commit 0c0746f9dcd6f42e742d2813f9044e12f1497f8a Merge: 1b5f3c51fbb8 d555ed45a5a1 1 file changed, 19 insertions(+), 15 deletions(-) Merge tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/../pci/pci https://lkml.org/lkml/2025/01/24/940 commit 647d69605c70368d54fc012fce8a43e8e5955b04 Merge: 184a0997fb77 10ff5bbfd4b0 89 files changed, 2248 insertions(+), 1670 deletions(-) JIRA: https://issues.redhat.com/browse/RHEL-83611 Signed-off-by:
Myron Stowe <mstowe@redhat.com> ``` Approved-by:
John W. Linville <linville@redhat.com> Approved-by:
David Arcari <darcari@redhat.com> Approved-by:
Eric Chanudet <echanude@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/647 Description: Update drivers/platform/x86/pmc, including support for Arrow Lake U/H and Panther Lake. JIRA: https://issues.redhat.com/browse/RHEL-47465 Build Info: 67156193 Tested: Successful platform test results on Intel (intel-pantherlake-h-02) system. Signed-off-by:
Steve Best <sbest@redhat.com> Approved-by:
Tony Camuso <tcamuso@redhat.com> Approved-by:
David Arcari <darcari@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/421 JIRA: https://issues.redhat.com/browse/RHEL-80508 CVE: CVE-2024-53216 Comment: CVE identifies a patch which is reverted in the same release in favor of a different fix. Four patches included in this MR fix the problem that the CVE in question described. Signed-off-by:
Olga Kornievskaia <okorniev@redhat.com> Approved-by:
Benjamin Coddington <bcodding@redhat.com> Approved-by:
Scott Mayhew <smayhew@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
-
Julio Faracco authored
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/578 JIRA: https://issues.redhat.com/browse/RHEL-83372 CVE: CVE-2025-21857 ``` commit 071ed42cff4fcdd89025d966d48eabef59913bf2 Author: Pierre Riteau <pierre@stackhpc.com> Date: Thu Feb 13 23:36:10 2025 +0100 net/sched: cls_api: fix error handling causing NULL dereference tcf_exts_miss_cookie_base_alloc() calls xa_alloc_cyclic() which can return 1 if the allocation succeeded after wrapping. This was treated as an error, with value 1 returned to caller tcf_exts_init_ex() which sets exts->actions to NULL and returns 1 to caller fl_change(). fl_change() treats err == 1 as success, calling tcf_exts_validate_ex() which calls tcf_action_init() with exts->actions as argument, where it is dereferenced. Example trace: BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 114 PID: 16151 Comm: handler114 Kdump: loaded Not tainted 5.14.0-503.16.1.el9_5.x86_64 #1 RIP: 0010:tcf_action_init+0x1f8/0x2c0 Call Trace: tcf_action_init+0x1f8/0x2c0 tcf_exts_validate_ex+0x175/0x190 fl_change+0x537/0x1120 [cls_flower] Fixes: 80cd22c3 ("net/sched: cls_api: Support hardware miss to tc action") Signed-off-by:
Pierre Riteau <pierre@stackhpc.com> Reviewed-by:
Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250213223610.320278-1-pierre@stackhpc.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> ``` Signed-off-by:
CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com> --- Created 2025-03-13 07:56 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue) Approved-by:
Xin Long <lxin@redhat.com> Approved-by:
Antoine Tenart <atenart@redhat.com> Approved-by:
CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by:
Julio Faracco <jfaracco@redhat.com>
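The cls_api failure described above comes down to a three-way return convention (0 for success, 1 for success after wrapping, negative for failure) being tested as if it were two-way. The following is a minimal Python model of that pattern, not the kernel code: `alloc_cyclic_model`, `cookie_alloc_buggy`, and `cookie_alloc_fixed` are hypothetical names, and the "fixed" variant is only one plausible way to normalise the return value.

```python
ENOMEM = 12  # errno value used for the simulated failure (assumption for the model)


def alloc_cyclic_model(outcome):
    """Hypothetical stand-in for the return convention in the commit message:
    0 = allocated, 1 = allocated after wrapping, negative = failure."""
    return {"ok": 0, "wrapped": 1, "failed": -ENOMEM}[outcome]


def cookie_alloc_buggy(outcome):
    err = alloc_cyclic_model(outcome)
    if err:         # bug: a wrapped but successful allocation is treated as an error
        return err  # the positive value 1 leaks to the caller
    return 0


def cookie_alloc_fixed(outcome):
    err = alloc_cyclic_model(outcome)
    if err < 0:     # only negative values are real errors
        return err
    return 0        # both success cases are normalised to 0
```

In the buggy variant a caller that only checks for negative values sees 1 and carries on, even though the error path has already run; that mismatch is what the commit message describes between tcf_exts_init_ex() and fl_change().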
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 There are several problems with the way hyp code lazily saves the host's FPSIMD/SVE state, including: * Host SVE being discarded unexpectedly due to inconsistent configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to result in QEMU crashes where SVE is used by memmove(), as reported by Eric Auger: https://issues.redhat.com/browse/RHEL-68997 * Host SVE state is discarded *after* modification by ptrace, which was an unintentional ptrace ABI change introduced with lazy discarding of SVE state. * The host FPMR value can be discarded when running a non-protected VM, where FPMR support is not exposed to a VM, and that VM uses FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR before unbinding the host's FPSIMD/SVE/SME state, leaving a stale value in memory. Avoid these by eagerly saving and "flushing" the host's FPSIMD/SVE/SME state when loading a vCPU such that KVM does not need to save any of the host's FPSIMD/SVE/SME state. For clarity, fpsimd_kvm_prepare() is removed and the necessary call to fpsimd_save_and_flush_cpu_state() is placed in kvm_arch_vcpu_load_fp(). As 'fpsimd_state' and 'fpmr_ptr' should not be used, they are set to NULL; all uses of these will be removed in subsequent patches. Historical problems go back at least as far as v5.17, e.g. erroneous assumptions about TIF_SVE being clear in commit: 8383741a ("KVM: arm64: Get rid of host SVE tracking/saving") ... and so this eager save+flush probably needs to be backported to ALL stable trees. Fixes: 93ae6b01 ("KVM: arm64: Discard any SVE state when entering KVM guests") Fixes: 8c845e27 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Fixes: ef3be860 ("KVM: arm64: Add save/restore support for FPMR") Reported-by:
Eric Auger <eauger@redhat.com> Reported-by:
Wilco Dijkstra <wilco.dijkstra@arm.com> Reviewed-by:
Mark Brown <broonie@kernel.org> Tested-by:
Mark Brown <broonie@kernel.org> Tested-by:
Eric Auger <eric.auger@redhat.com> Acked-by:
Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Fuad Tabba <tabba@google.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-2-mark.rutland@arm.com Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit fbc7e61195e23f744814e78524b73b59faa54ab4) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 While we have sanitisation in place for the guest sysregs, we lack that sanitisation out of reset. So some of the fields could be evaluated and not reflect their RESx status, which sounds like a very bad idea. Apply the RESx masks to the sysreg file in two situations: - when going via a reset of the sysregs - after having computed the RESx masks Having this separate reset phase from the actual reset handling is a bit grotty, but we need to apply this after the ID registers are final. Tested-by:
Joey Gouly <joey.gouly@arm.com> Reviewed-by:
Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250112165029.1181056-3-maz@kernel.org Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 36f998de853cfad60508dfdfb41c9c40a2245f19) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
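To make the RESx sanitisation in the commit above concrete: RES0 bits must read as zero and RES1 bits must read as one, whatever value is stored. A minimal Python sketch of applying such masks to a 64-bit register value follows; `apply_resx` and the mask values are hypothetical and chosen only for illustration, not taken from the kernel.

```python
MASK64 = (1 << 64) - 1  # system registers are modelled here as 64-bit values


def apply_resx(value, res0, res1):
    """Force RES0 bits to 0 and RES1 bits to 1 in a 64-bit register value."""
    value &= ~res0 & MASK64  # clear every bit that is reserved-as-zero
    value |= res1            # set every bit that is reserved-as-one
    return value & MASK64
```

Applying the masks is idempotent, which is why it is safe to do it both after a reset of the sysregs and again after the masks have been computed.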
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 A lot of the NV code depends on HCR_EL2.{E2H,TGE}, and we assume in places that at least HCR_EL2.E2H is invariant for a given guest. However, we make a point in *not* using the sanitising accessor that would enforce this, and are at the mercy of the guest doing stupid things. Clearly, that's not good. Rework the HCR_EL2 accessors to use __vcpu_sys_reg() instead, guaranteeing that the RESx settings get applied, especially when HCR_EL2.E2H is evaluated. This results in fewer accessors overall. Huge thanks to Joey who spent a long time tracking this bug down. Reported-by:
Joey Gouly <Joey.Gouly@arm.com> Tested-by:
Joey Gouly <joey.gouly@arm.com> Reviewed-by:
Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250112165029.1181056-2-maz@kernel.org Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit c139b6d1b4d27724987af5071177fb5f3d60c1e4) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 When allocating guest stage-2 page-table pages at EL2, pKVM can consume pages from the host-provided kvm_hyp_memcache. As pgtable.c expects zeroed pages, guest_s2_zalloc_page() actively implements this zeroing with a PAGE_SIZE memset. Unfortunately, we don't check the page alignment of the host-provided address before doing so, which could lead to the memset overrunning the page if the host was malicious. Fix this by simply force-aligning all kvm_hyp_memcache allocations to page boundaries. Fixes: 60dfe093 ("KVM: arm64: Instantiate guest stage-2 page-tables at EL2") Reported-by:
Ben Simner <ben.simner@cl.cam.ac.uk> Signed-off-by:
Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250213153615.3642515-1-qperret@google.com Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit b938731ed2d4eea8e268a27bfc600581fedae2a9) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
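The force-alignment described in the commit above can be illustrated with a minimal, standalone sketch. This is not the kernel code: the 4 KiB page size and every sketch_* name are assumptions made for illustration only. The point is that once an address is masked down to a page boundary, a PAGE_SIZE memset starting there cannot run past the end of that page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages (a power of two). */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical helper: round an address down to the start of its page. */
uintptr_t sketch_page_align_down(uintptr_t addr)
{
	return addr & SKETCH_PAGE_MASK;
}

/* Hypothetical helper: true if the address sits exactly on a page boundary. */
bool sketch_is_page_aligned(uintptr_t addr)
{
	return (addr & ~SKETCH_PAGE_MASK) == 0;
}
```

Aligning unconditionally, rather than rejecting misaligned addresses, means the consumer never has to trust what the provider handed over.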
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 To pick up the changes in this cset: 97413cea1c48cc05 ("KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation") This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h Please see tools/include/uapi/README for further details. Reviewed-by:
James Clark <james.clark@linaro.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: kvmarm@lists.linux.dev Link: https://lore.kernel.org/r/20241203035349.1901262-6-namhyung@kernel.org Signed-off-by:
Namhyung Kim <namhyung@kernel.org> (cherry picked from commit 76e231997926c8f5707ed5f9c1f91377efe9e484) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 Ctx diff - missing ff5181d8a2a8 "arm64/gcs: Provide basic EL2 setup to allow GCS usage at EL0 and EL1": arch/arm64/include/asm/el2_setup.h When KVM is in protected mode, host calls to PSCI are proxied via EL2, and cold entries from CPU_ON, CPU_SUSPEND, and SYSTEM_SUSPEND bounce through __kvm_hyp_init_cpu() at EL2 before entering the host kernel's entry point at EL1. While __kvm_hyp_init_cpu() initializes SPSR_EL2 for the exception return to EL1, it does not initialize SCTLR_EL1. Due to this, it's possible to enter EL1 with SCTLR_EL1 in an UNKNOWN state. In practice this has been seen to result in kernel crashes after CPU_ON as a result of SCTLR_EL1.M being 1 in violation of the initial core configuration specified by PSCI. Fix this by initializing SCTLR_EL1 for cold entry to the host kernel. As it's necessary to write to SCTLR_EL12 in VHE mode, this initialization is moved into __kvm_host_psci_cpu_entry() where we can use write_sysreg_el1(). The remnants of the '__init_el2_nvhe_prepare_eret' macro are folded into its only caller, as this is clearer than having the macro. Fixes: cdf36719 ("KVM: arm64: Intercept host's CPU_ON SMCs") Reported-by:
Leo Yan <leo.yan@arm.com> Signed-off-by:
Ahmed Genidi <ahmed.genidi@arm.com> [ Mark: clarify commit message, handle E2H, move to C, remove macro ] Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Cc: Ahmed Genidi <ahmed.genidi@arm.com> Cc: Ben Horgan <ben.horgan@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by:
Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250227180526.1204723-3-mark.rutland@arm.com Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 3855a7b91d42ebf3513b7ccffc44807274978b3d) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 On CPUs without FEAT_E2H0, HCR_EL2.E2H is RES1, but may reset to an UNKNOWN value out of reset and consequently may not read as 1 unless it has been explicitly initialized. We handled this for the head.S boot code in commits: 3944382f ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative") b3320142 ("arm64: Fix early handling of FEAT_E2H0 not being implemented") Unfortunately, we forgot to apply a similar fix to the KVM PSCI entry points used when relaying CPU_ON, CPU_SUSPEND, and SYSTEM_SUSPEND. When KVM is entered via these entry points, the value of HCR_EL2.E2H may be consumed before it has been initialized (e.g. by the 'init_el2_state' macro). Initialize HCR_EL2.E2H early in these paths such that it can be consumed reliably. The existing code in head.S is factored out into a new 'init_el2_hcr' macro, and this is used in the __kvm_hyp_init_cpu() function common to all the relevant PSCI entry points. For clarity, I've tweaked the assembly used to check whether ID_AA64MMFR4_EL1.E2H0 is negative. The bitfield is extracted as a signed value, and this is checked with a signed-greater-or-equal (GE) comparison. As the hyp code will reconfigure HCR_EL2 later in ___kvm_hyp_init(), all bits other than E2H are initialized to zero in __kvm_hyp_init_cpu(). Fixes: 3944382f ("arm64: Treat HCR_EL2.E2H as RES1 when ID_AA64MMFR4_EL1.E2H0 is negative") Fixes: b3320142 ("arm64: Fix early handling of FEAT_E2H0 not being implemented") Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Cc: Ahmed Genidi <ahmed.genidi@arm.com> Cc: Ben Horgan <ben.horgan@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250227180526.1204723-2-mark.rutland@arm.com [maz: fixed LT->GE thinko] Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 7a68b55ff39b0a1638acb1694c185d49f6077a0d) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
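The signed-field check described in the commit above can be sketched in portable C. This is an illustration only, not the 'init_el2_hcr' assembly: the field position (bits [27:24]), the HCR_EL2.E2H bit number (34) and all sketch_* names are assumptions of this sketch. It extracts a 4-bit field as a signed value and sets E2H only when that value is negative, leaving every other bit zero.

```c
#include <stdint.h>

/* Assumptions for this sketch: E2H0 occupies ID_AA64MMFR4_EL1[27:24],
 * and HCR_EL2.E2H is bit 34. */
#define SKETCH_E2H0_SHIFT 24u
#define SKETCH_E2H0_WIDTH 4u
#define SKETCH_HCR_E2H    (1ull << 34)

/* Extract a bitfield and sign-extend it without relying on
 * implementation-defined right shifts of negative values. */
int sketch_extract_signed_field(uint64_t reg, unsigned shift, unsigned width)
{
	uint64_t field = (reg >> shift) & ((1ull << width) - 1);
	uint64_t sign = 1ull << (width - 1);

	return (int)((int64_t)(field ^ sign) - (int64_t)sign);
}

/* Build an initial HCR_EL2 value: only E2H may be set, all else is zero. */
uint64_t sketch_init_hcr(uint64_t mmfr4)
{
	int e2h0 = sketch_extract_signed_field(mmfr4, SKETCH_E2H0_SHIFT,
					       SKETCH_E2H0_WIDTH);

	/* Signed greater-or-equal: E2H is not RES1, so leave it clear. */
	if (e2h0 >= 0)
		return 0;

	/* Negative field: E2H is RES1, so write it as 1 explicitly. */
	return SKETCH_HCR_E2H;
}
```

For example, a field value of 0xF sign-extends to -1, so the sketch returns a value with only the assumed E2H bit set.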
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 Vladimir reports that a race condition to attach a VMID to a stage-2 MMU sometimes results in a vCPU entering the guest with a VMID of 0:
| CPU1                                            | CPU2
|                                                 |
|                                                 | kvm_arch_vcpu_ioctl_run
|                                                 |   vcpu_load           <= load VTTBR_EL2
|                                                 |                          kvm_vmid->id = 0
|                                                 |
| kvm_arch_vcpu_ioctl_run                         |
|   vcpu_load           <= load VTTBR_EL2         |
|                          with kvm_vmid->id = 0  |
|   kvm_arm_vmid_update <= allocates fresh        |
|                          kvm_vmid->id and       |
|                          reload VTTBR_EL2       |
|                                                 |
|                                                 |   kvm_arm_vmid_update <= observes that kvm_vmid->id
|                                                 |                          already allocated,
|                                                 |                          skips reload VTTBR_EL2
Oh yeah, it's as bad as it looks. Remember that VHE loads the stage-2 MMU eagerly but a VMID only gets attached to the MMU later on in the KVM_RUN loop. Even in the "best case" where VTTBR_EL2 correctly gets reprogrammed before entering the EL1&0 regime, there is a period of time where hardware is configured with VMID 0. That's completely insane. So, rather than decorating the 'late' binding with another hack, just allocate the damn thing up front. Attaching a VMID from vcpu_load() is still rollover safe since (surprise!) it'll always get called after a vCPU was preempted. Excuse me while I go find a brown paper bag. Cc: stable@vger.kernel.org Fixes: 934bf871 ("KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()") Reported-by:
Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250219220737.130842-1-oliver.upton@linux.dev Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit fa808ed4e199ed17d878eb75b110bda30dd52434) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 When not running in VHE mode, cpu_prepare_hyp_mode() computes the value of TCR_EL2 using the host's TCR_EL1 settings as a starting point. For nVHE, this amounts to masking out everything apart from the TG0, SH0, ORGN0, IRGN0 and T0SZ fields before setting the RES1 bits, shifting the IPS field down to the PS field and setting DS if LPA2 is enabled. Unfortunately, for hVHE, things go slightly wonky: EPD1 is correctly set to disable walks via TTBR1_EL2 but then the T1SZ and IPS fields are corrupted when we mistakenly attempt to initialise the PS and DS fields in their E2H=0 positions. Furthermore, many fields are retained from TCR_EL1 which should not be propagated to TCR_EL2. Notably, this means we can end up with A1 set despite not initialising TTBR1_EL2 at all. This has been shown to cause unexpected translation faults at EL2 with pKVM due to TLB invalidation not taking effect when running with a non-zero ASID. Fix the TCR_EL2 initialisation code to set PS and DS only when E2H=0, masking out HD, HA and A1 when E2H=1. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Fixes: ad744e8c ("arm64: Allow arm64_sw.hvhe on command line") Signed-off-by:
Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250214133724.13179-1-will@kernel.org Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 102c51c50db88aedd00a318b7708ad60dbec2e95) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 If userspace creates vcpus, then a vgic, we end up in a situation where irqchip_in_kernel() will return true, but no private interrupt has been allocated for these vcpus. This situation will continue until userspace initialises the vgic, at which point we fix the early vcpus. Should a vcpu run or be initialised in the interval, bad things may happen. An obvious solution is to move this fix-up phase to the point where the vgic is created. This ensures that from that point onwards, all vcpus have their private interrupts, as new vcpus will directly allocate them. With that, we have the invariant that when irqchip_in_kernel() is true, all vcpus have their private interrupts. Reported-by:
Alexander Potapenko <glider@google.com> Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250212182558.2865232-3-maz@kernel.org Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit b3aa9283c0c505b5cfd25f7d6cfd720de2adc807) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 We currently spit out a warning if making a timer interrupt pending fails. But not only is this loud and easy to trigger from userspace, we also fail to do anything useful with that information. Dropping the warning is the easiest thing to do for now. We can always add error reporting if we really want in the future. Reported-by:
Alexander Potapenko <glider@google.com> Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250212182558.2865232-2-maz@kernel.org Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit e6e3e0022ef8f1d584ee4d5b89dca02472c5eb1f) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 Now that EL2 has gained some early timer emulation, it accesses the offsets pointed to by the timer structure, both of which live in the KVM structure. Of course, these are *kernel* pointers, so the dereferencing of these pointers in non-kernel code must itself be offset. Give switch.h its own version of timer_get_offset() and use that instead. Fixes: b86fc215dc26d ("KVM: arm64: Handle counter access early in non-HYP context") Reported-by:
Linux Kernel Functional Testing <lkft@linaro.org> Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Tested-by:
Anders Roxell <anders.roxell@linaro.org> Link: https://lore.kernel.org/r/20250212173454.2864462-1-maz@kernel.org Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 65729da9ce37f5a2c62e2542ef03bc9ac6775a7d) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 At the end of kvm_arch_vcpu_load_fp() we check that no bits are set in SVCR. We only check this for protected mode despite this mattering equally for non-protected mode, and the comment above this is confusing. Remove the comment and simplify the check, moving from WARN_ON() to WARN_ON_ONCE() to avoid spamming the log. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 332b7e6d62b7a3a988017f5184e547aa20e3a19a) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 Don't use an uninitialised stack variable, and just return 0 on the non-error path. Reported-by:
kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502100911.8c9DbtKD-lkp@intel.com/ Reviewed-by:
Quentin Perret <qperret@google.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 8dbccafce3c8ae026606f5c7bc6637667d9d5595) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 When the handling of a guest stage-2 permission fault races with an MMU notifier, the faulting page might be gone from the guest's stage-2 by the point we attempt to call (p)kvm_pgtable_stage2_relax_perms(). In the normal KVM case, this leads to returning -EAGAIN which user_mem_abort() handles correctly by simply re-entering the guest. However, the pKVM hypercall implementation has additional logic to check the page state using __check_host_shared_guest() which gets confused by the absence of a page mapped at the requested IPA and returns -ENOENT, hence breaking user_mem_abort() and hilarity ensues. Luckily, several of the hypercalls for managing the stage-2 page-table of NP guests have no effect on the pKVM ownership tracking (wrprotect, test_clear_young, mkyoung, and crucially relax_perms), so the extra state checking logic is in fact not strictly necessary. So, to fix the discrepancy between standard KVM and pKVM, let's just drop the superfluous __check_host_shared_guest() logic from those hypercalls and make the extra state checking a debug assertion dependent on CONFIG_NVHE_EL2_DEBUG as we already do for other transitions. Signed-off-by:
Quentin Perret <qperret@google.com> Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250207145438.1333475-3-qperret@google.com Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit eabc7aaef7a553b64bf6e631ce04526af6c8d104) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 The check_host_shared_guest() path expects to find a last-level valid PTE in the guest's stage-2 page-table. However, it checks the PTE's level before its validity, which makes it hard for callers to figure out what went wrong. To make error handling simpler, check the PTE's validity first. Signed-off-by:
Quentin Perret <qperret@google.com> Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250207145438.1333475-2-qperret@google.com Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit c53fbdb60fb61fd6bda2bc0dc89837966625c5dc) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
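The reordering described in the commit above can be shown with a small standalone sketch. The struct, the last-level constant and the particular error codes are assumptions chosen for illustration, not the pKVM implementation. Checking validity first means a missing mapping always produces the same error, whatever level the walk stopped at.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumption for this sketch: level 3 is the last page-table level. */
#define SKETCH_LAST_LEVEL 3

/* Hypothetical, simplified view of a walked page-table entry. */
struct sketch_pte {
	bool valid;
	int level;
};

/* Return 0 only for a valid last-level entry. */
int sketch_check_leaf(const struct sketch_pte *pte)
{
	/* Validity first: "nothing mapped" is reported unambiguously. */
	if (!pte->valid)
		return -ENOENT;

	/* Only then complain about a mapping at the wrong level. */
	if (pte->level != SKETCH_LAST_LEVEL)
		return -E2BIG;

	return 0;
}
```

With the checks in the opposite order, an invalid entry found above the last level would be reported as a level problem, which is the ambiguity the commit message describes.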
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 The way we deal with the EL2 virtual timer is a bit odd. We try to cope with E2H being flipped, and adjust which offset applies to that timer depending on the current E2H value. But that's a complexity we shouldn't have to worry about. What we have to deal with is either E2H being RES1, in which case there is no offset, or E2H being RES0, and the virtual timer simply does not exist. Drop the adjusting of the timer offset, which makes things a bit simpler. At the same time, make sure that accessing the HV timer when E2H is RES0 results in an UNDEF in the guest. Suggested-by:
Oliver Upton <oliver.upton@linux.dev> Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250204110050.150560-4-maz@kernel.org Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 0e459810285503fb354537e84049e212c5917c33) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 Both Wei-Lin Chang and Volodymyr Babchuk report that the way we handle the emulation of EL1 timers with NV is completely wrong, especially in the case of HCR_EL2.E2H==0. There are three problems in about as many lines of code:
- With E2H==0, the EL1 timers are overwritten with the EL1 state, while they should actually contain the EL2 state (as per the timer map)
- With E2H==1, we run the full EL1 timer emulation even when ECV is present, hiding a bug in timer_emulate() (see previous patch)
- The comments are actively misleading, and say all the wrong things.
This is only attributable to the code having been initially written for FEAT_NV, hacked up to handle FEAT_NV2 *in parallel*, and vaguely hacked again to be FEAT_NV2 only. Oh, and yours truly being a gold plated idiot. The fix is obvious: just delete most of the E2H==0 code, have a unified handling of the timers (because they really are E2H agnostic), and make sure we don't execute any of that when FEAT_ECV is present. Fixes: 4bad3068cfa9f ("KVM: arm64: nv: Sync nested timer state with FEAT_NV2") Reported-by:
Wei-Lin Chang <r09922117@csie.ntu.edu.tw> Reported-by:
Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> Link: https://lore.kernel.org/r/fqiqfjzwpgbzdtouu2pwqlu7llhnf5lmy4hzv5vo6ph4v3vyls@jdcfy3fjjc5k Link: https://lore.kernel.org/r/87frl51tse.fsf@epam.com Tested-by:
Dmytro Terletskyi <dmytro_terletskyi@epam.com> Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250204110050.150560-3-maz@kernel.org Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 1b8705ad5365b5333240b46d5cd24e88ef2ddb14) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 When updating the interrupt state for an emulated timer, we return early and skip the setup of a soft timer that runs in parallel with the guest. While this is OK if we have set the interrupt pending, it is pretty wrong if the guest moved CVAL into the future. In that case, no timer is armed and the guest can wait for a very long time (it will take a full put/load cycle for the situation to resolve). This is especially visible with EDK2 running at EL2, but still using the EL1 virtual timer, which in that case is fully emulated. Any key-press takes ages to be captured, as there is no UART interrupt and EDK2 relies on polling from a timer... The fix is simply to drop the early return. If the timer interrupt is pending, we will still return early, and otherwise arm the soft timer. Fixes: 4d74ecfa ("KVM: arm64: Don't arm a hrtimer for an already pending timer") Cc: stable@vger.kernel.org Tested-by:
Dmytro Terletskyi <dmytro_terletskyi@epam.com> Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250204110050.150560-2-maz@kernel.org Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit b450dcce93bc2cf6d2bfaf5a0de88a94ebad8f89) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 For each vcpu that userspace creates, we allocate a number of s2_mmu structures that will eventually contain our shadow S2 page tables. Since this is a dynamically allocated array, we reallocate the array and initialise the newly allocated elements. Once everything is correctly initialised, we adjust pointer and size in the kvm structure, and move on. But should that initialisation fail *and* the reallocation triggered a copy to another location, we end up returning early, with the kvm structure still containing the (now stale) old pointer. Weeee! Cure it by assigning the pointer early, and use this to perform the initialisation. If everything succeeds, we adjust the size. Otherwise, we just leave the size as it was, no harm done, and the new memory is as good as the ol' one (we hope...). Fixes: 4f128f8e ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures") Reported-by:
Alexander Potapenko <glider@google.com> Tested-by:
Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/r/20250204145554.774427-1-maz@kernel.org Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 5417a2e9b130a78bf48cb4cf92630efcee5ccf38) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
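The "assign the pointer early" pattern from the commit above can be sketched in userspace C with realloc(). This is an illustration under stated assumptions, not the KVM code: the table type, the element type and every sketch_* name are hypothetical. The array pointer is published as soon as realloc() succeeds, and only the element count waits for initialisation to complete.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable table standing in for the array of MMU structures. */
struct sketch_table {
	int *items;
	size_t count;
};

/* Hypothetical element initialisers: one succeeds, one always fails. */
int sketch_init_ok(int *item)
{
	*item = 1;
	return 0;
}

int sketch_init_fail(int *item)
{
	(void)item;
	return -1;
}

/* Grow the table to new_count elements, initialising the new tail. */
int sketch_grow(struct sketch_table *t, size_t new_count, int (*init)(int *))
{
	int *tmp;
	size_t i;

	if (new_count <= t->count)
		return 0;

	tmp = realloc(t->items, new_count * sizeof(*tmp));
	if (!tmp)
		return -1;

	/* Publish immediately: realloc() may have moved the block,
	 * in which case the old pointer is already stale. */
	t->items = tmp;

	for (i = t->count; i < new_count; i++) {
		/* On failure the count is untouched and the pointer is valid. */
		if (init(&t->items[i]))
			return -1;
	}

	t->count = new_count;
	return 0;
}
```

A failed grow leaves the table larger in memory than its count says, which is harmless: the existing elements are intact and a later grow can reuse the space.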
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 Protected mode assumes that at minimum vgic-v3 is present, however KVM fails to actually enforce this at the time of initialization. As such, when running protected mode in a half-baked state on GICv2 hardware we see the hyp go belly up at vcpu_load() when it tries to restore the vgic-v3 cpuif:
$ ./arch_timer_edge_cases
[ 130.599140] kvm [4518]: nVHE hyp panic at: [<ffff800081102b58>] __kvm_nvhe___vgic_v3_restore_vmcr_aprs+0x8/0x84!
[ 130.603685] kvm [4518]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE
[ 130.611962] kvm [4518]: Hyp Offset: 0xfffeca95ed000000
[ 130.617053] Kernel panic - not syncing: HYP panic:
[ 130.617053] PS:800003c9 PC:0000b56a94102b58 ESR:0000000002000000
[ 130.617053] FAR:ffff00007b98d4d0 HPFAR:00000000007b98d0 PAR:0000000000000000
[ 130.617053] VCPU:0000000000000000
[ 130.638013] CPU: 0 UID: 0 PID: 4518 Comm: arch_timer_edge Tainted: G C 6.13.0-rc3-00009-gf7d03fcbf1f4 #1
[ 130.648790] Tainted: [C]=CRAP
[ 130.651721] Hardware name: Libre Computer AML-S905X-CC (DT)
[ 130.657242] Call trace:
[ 130.659656] show_stack+0x18/0x24 (C)
[ 130.663279] dump_stack_lvl+0x38/0x90
[ 130.666900] dump_stack+0x18/0x24
[ 130.670178] panic+0x388/0x3e8
[ 130.673196] nvhe_hyp_panic_handler+0x104/0x208
[ 130.677681] kvm_arch_vcpu_load+0x290/0x548
[ 130.681821] vcpu_load+0x50/0x80
[ 130.685013] kvm_arch_vcpu_ioctl_run+0x30/0x868
[ 130.689498] kvm_vcpu_ioctl+0x2e0/0x974
[ 130.693293] __arm64_sys_ioctl+0xb4/0xec
[ 130.697174] invoke_syscall+0x48/0x110
[ 130.700883] el0_svc_common.constprop.0+0x40/0xe0
[ 130.705540] do_el0_svc+0x1c/0x28
[ 130.708818] el0_svc+0x30/0xd0
[ 130.711837] el0t_64_sync_handler+0x10c/0x138
[ 130.716149] el0t_64_sync+0x198/0x19c
[ 130.719774] SMP: stopping secondary CPUs
[ 130.723660] Kernel Offset: disabled
[ 130.727103] CPU features: 0x000,00000800,02800000,0200421b
[ 130.732537] Memory Limit: none
[ 130.735561] ---[ end Kernel panic - not syncing: HYP panic:
[ 130.735561] PS:800003c9 PC:0000b56a94102b58 ESR:0000000002000000
[ 130.735561] FAR:ffff00007b98d4d0 HPFAR:00000000007b98d0 PAR:0000000000000000
[ 130.735561] VCPU:0000000000000000 ]---
Fix it by failing KVM initialization if the system doesn't implement vgic-v3, as protected mode will never do anything useful on such hardware. Reported-by:
Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/kvmarm/5ca7588c-7bf2-4352-8661-e4a56a9cd9aa@sirena.org.uk/ Signed-off-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250203231543.233511-1-oliver.upton@linux.dev Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 32392e04cb50d87bb7a6a7d9213f44a1a0961820) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 The recent changes to debug state management broke self-hosted debug for guests when running in protected mode, since both the debug owner and the debug state itself aren't shared with the hyp's view of the vcpu. Fix it by flushing/syncing the relevant bits with the hyp vcpu. Fixes: beb470d96cec ("KVM: arm64: Use debug_owner to track if debug regs need save/restore") Reported-by:
Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/kvmarm/5f62740f-a065-42d9-9f56-8fb648b9c63f@sirena.org.uk/ Signed-off-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250131222922.1548780-3-oliver.upton@linux.dev Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 0f1a6c5c9784eff7e31e4915e17285fb89ad3644) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-
Sebastian Ott authored
JIRA: https://issues.redhat.com/browse/RHEL-82297 To determine CPU features during initialization, the nVHE hypervisor utilizes sanitized values of the host's CPU features registers. These values, stored in u64 id_aa64*_el1_sys_val variables, are updated by the kvm_hyp_init_symbols() function at EL1. To ensure EL2 visibility with the MMU off, the data cache needs to be flushed after these updates. However, individually flushing each variable using kvm_flush_dcache_to_poc() is inefficient. These CPU feature variables would be part of the bss section of the hypervisor. Hence, flush the entire bss section of the hypervisor once the initialization is complete. Fixes: 6c30bfb1 ("KVM: arm64: Add handlers for protected VM System Registers") Suggested-by:
Fuad Tabba <tabba@google.com> Signed-off-by:
Lokesh Vutla <lokeshvutla@google.com> Link: https://lore.kernel.org/r/20250121044016.2219256-1-lokeshvutla@google.com Signed-off-by:
Marc Zyngier <maz@kernel.org> (cherry picked from commit 9bcbb6104a344d3526e185ee1e7b985509914e90) Signed-off-by:
Sebastian Ott <sebott@redhat.com>
-