summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/i40e
AgeCommit message (Collapse)Author
6 daysMerge tag 'pci-v6.19-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms (Dan Williams) - Switch vmd from custom domain number allocator to the common allocator to prevent a potential race with new non-VMD buses (Dan Williams) - Enable Precision Time Measurement (PTM) only if device advertises support for a relevant role, to prevent invalid PTM Requests that cause ACS violations that are reported as AER Uncorrectable Non-Fatal errors (Mika Westerberg) Resource management: - Prevent resource tree corruption when BAR resize fails (Ilpo Järvinen) - Restore BARs to the original size if a BAR resize fails (Ilpo Järvinen) - Remove BAR release from BAR resize attempts by the xe, i915, and amdgpu drivers so the PCI core can restore BARs if the resize fails (Ilpo Järvinen) - Move Resizable BAR code to rebar.c (Ilpo Järvinen) - Add pci_rebar_size_supported() and use it in i915 and xe (Ilpo Järvinen) - Add pci_rebar_get_max_size() and use it in xe and amdgpu (Ilpo Järvinen) Power management and error handling: - For drivers using PCI legacy suspend, save config state at suspend so that state (not any earlier state from enumeration, probe, or error recovery) will be restored when resuming (Lukas Wunner) - For devices with no driver or a driver that lacks power management, save config state at hibernate so that state (not any earlier state from enumeration, probe, or error recovery) will be restored when resuming (Lukas Wunner) - Save device config space on device addition, before driver binding, so error recovery works more reliably (Lukas Wunner) - Drop pci_save_state() from several drivers that no longer need it since the PCI core always does it and pci_restore_state() no longer invalidates the saved state (Lukas Wunner) - Document use of pci_save_state() by drivers to capture the state they want restored during error recovery (Lukas Wunner) Power control: - Add a struct pci_ops.assert_perst() function pointer to assert/deassert PCIe PERST# and implement it for the qcom driver (Krishna Chaitanya Chundru) - Add DT binding and pwrctrl driver for the Toshiba TC9563 PCIe switch, which must be held in reset after poweron so the pwrctrl driver can configure the switch via I2C before bringing up the links (Krishna Chaitanya Chundru) Endpoint framework: - Convert the endpoint doorbell test to use a threaded IRQ to fix a 'sleeping while atomic' issue (Bhanu Seshu Kumar Valluri) - Add endpoint VNTB MSI doorbell support to reduce latency between host and endpoint (Frank Li) New native PCIe controller drivers: - Add CIX Sky1 host controller DT binding and driver (Hans Zhang) - Add NXP S32G host controller DT binding and driver (Vincent Guittot) - Add Renesas RZ/G3S host controller DT binding and driver (Claudiu Beznea) - Add SpacemiT K1 host controller DT binding and driver (Alex Elder) Amlogic Meson PCIe controller driver: - Update DT binding to name DBI region 'dbi', not 'elbi', and update driver to support both (Manivannan Sadhasivam) Apple PCIe controller driver: - Move struct pci_host_bridge allocation from pci_host_common_init() to callers, which significantly simplifies pcie-apple (Marc Zyngier) Broadcom STB PCIe controller driver: - Disable advertising ASPM L0s support correctly (Jim Quinlan) - Add a panic/die handler to print diagnostic info in case PCIe caused an unrecoverable abort (Jim Quinlan) Cadence PCIe controller driver: - Add module support for Cadence platform host and endpoint controller driver (Manikandan K Pillai) - Split headers into 'legacy' (LGA) and 'high perf' (HPA) to prepare for new CIX Sky1 driver (Manikandan K Pillai) MediaTek PCIe controller driver: - Convert DT binding to YAML schema (Christian Marangi) - Add Airoha AN7583 DT compatible and driver support (Christian Marangi) Qualcomm PCIe controller driver: - Add Qualcomm Kaanapali to SM8550 DT binding (Qiang Yu) - Add required 'power-domains' and 'resets' to qcom sa8775p, sc7280, sc8280xp, sm8150, sm8250, sm8350, sm8450, sm8550, x1e80100 DT schemas (Krzysztof Kozlowski) - Look up OPP using both frequency and data rate (not just frequency) so RPMh votes can account for both (Krishna Chaitanya Chundru) Rockchip DesignWare PCIe controller driver: - Add Rockchip RK3528 compatible strings in DT binding (Yao Zi) STMicroelectronics STM32MP25 PCIe controller driver: - Fix a race between link training and endpoint register initialization (Christian Bruel) - Align endpoint allocations to match the ATU requirements (Christian Bruel) Synopsys DesignWare PCIe controller driver: - Clear L1 PM Substate Capability 'Supported' bits unless glue driver says it's supported, which prevents users from enabling non-working L1SS. Currently only qcom and tegra194 support L1SS (Bjorn Helgaas) - Remove now-superfluous L1SS disable code from tegra194 (Bjorn Helgaas) - Configure L1SS support in dw-rockchip when DT says 'supports-clkreq' (Shawn Lin) TI Keystone PCIe controller driver: - Fail the probe instead of silently succeeding if ks_pcie_of_data didn't specify Root Complex or Endpoint mode (Siddharth Vadapalli) - Make keystone buildable as a loadable module, except on ARM32 where hook_fault_code() is __init (Siddharth Vadapalli)" * tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (100 commits) MAINTAINERS: Add Manivannan Sadhasivam as PCI/pwrctrl maintainer MAINTAINERS: Add CIX Sky1 PCIe controller driver maintainer PCI: sky1: Add PCIe host support for CIX Sky1 dt-bindings: PCI: Add CIX Sky1 PCIe Root Complex bindings PCI: cadence: Add support for High Perf Architecture (HPA) controller MAINTAINERS: Add NXP S32G PCIe controller driver maintainer PCI: s32g: Add NXP S32G PCIe controller driver (RC) PCI: dwc: Add register and bitfield definitions dt-bindings: PCI: s32g: Add NXP S32G PCIe controller PCI: Add Renesas RZ/G3S host controller driver PCI: host-generic: Move bridge allocation outside of pci_host_common_init() dt-bindings: PCI: Add Renesas RZ/G3S PCIe controller binding PCI: Validate pci_rebar_size_supported() input Documentation: PCI: Amend error recovery doc with pci_save_state() rules treewide: Drop pci_save_state() after pci_restore_state() PCI/ERR: Ensure error recoverability at all times PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths PCI: dw-rockchip: Configure L1SS support PCI: tegra194: Remove unnecessary L1SS disable code ...
14 daysi40e: extract GRXRINGS from .get_rxnfcBreno Leitao
Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to optimize RX ring queries") added specific support for GRXRINGS callback, simplifying .get_rxnfc. Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new .get_rx_ring_count(). This simplifies the RX ring count retrieval and aligns i40e with the new ethtool API for querying RX ring parameters. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20251125-gxring_intel-v2-1-f55cd022d28b@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24i40e: delete a stray tabDan Carpenter
This return statement is indented one tab too far. Delete a tab. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/aSBqjtA8oF25G1OG@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24treewide: Drop pci_save_state() after pci_restore_state()Lukas Wunner
In 2009, commit c82f63e411f1 ("PCI: check saved state before restore") changed the behavior of pci_restore_state() such that it became necessary to call pci_save_state() afterwards, lest recovery from subsequent PCI errors fails. The commit has just been reverted and so all the pci_save_state() after pci_restore_state() calls that have accumulated in the tree are now superfluous. Drop them. Two drivers chose a different approach to achieve the same result: drivers/scsi/ipr.c and drivers/net/ethernet/intel/e1000e/netdev.c set the pci_dev's "state_saved" flag to true before calling pci_restore_state(). Drop this as well. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> # qat Link: https://patch.msgid.link/c2b28cc4defa1b743cf1dedee23c455be98b397a.1760274044.git.lukas@wunner.de
2025-11-20devlink: pass extack through to devlink_param::get()Daniel Zahka
Allow devlink_param::get() handlers to report error messages via extack. This function is called in a few different contexts, but not all of them will have an valid extack to use. When devlink_param::get() is called from param_get_doit or param_get_dumpit contexts, pass the extack through so that drivers can report errors when retrieving param values. devlink_param::get() is called from the context of devlink_param_notify(), pass NULL in for the extack. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251119025038.651131-2-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06i40e: support generic devlink param "max_mac_per_vf"Mohammad Heib
Currently the i40e driver enforces its own internally calculated per-VF MAC filter limit, derived from the number of allocated VFs and available hardware resources. This limit is not configurable by the administrator, which makes it difficult to control how many MAC addresses each VF may use. This patch adds support for the new generic devlink runtime parameter "max_mac_per_vf" which provides administrators with a way to cap the number of MAC addresses a VF can use: - When the parameter is set to 0 (default), the driver continues to use its internally calculated limit. - When set to a non-zero value, the driver applies this value as a strict cap for VFs, overriding the internal calculation. Important notes: - The configured value is a theoretical maximum. Hardware limits may still prevent additional MAC addresses from being added, even if the parameter allows it. - Since MAC filters are a shared hardware resource across all VFs, setting a high value may cause resource contention and starve other VFs. - This change gives administrators predictable and flexible control over VF resource allocation, while still respecting hardware limitations. - Previous discussion about this change: https://lore.kernel.org/netdev/20250805134042.2604897-2-dhill@redhat.com https://lore.kernel.org/netdev/20250823094952.182181-1-mheib@redhat.com Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-10-29i40e: avoid redundant VF link state updatesJay Vosburgh
Multiple sources can request VF link state changes with identical parameters. For example, OpenStack Neutron may request to set the VF link state to IFLA_VF_LINK_STATE_AUTO during every initialization or user can issue: `ip link set <ifname> vf 0 state auto` multiple times. Currently, the i40e driver processes each of these requests, even if the requested state is the same as the current one. This leads to unnecessary VF resets and can cause performance degradation or instability in the VF driver, particularly in environment using Data Plane Development Kit (DPDK). With this patch i40e will skip VF link state change requests when the desired link state matches the current configuration. This prevents unnecessary VF resets and reduces PF-VF communication overhead. To reproduce the problem run following command multiple times on the same interface: 'ip link set <ifname> vf 0 state auto' Every time command is executed, PF driver will trigger VF reset. Co-developed-by: Robert Malz <robert.malz@canonical.com> Signed-off-by: Robert Malz <robert.malz@canonical.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.17-rc8). Conflicts: drivers/net/can/spi/hi311x.c 6b6968084721 ("can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled") 27ce71e1ce81 ("net: WQ_PERCPU added to alloc_workqueue users") https://lore.kernel.org/72ce7599-1b5b-464a-a5de-228ff9724701@kernel.org net/smc/smc_loopback.c drivers/dibs/dibs_loopback.c a35c04de2565 ("net/smc: fix warning in smc_rx_splice() when calling get_page()") cc21191b584c ("dibs: Move data path to dibs layer") https://lore.kernel.org/74368a5c-48ac-4f8e-a198-40ec1ed3cf5f@kernel.org Adjacent changes: drivers/net/dsa/lantiq/lantiq_gswip.c c0054b25e2f1 ("net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()") 7a1eaef0a791 ("net: dsa: lantiq_gswip: support model-specific mac_select_pcs()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22net: WQ_PERCPU added to alloc_workqueue usersMarco Crivellari
Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This change adds a new WQ_PERCPU flag at the network subsystem, to explicitly request the use of the per-CPU behavior. Both flags coexist for one release cycle to allow callers to transition their calls. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. All existing users have been updated accordingly. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20250918142427.309519-4-marco.crivellari@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18i40e: improve VF MAC filters accountingLukasz Czapnik
When adding new VM MAC, driver checks only *active* filters in vsi->mac_filter_hash. Each MAC, even in non-active state is using resources. To determine number of MACs VM uses, count VSI filters in *any* state. Add i40e_count_all_filters() to simply count all filters, and rename i40e_count_filters() to i40e_count_active_filters() to avoid ambiguity. Fixes: cfb1d572c986 ("i40e: Add ensurance of MacVlan resources for every trusted VF") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: add mask to apply valid bits for itr_idxLukasz Czapnik
The ITR index (itr_idx) is only 2 bits wide. When constructing the register value for QINT_RQCTL, all fields are ORed together. Without masking, higher bits from itr_idx may overwrite adjacent fields in the register. Apply I40E_QINT_RQCTL_ITR_INDX_MASK to ensure only the intended bits are set. Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: add max boundary check for VF filtersLukasz Czapnik
There is no check for max filters that VF can request. Add it. Fixes: e284fc280473 ("i40e: Add and delete cloud filter") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: fix validation of VF state in get resourcesLukasz Czapnik
VF state I40E_VF_STATE_ACTIVE is not the only state in which VF is actually active so it should not be used to determine if a VF is allowed to obtain resources. Use I40E_VF_STATE_RESOURCES_LOADED that is set only in i40e_vc_get_vf_resources_msg() and cleared during reset. Fixes: 61125b8be85d ("i40e: Fix failed opcode appearing if handling messages from VF") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: fix input validation logic for action_metaLukasz Czapnik
Fix condition to check 'greater or equal' to prevent OOB dereference. Fixes: e284fc280473 ("i40e: Add and delete cloud filter") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: fix idx validation in config queues msgLukasz Czapnik
Ensure idx is within range of active/initialized TCs when iterating over vf->ch[idx] in i40e_vc_config_queues_msg(). Fixes: c27eac48160d ("i40e: Enable ADq and create queue channel/s on VF") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Kamakshi Nellore <nellorex.kamakshi@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: fix idx validation in i40e_validate_queue_mapLukasz Czapnik
Ensure idx is within range of active/initialized TCs when iterating over vf->ch[idx] in i40e_validate_queue_map(). Fixes: c27eac48160d ("i40e: Enable ADq and create queue channel/s on VF") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Kamakshi Nellore <nellorex.kamakshi@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: add validation for ring_len paramLukasz Czapnik
The `ring_len` parameter provided by the virtual function (VF) is assigned directly to the hardware memory context (HMC) without any validation. To address this, introduce an upper boundary check for both Tx and Rx queue lengths. The maximum number of descriptors supported by the hardware is 8k-32. Additionally, enforce alignment constraints: Tx rings must be a multiple of 8, and Rx rings must be a multiple of 32. Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.17-rc7). No conflicts. Adjacent changes: drivers/net/ethernet/mellanox/mlx5/core/en/fs.h 9536fbe10c9d ("net/mlx5e: Add PSP steering in local NIC RX") 7601a0a46216 ("net/mlx5e: Add a miss level for ipsec crypto offload") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16i40e: remove redundant memory barrier when cleaning Tx descsMaciej Fijalkowski
i40e has a feature which writes to memory location last descriptor successfully sent. Memory barrier in i40e_clean_tx_irq() was used to avoid forward-reading descriptor fields in case DD bit was not set. Having mentioned feature in place implies that such situation will not happen as we know in advance how many descriptors HW has dealt with. Besides, this barrier placement was wrong. Idea is to have this protection *after* reading DD bit from HW descriptor, not before. Digging through git history showed me that indeed barrier was before DD bit check, anyways the commit introducing i40e_get_head() should have wiped it out altogether. Also, there was one commit doing s/read_barrier_depends/smp_rmb when get head feature was already in place, but it was only theoretical based on ixgbe experiences, which is different in these terms as that driver has to read DD bit from HW descriptor. Fixes: 1943d8ba9507 ("i40e/i40evf: enable hardware feature head write back") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.17-rc6). Conflicts: net/netfilter/nft_set_pipapo.c net/netfilter/nft_set_pipapo_avx2.c c4eaca2e1052 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups") 84c1da7b38d9 ("netfilter: nft_set_pipapo: use avx2 algorithm for insertions too") Only trivial adjacent changes (in a doc and a Makefile). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-11net: xdp: pass full flags to xdp_update_skb_shared_info()Jakub Kicinski
xdp_update_skb_shared_info() needs to update skb state which was maintained in xdp_buff / frame. Pass full flags into it, instead of breaking it out bit by bit. We will need to add a bit for unreadable frags (even tho XDP doesn't support those the driver paths may be common), at which point almost all call sites would become: xdp_update_skb_shared_info(skb, num_frags, sinfo->xdp_frags_size, MY_PAGE_SIZE * num_frags, xdp_buff_is_frag_pfmemalloc(xdp), xdp_buff_is_frag_unreadable(xdp)); Keep a helper for accessing the flags, in case we need to transform them somehow in the future (e.g. to cover up xdp_buff vs xdp_frame differences). While we are touching call callers - rename the helper to xdp_update_skb_frags_info(), previous name may have implied that it's shinfo that's updated. We are updating flags in struct sk_buff based on frags that got attched. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20250905221539.2930285-2-kuba@kernel.org Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-09i40e: fix Jumbo Frame support after iPXE bootJacob Keller
The i40e hardware has multiple hardware settings which define the Maximum Frame Size (MFS) of the physical port. The firmware has an AdminQ command (0x0603) to configure the MFS, but the i40e Linux driver never issues this command. In most cases this is no problem, as the NVM default value has the device configured for its maximum value of 9728. Unfortunately, recent versions of the iPXE intelxl driver now issue the 0x0603 Set Mac Config command, modifying the MFS and reducing it from its default value of 9728. This occurred as part of iPXE commit 6871a7de705b ("[intelxl] Use admin queue to set port MAC address and maximum frame size"), a prerequisite change for supporting the E800 series hardware in iPXE. Both the E700 and E800 firmware support the AdminQ command, and the iPXE code shares much of the logic between the two device drivers. The ice E800 Linux driver already issues the 0x0603 Set Mac Config command early during probe, and is thus unaffected by the iPXE change. Since commit 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set"), the i40e driver does check the I40E_PRTGL_SAH register, but it only logs a warning message if its value is below the 9728 default. This register also only covers received packets and not transmitted packets. A warning can inform system administrators, but does not correct the issue. No interactions from userspace cause the driver to write to PRTGL_SAH or issue the 0x0603 AdminQ command. Only a GLOBR reset will restore the value to its default value. There is no obvious method to trigger a GLOBR reset from user space. To fix this, introduce the i40e_aq_set_mac_config() function, similar to the one from the ice driver. Call this during early probe to ensure that the device configuration matches driver expectation. Unlike E800, the E700 firmware also has a bit to control whether the MAC should append CRC data. It is on by default, but setting a 0 to this bit would disable CRC. The i40e implementation must set this bit to ensure CRC will be appended by the MAC. In addition to the AQ command, instead of just checking the I40E_PRTGL_SAH register, update its value to the 9728 default and write it back. This ensures that the hardware is in the expected state, regardless of whether the iPXE (or any other early boot driver) has modified this state. This is a better user experience, as we now fix the issues with larger MTU instead of merely warning. It also aligns with the way the ice E800 series driver works. A final note: The Fixes tag provided here is not strictly accurate. The issue occurs as a result of an external entity (the iPXE intelxl driver), and this is not a regression specifically caused by the mentioned change. However, I believe the original change to just warn about PRTGL_SAH being too low was an insufficient fix. Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set") Link: https://github.com/ipxe/ipxe/commit/6871a7de705b6f6a4046f0d19da9bcd689c3bc8e Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-09i40e: fix IRQ freeing in i40e_vsi_request_irq_msix error pathMichal Schmidt
If request_irq() in i40e_vsi_request_irq_msix() fails in an iteration later than the first, the error path wants to free the IRQs requested so far. However, it uses the wrong dev_id argument for free_irq(), so it does not free the IRQs correctly and instead triggers the warning: Trying to free already-free IRQ 173 WARNING: CPU: 25 PID: 1091 at kernel/irq/manage.c:1829 __free_irq+0x192/0x2c0 Modules linked in: i40e(+) [...] CPU: 25 UID: 0 PID: 1091 Comm: NetworkManager Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy) Hardware name: [...] RIP: 0010:__free_irq+0x192/0x2c0 [...] Call Trace: <TASK> free_irq+0x32/0x70 i40e_vsi_request_irq_msix.cold+0x63/0x8b [i40e] i40e_vsi_request_irq+0x79/0x80 [i40e] i40e_vsi_open+0x21f/0x2f0 [i40e] i40e_open+0x63/0x130 [i40e] __dev_open+0xfc/0x210 __dev_change_flags+0x1fc/0x240 netif_change_flags+0x27/0x70 do_setlink.isra.0+0x341/0xc70 rtnl_newlink+0x468/0x860 rtnetlink_rcv_msg+0x375/0x450 netlink_rcv_skb+0x5c/0x110 netlink_unicast+0x288/0x3c0 netlink_sendmsg+0x20d/0x430 ____sys_sendmsg+0x3a2/0x3d0 ___sys_sendmsg+0x99/0xe0 __sys_sendmsg+0x8a/0xf0 do_syscall_64+0x82/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e [...] </TASK> ---[ end trace 0000000000000000 ]--- Use the same dev_id for free_irq() as for request_irq(). I tested this with inserting code to fail intentionally. Fixes: 493fb30011b3 ("i40e: Move q_vectors from pointer to array to array of pointers") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-02i40e: Fix potential invalid access when MAC list is emptyZhen Ni
list_first_entry() never returns NULL - if the list is empty, it still returns a pointer to an invalid object, leading to potential invalid memory access when dereferenced. Fix this by using list_first_entry_or_null instead of list_first_entry. Fixes: e3219ce6a775 ("i40e: Add support for client interface for IWARP driver") Signed-off-by: Zhen Ni <zhen.ni@easystack.cn> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-02i40e: remove read access to debugfs filesJacob Keller
The 'command' and 'netdev_ops' debugfs files are a legacy debugging interface supported by the i40e driver since its early days by commit 02e9c290814c ("i40e: debugfs interface"). Both of these debugfs files provide a read handler which is mostly useless, and which is implemented with questionable logic. They both use a static 256 byte buffer which is initialized to the empty string. In the case of the 'command' file this buffer is literally never used and simply wastes space. In the case of the 'netdev_ops' file, the last command written is saved here. On read, the files contents are presented as the name of the device followed by a colon and then the contents of their respective static buffer. For 'command' this will always be "<device>: ". For 'netdev_ops', this will be "<device>: <last command written>". But note the buffer is shared between all devices operated by this module. At best, it is mostly meaningless information, and at worse it could be accessed simultaneously as there doesn't appear to be any locking mechanism. We have also recently received multiple reports for both read functions about their use of snprintf and potential overflow that could result in reading arbitrary kernel memory. For the 'command' file, this is definitely impossible, since the static buffer is always zero and never written to. For the 'netdev_ops' file, it does appear to be possible, if the user carefully crafts the command input, it will be copied into the buffer, which could be large enough to cause snprintf to truncate, which then causes the copy_to_user to read beyond the length of the buffer allocated by kzalloc. A minimal fix would be to replace snprintf() with scnprintf() which would cap the return to the number of bytes written, preventing an overflow. A more involved fix would be to drop the mostly useless static buffers, saving 512 bytes and modifying the read functions to stop needing those as input. Instead, lets just completely drop the read access to these files. These are debug interfaces exposed as part of debugfs, and I don't believe that dropping read access will break any script, as the provided output is pretty useless. You can find the netdev name through other more standard interfaces, and the 'netdev_ops' interface can easily result in garbage if you issue simultaneous writes to multiple devices at once. In order to properly remove the i40e_dbg_netdev_ops_buf, we need to refactor its write function to avoid using the static buffer. Instead, use the same logic as the i40e_dbg_command_write, with an allocated buffer. Update the code to use this instead of the static buffer, and ensure we free the buffer on exit. This fixes simultaneous writes to 'netdev_ops' on multiple devices, and allows us to remove the now unused static buffer along with removing the read access. Fixes: 02e9c290814c ("i40e: debugfs interface") Reported-by: Kunwu Chan <chentao@kylinos.cn> Closes: https://lore.kernel.org/intel-wired-lan/20231208031950.47410-1-chentao@kylinos.cn/ Reported-by: Wang Haoran <haoranwangsec@gmail.com> Closes: https://lore.kernel.org/all/CANZ3JQRRiOdtfQJoP9QM=6LS1Jto8PGBGw6y7-TL=BcnzHQn1Q@mail.gmail.com/ Reported-by: Amir Mohammad Jahangirzad <a.jahangirzad@gmail.com> Closes: https://lore.kernel.org/all/20250722115017.206969-1-a.jahangirzad@gmail.com/ Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kunwu Chan <kunwu.chan@linux.dev> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-25Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== libie: commonize adminq structure Michal Swiatkowski says: It is a prework to allow reusing some specific Intel code (eq. fwlog). Move common *_aq_desc structure to libie header and changing it in ice, ixgbe, i40e and iavf. Only generic adminq commands can be easily moved to common header, as rest is slightly different. Format remains the same. It will be better to correctly move it when it will be needed to commonize other part of the code. Move *_aq_str() to new libie module (libie_adminq) and use it across drivers. The functions are exactly the same in each driver. Some more adminq helpers/functions can be moved to libie_adminq when needed. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: use libie_aq_str iavf: use libie_aq_str ice: use libie_aq_str libie: add adminq helper for converting err to str iavf: use libie adminq descriptors i40e: use libie adminq descriptors ixgbe: use libie adminq descriptors ice, libie: move generic adminq descriptors to lib ==================== Link: https://patch.msgid.link/20250724182826.3758850-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25net: Fix typosBjorn Helgaas
Fix typos in comments and error messages. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250723201528.2908218-1-helgaas@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc8). Conflicts: drivers/net/ethernet/microsoft/mana/gdma_main.c 9669ddda18fb ("net: mana: Fix warnings for missing export.h header inclusion") 755391121038 ("net: mana: Allocate MSI-X vectors dynamically") https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au Adjacent changes: drivers/net/ethernet/ti/icssg/icssg_prueth.h 6e86fb73de0f ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG") ffe8a4909176 ("net: ti: icssg-prueth: Read firmware-names from device tree") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24i40e: use libie_aq_strMichal Swiatkowski
There is no need to store the err string in hw->err_str. Simplify it and use common helper. hw->err_str is still used for other purpouse. It should be marked that previously for unknown error the numeric value was passed as a string. Now the "LIBIE_AQ_RC_UNKNOWN" is used for such cases. Add libie_aminq module in i40e Kconfig. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-24i40e: use libie adminq descriptorsMichal Swiatkowski
Use libie_aq_desc instead of i40e_aq_desc. Do needed changes to allow clean build. Get version descriptor is a little less detailed on i40e. To not mess up with shifting or union inside libie desc use get version descriptor from i40e. Move additional caps for i40e to libie. Fix RCT in declaration that is using libie_aq_desc; Use libie_aq_raw() wherever it can be used. The libie aq error is extended, cover it in ice driver just to clean build. In next patches the libie code for that will be used in each of intel driver. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-21i40e: When removing VF MAC filters, only check PF-set MACJamie Bainbridge
When the PF is processing an Admin Queue message to delete a VF's MACs from the MAC filter, we currently check if the PF set the MAC and if the VF is trusted. This results in undesirable behaviour, where if a trusted VF with a PF-set MAC sets itself down (which sends an AQ message to delete the VF's MAC filters) then the VF MAC is erased from the interface. This results in the VF losing its PF-set MAC which should not happen. There is no need to check for trust at all, because an untrusted VF cannot change its own MAC. The only check needed is whether the PF set the MAC. If the PF set the MAC, then don't erase the MAC on link-down. Resolve this by changing the deletion check only for PF-set MAC. (the out-of-tree driver has also intentionally removed the check for VF trust here with OOT driver version 2.26.8, this changes the Linux kernel driver behaviour and comment to match the OOT driver behaviour) Fixes: ea2a1cfc3b201 ("i40e: Fix VF MAC filter removal") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-21i40e: report VF tx_dropped with tx_errors instead of tx_discardsDennis Chen
Currently the tx_dropped field in VF stats is not updated correctly when reading stats from the PF. This is because it reads from i40e_eth_stats.tx_discards which seems to be unused for per VSI stats, as it is not updated by i40e_update_eth_stats() and the corresponding register, GLV_TDPC, is not implemented[1]. Use i40e_eth_stats.tx_errors instead, which is actually updated by i40e_update_eth_stats() by reading from GLV_TEPC. To test, create a VF and try to send bad packets through it: $ echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs $ cat test.py from scapy.all import * vlan_pkt = Ether(dst="ff:ff:ff:ff:ff:ff") / Dot1Q(vlan=999) / IP(dst="192.168.0.1") / ICMP() ttl_pkt = IP(dst="8.8.8.8", ttl=0) / ICMP() print("Send packet with bad VLAN tag") sendp(vlan_pkt, iface="enp2s0f0v0") print("Send packet with TTL=0") sendp(ttl_pkt, iface="enp2s0f0v0") $ ip -s link show dev enp2s0f0 16: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 3c:ec:ef:b7:e0:ac brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped missed mcast 0 0 0 0 0 0 TX: bytes packets errors dropped carrier collsns 0 0 0 0 0 0 vf 0 link/ether e2:c6:fd:c1:1e:92 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off RX: bytes packets mcast bcast dropped 0 0 0 0 0 TX: bytes packets dropped 0 0 0 $ python test.py Send packet with bad VLAN tag . Sent 1 packets. Send packet with TTL=0 . Sent 1 packets. $ ip -s link show dev enp2s0f0 16: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 3c:ec:ef:b7:e0:ac brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped missed mcast 0 0 0 0 0 0 TX: bytes packets errors dropped carrier collsns 0 0 0 0 0 0 vf 0 link/ether e2:c6:fd:c1:1e:92 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off RX: bytes packets mcast bcast dropped 0 0 0 0 0 TX: bytes packets dropped 0 0 0 A packet with non-existent VLAN tag and a packet with TTL = 0 are sent, but tx_dropped is not incremented. After patch: $ ip -s link show dev enp2s0f0 19: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 3c:ec:ef:b7:e0:ac brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped missed mcast 0 0 0 0 0 0 TX: bytes packets errors dropped carrier collsns 0 0 0 0 0 0 vf 0 link/ether 4a:b7:3d:37:f7:56 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off RX: bytes packets mcast bcast dropped 0 0 0 0 0 TX: bytes packets dropped 0 0 2 Fixes: dc645daef9af5bcbd9c ("i40e: implement VF stats NDO") Signed-off-by: Dennis Chen <dechen@redhat.com> Link: https://www.intel.com/content/www/us/en/content-details/596333/intel-ethernet-controller-x710-tm4-at2-carlsville-datasheet.html Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc7). Conflicts: Documentation/netlink/specs/ovpn.yaml 880d43ca9aa4 ("netlink: specs: clean up spaces in brackets") af52020fc599 ("ovpn: reject unexpected netlink attributes") drivers/net/phy/phy_device.c a44312d58e78 ("net: phy: Don't register LEDs for genphy") f0f2b992d818 ("net: phy: Don't register LEDs for genphy") https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org drivers/net/wireless/intel/iwlwifi/fw/regulatory.c drivers/net/wireless/intel/iwlwifi/mld/regulatory.c 5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap") ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware") net/ipv6/mcast.c ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()") a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()") https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15ethernet: intel: fix building with large NR_CPUSArnd Bergmann
With large values of CONFIG_NR_CPUS, three Intel ethernet drivers fail to compile like: In function ‘i40e_free_q_vector’, inlined from ‘i40e_vsi_alloc_q_vectors’ at drivers/net/ethernet/intel/i40e/i40e_main.c:12112:3: 571 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) include/linux/rcupdate.h:1084:17: note: in expansion of macro ‘BUILD_BUG_ON’ 1084 | BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096); \ drivers/net/ethernet/intel/i40e/i40e_main.c:5113:9: note: in expansion of macro ‘kfree_rcu’ 5113 | kfree_rcu(q_vector, rcu); | ^~~~~~~~~ The problem is that the 'rcu' member in 'q_vector' is too far from the start of the structure. Move this member before the CPU mask instead, in all three drivers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-03i40e: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()Vladimir Oltean
New timestamping API was introduced in commit 66f7223039c0 ("net: add NDOs for configuring hardware timestamping") from kernel v6.6. It is time to convert the Intel i40e driver to the new API, so that timestamping configuration can be removed from the ndo_eth_ioctl() path completely. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Milena Olech <milena.olech@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-18udp_tunnel: remove rtnl_lock dependencyStanislav Fomichev
Drivers that are using ops lock and don't depend on RTNL lock still need to manage it because udp_tunnel's RTNL dependency. Introduce new udp_tunnel_nic_lock and use it instead of rtnl_lock. Drop non-UDP_TUNNEL_NIC_INFO_MAY_SLEEP mode from udp_tunnel infra (udp_tunnel_nic_device_sync_work needs to grab udp_tunnel_nic_lock mutex and might sleep). Cover more places in v4: - netlink - udp_tunnel_notify_add_rx_port (ndo_open) - triggers udp_tunnel_nic_device_sync_work - udp_tunnel_notify_del_rx_port (ndo_stop) - triggers udp_tunnel_nic_device_sync_work - udp_tunnel_get_rx_info (__netdev_update_features) - triggers NETDEV_UDP_TUNNEL_PUSH_INFO - udp_tunnel_drop_rx_info (__netdev_update_features) - triggers NETDEV_UDP_TUNNEL_DROP_INFO - udp_tunnel_nic_reset_ntf (ndo_open) - notifiers - udp_tunnel_nic_netdevice_event, depending on the event: - triggers NETDEV_UDP_TUNNEL_PUSH_INFO - triggers NETDEV_UDP_TUNNEL_DROP_INFO - ethnl_tunnel_info_reply_size - udp_tunnel_nic_set_port_priv (two intel drivers) Cc: Michael Chan <michael.chan@broadcom.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Link: https://patch.msgid.link/20250616162117.287806-4-stfomichev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16eth: i40e: migrate to new RXFH callbacksJakub Kicinski
Migrate to new callbacks added by commit 9bb00786fc61 ("net: ethtool: add dedicated callbacks for getting and setting rxfh fields"). I'm deleting all the boilerplate kdoc from the affected functions. It is somewhere between pointless and incorrect, just a burden for people refactoring the code. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250614180907.4167714-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc2). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-12Merge tag 'net-6.16-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth and wireless. Current release - regressions: - af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD Current release - new code bugs: - eth: airoha: correct enable mask for RX queues 16-31 - veth: prevent NULL pointer dereference in veth_xdp_rcv when peer disappears under traffic - ipv6: move fib6_config_validate() to ip6_route_add(), prevent invalid routes Previous releases - regressions: - phy: phy_caps: don't skip better duplex match on non-exact match - dsa: b53: fix untagged traffic sent via cpu tagged with VID 0 - Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused transient packet loss, exact reason not fully understood, yet Previous releases - always broken: - net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6) - sched: sfq: fix a potential crash on gso_skb handling - Bluetooth: intel: improve rx buffer posting to avoid causing issues in the firmware - eth: intel: i40e: make reset handling robust against multiple requests - eth: mlx5: ensure FW pages are always allocated on the local NUMA node, even when device is configure to 'serve' another node - wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850, prevent kernel crashes - wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request() for 3 sec if fw_stats_done is not set" * tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits) selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context net: ethtool: Don't check if RSS context exists in case of context 0 af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD. ipv6: Move fib6_config_validate() to ip6_route_add(). net: drv: netdevsim: don't napi_complete() from netpoll net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get() veth: prevent NULL pointer dereference in veth_xdp_rcv net_sched: remove qdisc_tree_flush_backlog() net_sched: ets: fix a race in ets_qdisc_change() net_sched: tbf: fix a race in tbf_change() net_sched: red: fix a race in __red_change() net_sched: prio: fix a race in prio_tune() net_sched: sch_sfq: reject invalid perturb period net: phy: phy_caps: Don't skip better duplex macth on non-exact match MAINTAINERS: Update Kuniyuki Iwashima's email address. selftests: net: add test case for NAT46 looping back dst net: clear the dst when changing skb protocol net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper net/mlx5e: Fix leak of Geneve TLV option object net/mlx5: HWS, make sure the uplink is the last destination ...
2025-06-10i40e: retry VFLR handling if there is ongoing VF resetRobert Malz
When a VFLR interrupt is received during a VF reset initiated from a different source, the VFLR may be not fully handled. This can leave the VF in an undefined state. To address this, set the I40E_VFLR_EVENT_PENDING bit again during VFLR handling if the reset is not yet complete. This ensures the driver will properly complete the VF reset in such scenarios. Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF") Signed-off-by: Robert Malz <robert.malz@canonical.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-10i40e: return false from i40e_reset_vf if reset is in progressRobert Malz
The function i40e_vc_reset_vf attempts, up to 20 times, to handle a VF reset request, using the return value of i40e_reset_vf as an indicator of whether the reset was successfully triggered. Currently, i40e_reset_vf always returns true, which causes new reset requests to be ignored if a different VF reset is already in progress. This patch updates the return value of i40e_reset_vf to reflect when another VF reset is in progress, allowing the caller to properly use the retry mechanism. Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF") Signed-off-by: Robert Malz <robert.malz@canonical.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-09i40e: add link_down_events statisticDawid Osuchowski
Introduce a link_down_events counter to the i40e driver, incremented each time the link transitions from up to down. This counter can help diagnose issues related to link stability, such as port flapping or unexpected link drops. The value is exposed via ethtool's get_link_ext_stats() interface. Co-developed-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-09net: intel: move RSS packet classifier types to libieJacob Keller
The Intel i40e, iavf, and ice drivers all include a definition of the packet classifier filter types used to program RSS hash enable bits. For i40e, these bits are used for both the PF and VF to configure the PFQF_HENA and VFQF_HENA registers. For ice and iAVF, these bits are used to communicate the desired hash enable filter over virtchnl via its struct virtchnl_rss_hashena. The virtchnl.h header makes no mention of where the bit definitions reside. Maintaining a separate copy of these bits across three drivers is cumbersome. Move the definition to libie as a new pctype.h header file. Each driver can include this, and drop its own definition. The ice implementation also defined a ICE_AVF_FLOW_FIELD_INVALID, intending to use this to indicate when there were no hash enable bits set. This is confusing, since the enumeration is using bit positions. A value of 0 *should* indicate the first bit. Instead, rewrite the code that uses ICE_AVF_FLOW_FIELD_INVALID to just check if the avf_hash is zero. From context this should be clear that we're checking if none of the bits are set. The values are kept as bit positions instead of encoding the BIT_ULL directly into their value. While most users will simply use BIT_ULL immediately, i40e uses the macros both with BIT_ULL and test_bit/set_bit calls. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-09net: intel: rename 'hena' to 'hashcfg' for clarityJacob Keller
i40e, ice, and iAVF all use 'hena' as a shorthand for the "hash enable" configuration. This comes originally from the X710 datasheet 'xxQF_HENA' registers. In the context of the registers the meaning is fairly clear. However, on its own, hena is a weird name that can be more difficult to understand. This is especially true in ice. The E810 hardware doesn't even have registers with HENA in the name. Replace the shorthand 'hena' with 'hashcfg'. This makes it clear the variables deal with the Hash configuration, not just a single boolean on/off for all hashing. Do not update the register names. These come directly from the datasheet for X710 and X722, and it is more important that the names can be searched. Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-08treewide, timers: Rename from_timer() to timer_container_of()Ingo Molnar
Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-04-11i40e: fix MMIO write access to an invalid page in i40e_clear_hwKyungwook Boo
When the device sends a specific input, an integer underflow can occur, leading to MMIO write access to an invalid page. Prevent the integer underflow by changing the type of related variables. Signed-off-by: Kyungwook Boo <bookyungwook@gmail.com> Link: https://lore.kernel.org/lkml/ffc91764-1142-4ba2-91b6-8c773f6f7095@gmail.com/T/ Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-05treewide: Switch/rename to timer_delete[_sync]()Thomas Gleixner
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-10i40e: use generic unrolled_count() macroAlexander Lobakin
i40e, as well as ice, has a custom loop unrolling macro for unrolling Tx descriptors filling on XSk xmit. Replace i40e defs with generic unrolled_count(), which is also more convenient as it allows passing defines as its argument, not hardcoded values, while the loop declaration will still be a usual for-loop. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20250206182630.3914318-3-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07i40e: add ability to reset VF for Tx and Rx MDD eventsAleksandr Loktionov
Implement "mdd-auto-reset-vf" priv-flag to handle Tx and Rx MDD events for VFs. This flag is also used in other network adapters like ICE. Usage: - "on" - The problematic VF will be automatically reset if a malformed descriptor is detected. - "off" - The problematic VF will be disabled. In cases where a VF sends malformed packets classified as malicious, it can cause the Tx queue to freeze, rendering it unusable for several minutes. When an MDD event occurs, this new implementation allows for a graceful VF reset to quickly restore operational state. Currently, VF queues are disabled if an MDD event occurs. This patch adds the ability to reset the VF if a Tx or Rx MDD event occurs. It also includes MDD event logging throttling to avoid dmesg pollution and unifies the format of Tx and Rx MDD messages. Note: Standard message rate limiting functions like dev_info_ratelimited() do not meet our requirements. Custom rate limiting is implemented, please see the code for details. Co-developed-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Co-developed-by: Padraig J Connolly <padraig.j.connolly@intel.com> Signed-off-by: Padraig J Connolly <padraig.j.connolly@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-13-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-06i40e: Remove unused i40e_dcb_hw_get_num_tcDr. David Alan Gilbert
The last useof i40e_dcb_hw_get_num_tc() was removed in 2022 by commit fe20371578ef ("Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250102173717.200359-10-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>