summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2025-07-17neighbour: Update pneigh_entry in pneigh_create().Kuniyuki Iwashima
neigh_add() updates pneigh_entry() found or created by pneigh_create(). This update is serialised by RTNL, but we will remove it. Let's move the update part to pneigh_create() and make it return errno instead of a pointer of pneigh_entry. Now, the pneigh code is RTNL free. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-16-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Protect tbl->phash_buckets[] with a dedicated mutex.Kuniyuki Iwashima
tbl->phash_buckets[] is only modified in the slow path by pneigh_create() and pneigh_delete() under the table lock. Both of them are called under RTNL, so no extra lock is needed, but we will remove RTNL from the paths. pneigh_create() looks up a pneigh_entry, and this part can be lockless, but it would complicate the logic like 1. lookup 2. allocate pengih_entry for GFP_KERNEL 3. lookup again but under lock 4. if found, return it after freeing the allocated memory 5. else, return the new one Instead, let's add a per-table mutex and run lookup and allocation under it. Note that updating pneigh_entry part in neigh_add() is still protected by RTNL and will be moved to pneigh_create() in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-15-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Remove __pneigh_lookup().Kuniyuki Iwashima
__pneigh_lookup() is the lockless version of pneigh_lookup(), but its only caller pndisc_is_router() holds the table lock and reads pneigh_netry.flags. This is because accessing pneigh_entry after pneigh_lookup() was illegal unless the caller holds RTNL or the table lock. Now, pneigh_entry is guaranteed to be alive during the RCU critical section. Let's call pneigh_lookup() and use READ_ONCE() for n->flags in pndisc_is_router() and remove __pneigh_lookup(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-13-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Free pneigh_entry after RCU grace period.Kuniyuki Iwashima
We will convert RTM_GETNEIGH to RCU. neigh_get() looks up pneigh_entry by pneigh_lookup() and passes it to pneigh_fill_info(). Then, we must ensure that the entry is alive till pneigh_fill_info() completes, but read_lock_bh(&tbl->lock) in pneigh_lookup() does not guarantee that. Also, we will convert all readers of tbl->phash_buckets[] to RCU. Let's use call_rcu() to free pneigh_entry and update phash_buckets[] and ->next by rcu_assign_pointer(). pneigh_ifdown_and_unlock() uses list_head to avoid overwriting ->next and moving RCU iterators to another list. pndisc_destructor() (only IPv6 ndisc uses this) uses a mutex, so it is not delayed to call_rcu(), where we cannot sleep. This is fine because the mcast code works with RCU and ipv6_dev_mc_dec() frees mcast objects after RCU grace period. While at it, we change the return type of pneigh_ifdown_and_unlock() to void. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Annotate neigh_table.phash_buckets and pneigh_entry.next with __rcu.Kuniyuki Iwashima
The next patch will free pneigh_entry with call_rcu(). Then, we need to annotate neigh_table.phash_buckets[] and pneigh_entry.next with __rcu. To make the next patch cleaner, let's annotate the fields in advance. Currently, all accesses to the fields are under the neigh table lock, so rcu_dereference_protected() is used with 1 for now, but most of them (except in pneigh_delete() and pneigh_ifdown_and_unlock()) will be replaced with rcu_dereference() and rcu_dereference_check(). Note that pneigh_ifdown_and_unlock() changes pneigh_entry.next to a local list, which is illegal because the RCU iterator could be moved to another list. This part will be fixed in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17neighbour: Split pneigh_lookup().Kuniyuki Iwashima
pneigh_lookup() has ASSERT_RTNL() in the middle of the function, which is confusing. When called with the last argument, creat, 0, pneigh_lookup() literally looks up a proxy neighbour entry. This is the case of the reader path as the fast path and RTM_GETNEIGH. pneigh_lookup(), however, creates a pneigh_entry when called with creat 1 from RTM_NEWNEIGH and SIOCSARP, which require RTNL. Let's split pneigh_lookup() into two functions. We will convert all the reader paths to RCU, and read_lock_bh(&tbl->lock) in the new pneigh_lookup() will be dropped. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17ethtool: rss: initial RSS_SET (indirection table handling)Jakub Kicinski
Add initial support for RSS_SET, for now only operations on the indirection table are supported. Unlike the ioctl don't check if at least one parameter is being changed. This is how other ethtool-nl ops behave, so pick the ethtool-nl consistency vs copying ioctl behavior. There are two special cases here: 1) resetting the table to defaults; 2) support for tables of different size. For (1) I use an empty Netlink attribute (array of size 0). (2) may require some background. AFAICT a lot of modern devices allow allocating RSS tables of different sizes. mlx5 can upsize its tables, bnxt has some "table size calculation", and Intel folks asked about RSS table sizing in context of resource allocation in the past. The ethtool IOCTL API has a concept of table size, but right now the user is expected to provide a table exactly the size the device requests. Some drivers may change the table size at runtime (in response to queue count changes) but the user is not in control of this. What's not great is that all RSS contexts share the same table size. For example a device with 128 queues enabled, 16 RSS contexts 8 queues in each will likely have 256 entry tables for each of the 16 contexts, while 32 would be more than enough given each context only has 8 queues. To address this the Netlink API should avoid enforcing table size at the uAPI level, and should allow the user to express the min table size they expect. To fully solve (2) we will need more driver plumbing but at the uAPI level this patch allows the user to specify a table size smaller than what the device advertises. The device table size must be a multiple of the user requested table size. We then replicate the user-provided table to fill the full device size table. This addresses the "allow the user to express the min table size" objective, while not enforcing any fixed size. From Netlink perspective .get_rxfh_indir_size() is now de facto the "max" table size supported by the device. We may choose to support table replication in ethtool, too, when we actually plumb this thru the device APIs. Initially I was considering moving full pattern generation to the kernel (which queues to use, at which frequency and what min sequence length). I don't think this complexity would buy us much and most if not all devices have pow-2 table sizes, which simplifies the replication a lot. Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250716000331.1378807-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc7). Conflicts: Documentation/netlink/specs/ovpn.yaml 880d43ca9aa4 ("netlink: specs: clean up spaces in brackets") af52020fc599 ("ovpn: reject unexpected netlink attributes") drivers/net/phy/phy_device.c a44312d58e78 ("net: phy: Don't register LEDs for genphy") f0f2b992d818 ("net: phy: Don't register LEDs for genphy") https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org drivers/net/wireless/intel/iwlwifi/fw/regulatory.c drivers/net/wireless/intel/iwlwifi/mld/regulatory.c 5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap") ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware") net/ipv6/mcast.c ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()") a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()") https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge tag 'net-6.16-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Bluetooth, CAN, WiFi and Netfilter. More code here than I would have liked. That said, better now than next week. Nothing particularly scary stands out. The improvement to the OpenVPN input validation is a bit large but better get them in before the code makes it to a final release. Some of the changes we got from sub-trees could have been split better between the fix and -next refactoring, IMHO, that has been communicated. We have one known regression in a TI AM65 board not getting link. The investigation is going a bit slow, a number of people are on vacation. We'll try to wrap it up, but don't think it should hold up the release. Current release - fix to a fix: - Bluetooth: L2CAP: fix attempting to adjust outgoing MTU, it broke some headphones and speakers Current release - regressions: - wifi: ath12k: fix packets received in WBM error ring with REO LUT enabled, fix Rx performance regression - wifi: iwlwifi: - fix crash due to a botched indexing conversion - mask reserved bits in chan_state_active_bitmap, avoid FW assert() Current release - new code bugs: - nf_conntrack: fix crash due to removal of uninitialised entry - eth: airoha: fix potential UaF in airoha_npu_get() Previous releases - regressions: - net: fix segmentation after TCP/UDP fraglist GRO - af_packet: fix the SO_SNDTIMEO constraint not taking effect and a potential soft lockup waiting for a completion - rpl: fix UaF in rpl_do_srh_inline() for sneaky skb geometry - virtio-net: fix recursive rtnl_lock() during probe() - eth: stmmac: populate entire system_counterval_t in get_time_fn() - eth: libwx: fix a number of crashes in the driver Rx path - hv_netvsc: prevent IPv6 addrconf after IFF_SLAVE lost that meaning Previous releases - always broken: - mptcp: fix races in handling connection fallback to pure TCP - rxrpc: assorted error handling and race fixes - sched: another batch of "security" fixes for qdiscs (QFQ, HTB) - tls: always refresh the queue when reading sock, avoid UaF - phy: don't register LEDs for genphy, avoid deadlock - Bluetooth: btintel: check if controller is ISO capable on btintel_classify_pkt_type(), work around FW returning incorrect capabilities Misc: - make OpenVPN Netlink input checking more strict before it makes it to a final release - wifi: cfg80211: remove scan request n_channels __counted_by, it's only yielding false positives" * tag 'net-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits) rxrpc: Fix to use conn aborts for conn-wide failures rxrpc: Fix transmission of an abort in response to an abort rxrpc: Fix notification vs call-release vs recvmsg rxrpc: Fix recv-recv race of completed call rxrpc: Fix irq-disabled in local_bh_enable() selftests/tc-testing: Test htb_dequeue_tree with deactivation and row emptying net/sched: Return NULL when htb_lookup_leaf encounters an empty rbtree net: bridge: Do not offload IGMP/MLD messages selftests: Add test cases for vlan_filter modification during runtime net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime tls: always refresh the queue when reading sock virtio-net: fix recursived rtnl_lock() during probe() net/mlx5: Update the list of the PCI supported devices hv_netvsc: Set VF priv_flags to IFF_NO_ADDRCONF before open to prevent IPv6 addrconf phonet/pep: Move call to pn_skb_get_dst_sockaddr() earlier in pep_sock_accept() Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU netfilter: nf_conntrack: fix crash due to removal of uninitialised entry net: fix segmentation after TCP/UDP fraglist GRO ipv6: mcast: Delay put pmc->idev in mld_del_delrec() net: airoha: fix potential use-after-free in airoha_npu_get() ...
2025-07-17Merge tag 'for-net-2025-07-17' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_sync: fix connectable extended advertising when using static random address - hci_core: fix typos in macros - hci_core: add missing braces when using macro parameters - hci_core: replace 'quirks' integer by 'quirk_flags' bitmap - SMP: If an unallowed command is received consider it a failure - SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeout - L2CAP: Fix null-ptr-deref in l2cap_sock_resume_cb() - L2CAP: Fix attempting to adjust outgoing MTU - btintel: Check if controller is ISO capable on btintel_classify_pkt_type - btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant without board ID * tag 'for-net-2025-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU Bluetooth: btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant without board ID Bluetooth: hci_dev: replace 'quirks' integer by 'quirk_flags' bitmap Bluetooth: hci_core: add missing braces when using macro parameters Bluetooth: hci_core: fix typos in macros Bluetooth: SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeout Bluetooth: SMP: If an unallowed command is received consider it a failure Bluetooth: btintel: Check if controller is ISO capable on btintel_classify_pkt_type Bluetooth: hci_sync: fix connectable extended advertising when using static random address Bluetooth: Fix null-ptr-deref in l2cap_sock_resume_cb() ==================== Link: https://patch.msgid.link/20250717142849.537425-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17rxrpc: Fix notification vs call-release vs recvmsgDavid Howells
When a call is released, rxrpc takes the spinlock and removes it from ->recvmsg_q in an effort to prevent racing recvmsg() invocations from seeing the same call. Now, rxrpc_recvmsg() only takes the spinlock when actually removing a call from the queue; it doesn't, however, take it in the lead up to that when it checks to see if the queue is empty. It *does* hold the socket lock, which prevents a recvmsg/recvmsg race - but this doesn't prevent sendmsg from ending the call because sendmsg() drops the socket lock and relies on the call->user_mutex. Fix this by firstly removing the bit in rxrpc_release_call() that dequeues the released call and, instead, rely on recvmsg() to simply discard released calls (done in a preceding fix). Secondly, rxrpc_notify_socket() is abandoned if the call is already marked as released rather than trying to be clever by setting both pointers in call->recvmsg_link to NULL to trick list_empty(). This isn't perfect and can still race, resulting in a released call on the queue, but recvmsg() will now clean that up. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250717074350.3767366-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17rxrpc: Fix recv-recv race of completed callDavid Howells
If a call receives an event (such as incoming data), the call gets placed on the socket's queue and a thread in recvmsg can be awakened to go and process it. Once the thread has picked up the call off of the queue, further events will cause it to be requeued, and once the socket lock is dropped (recvmsg uses call->user_mutex to allow the socket to be used in parallel), a second thread can come in and its recvmsg can pop the call off the socket queue again. In such a case, the first thread will be receiving stuff from the call and the second thread will be blocked on call->user_mutex. The first thread can, at this point, process both the event that it picked call for and the event that the second thread picked the call for and may see the call terminate - in which case the call will be "released", decoupling the call from the user call ID assigned to it (RXRPC_USER_CALL_ID in the control message). The first thread will return okay, but then the second thread will wake up holding the user_mutex and, if it sees that the call has been released by the first thread, it will BUG thusly: kernel BUG at net/rxrpc/recvmsg.c:474! Fix this by just dequeuing the call and ignoring it if it is seen to be already released. We can't tell userspace about it anyway as the user call ID has become stale. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Reported-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250717074350.3767366-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge tag 'wireless-next-2025-07-17' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Another set of changes, notably: - cfg80211: fix double-free introduced earlier - mac80211: fix RCU iteration in CSA - iwlwifi: many cleanups (unused FW APIs, PCIe code, WoWLAN) - mac80211: some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - cfg/mac80211: improvements for unsolicated probe response handling * tag 'wireless-next-2025-07-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (64 commits) wifi: cfg80211: fix double free for link_sinfo in nl80211_station_dump() wifi: cfg80211: fix off channel operation allowed check for MLO wifi: mac80211: use RCU-safe iteration in ieee80211_csa_finish wifi: mac80211_hwsim: Update comments in header wifi: mac80211: parse unsolicited broadcast probe response data wifi: cfg80211: parse attribute to update unsolicited probe response template wifi: mac80211: don't use TPE data from assoc response wifi: mac80211: handle WLAN_HT_ACTION_NOTIFY_CHANWIDTH async wifi: mac80211: simplify __ieee80211_rx_h_amsdu() loop wifi: mac80211: don't mark keys for inactive links as uploaded wifi: mac80211: only assign chanctx in reconfig wifi: mac80211_hwsim: Declare support for AP scanning wifi: mac80211: clean up cipher suite handling wifi: mac80211: don't send keys to driver when fips_enabled wifi: cfg80211: Fix interface type validation wifi: mac80211: remove ieee80211_link_unreserve_chanctx() return value wifi: mac80211: don't unreserve never reserved chanctx mwl8k: Add missing check after DMA map wifi: mac80211: make VHT opmode NSS ignore a debug message wifi: iwlwifi: remove support of several iwl_ppag_table_cmd versions ... ==================== Link: https://patch.msgid.link/20250717094610.20106-47-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17Merge tag 'wireless-2025-07-17' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple of fixes: - ath12k performance regression from -rc1 - cfg80211 counted_by() removal for scan request as it doesn't match usage and keeps complaining - iwlwifi crash with certain older devices - iwlwifi missing an error path unlock - iwlwifi compatibility with certain BIOS updates * tag 'wireless-2025-07-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: Fix botched indexing conversion wifi: cfg80211: remove scan request n_channels counted_by wifi: ath12k: Fix packets received in WBM error ring with REO LUT enabled wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap wifi: iwlwifi: pcie: fix locking on invalid TOP reset ==================== Link: https://patch.msgid.link/20250717091831.18787-5-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17Merge branch 'for-next' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux Tony Nguyen says: ==================== Add RDMA support for Intel IPU E2000 in idpf Tatyana Nikolova says: This idpf patch series is the second part of the staged submission for introducing RDMA RoCEv2 support for the IPU E2000 line of products, referred to as GEN3. To support RDMA GEN3 devices, the idpf driver uses common definitions of the IIDC interface and implements specific device functionality in iidc_rdma_idpf.h. The IPU model can host one or more logical network endpoints called vPorts per PCI function that are flexibly associated with a physical port or an internal communication port. Other features as it pertains to GEN3 devices include: * MMIO learning * RDMA capability negotiation * RDMA vectors discovery between idpf and control plane These patches are split from the submission "Add RDMA support for Intel IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts and platforms with a variety of general RDMA applications which include standalone verbs (rping, perftest, etc.), storage and HPC applications. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ This idpf patch series is the second part of the staged submission for introducing RDMA RoCEv2 support for the IPU E2000 line of products, referred to as GEN3. To support RDMA GEN3 devices, the idpf driver uses common definitions of the IIDC interface and implements specific device functionality in iidc_rdma_idpf.h. The IPU model can host one or more logical network endpoints called vPorts per PCI function that are flexibly associated with a physical port or an internal communication port. Other features as it pertains to GEN3 devices include: * MMIO learning * RDMA capability negotiation * RDMA vectors discovery between idpf and control plane These patches are split from the submission "Add RDMA support for Intel IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts and platforms with a variety of general RDMA applications which include standalone verbs (rping, perftest, etc.), storage and HPC applications. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ IWL reviews: v3: https://lore.kernel.org/all/20250708210554.1662-1-tatyana.e.nikolova@intel.com/ v2: https://lore.kernel.org/all/20250612220002.1120-1-tatyana.e.nikolova@intel.com/ v1 (split from previous series): https://lore.kernel.org/all/20250523170435.668-1-tatyana.e.nikolova@intel.com/ v3: https://lore.kernel.org/all/20250207194931.1569-1-tatyana.e.nikolova@intel.com/ RFC v2: https://lore.kernel.org/all/20240824031924.421-1-tatyana.e.nikolova@intel.com/ RFC: https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux: idpf: implement get LAN MMIO memory regions idpf: implement IDC vport aux driver MTU change handler idpf: implement remaining IDC RDMA core callbacks and handlers idpf: implement RDMA vport auxiliary dev create, init, and destroy idpf: implement core RDMA auxiliary dev create, init, and destroy idpf: use reserved RDMA vectors from control plane ==================== Link: https://patch.msgid.link/20250714181002.2865694-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17netfilter: nf_conntrack: fix crash due to removal of uninitialised entryFlorian Westphal
A crash in conntrack was reported while trying to unlink the conntrack entry from the hash bucket list: [exception RIP: __nf_ct_delete_from_lists+172] [..] #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack] #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack] #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack] [..] The nf_conn struct is marked as allocated from slab but appears to be in a partially initialised state: ct hlist pointer is garbage; looks like the ct hash value (hence crash). ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected ct->timeout is 30000 (=30s), which is unexpected. Everything else looks like normal udp conntrack entry. If we ignore ct->status and pretend its 0, the entry matches those that are newly allocated but not yet inserted into the hash: - ct hlist pointers are overloaded and store/cache the raw tuple hash - ct->timeout matches the relative time expected for a new udp flow rather than the absolute 'jiffies' value. If it were not for the presence of IPS_CONFIRMED, __nf_conntrack_find_get() would have skipped the entry. Theory is that we did hit following race: cpu x cpu y cpu z found entry E found entry E E is expired <preemption> nf_ct_delete() return E to rcu slab init_conntrack E is re-inited, ct->status set to 0 reply tuplehash hnnode.pprev stores hash value. cpu y found E right before it was deleted on cpu x. E is now re-inited on cpu z. cpu y was preempted before checking for expiry and/or confirm bit. ->refcnt set to 1 E now owned by skb ->timeout set to 30000 If cpu y were to resume now, it would observe E as expired but would skip E due to missing CONFIRMED bit. nf_conntrack_confirm gets called sets: ct->status |= CONFIRMED This is wrong: E is not yet added to hashtable. cpu y resumes, it observes E as expired but CONFIRMED: <resumes> nf_ct_expired() -> yes (ct->timeout is 30s) confirmed bit set. cpu y will try to delete E from the hashtable: nf_ct_delete() -> set DYING bit __nf_ct_delete_from_lists Even this scenario doesn't guarantee a crash: cpu z still holds the table bucket lock(s) so y blocks: wait for spinlock held by z CONFIRMED is set but there is no guarantee ct will be added to hash: "chaintoolong" or "clash resolution" logic both skip the insert step. reply hnnode.pprev still stores the hash value. unlocks spinlock return NF_DROP <unblocks, then crashes on hlist_nulls_del_rcu pprev> In case CPU z does insert the entry into the hashtable, cpu y will unlink E again right away but no crash occurs. Without 'cpu y' race, 'garbage' hlist is of no consequence: ct refcnt remains at 1, eventually skb will be free'd and E gets destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy. To resolve this, move the IPS_CONFIRMED assignment after the table insertion but before the unlock. Pablo points out that the confirm-bit-store could be reordered to happen before hlist add resp. the timeout fixup, so switch to set_bit and before_atomic memory barrier to prevent this. It doesn't matter if other CPUs can observe a newly inserted entry right before the CONFIRMED bit was set: Such event cannot be distinguished from above "E is the old incarnation" case: the entry will be skipped. Also change nf_ct_should_gc() to first check the confirmed bit. The gc sequence is: 1. Check if entry has expired, if not skip to next entry 2. Obtain a reference to the expired entry. 3. Call nf_ct_should_gc() to double-check step 1. nf_ct_should_gc() is thus called only for entries that already failed an expiry check. After this patch, once the confirmed bit check passes ct->timeout has been altered to reflect the absolute 'best before' date instead of a relative time. Step 3 will therefore not remove the entry. Without this change to nf_ct_should_gc() we could still get this sequence: 1. Check if entry has expired. 2. Obtain a reference. 3. Call nf_ct_should_gc() to double-check step 1: 4 - entry is still observed as expired 5 - meanwhile, ct->timeout is corrected to absolute value on other CPU and confirm bit gets set 6 - confirm bit is seen 7 - valid entry is removed again First do check 6), then 4) so the gc expiry check always picks up either confirmed bit unset (entry gets skipped) or expiry re-check failure for re-inited conntrack objects. This change cannot be backported to releases before 5.19. Without commit 8a75a2c17410 ("netfilter: conntrack: remove unconfirmed list") |= IPS_CONFIRMED line cannot be moved without further changes. Cc: Razvan Cojocaru <rzvncj@gmail.com> Link: https://lore.kernel.org/netfilter-devel/20250627142758.25664-1-fw@strlen.de/ Link: https://lore.kernel.org/netfilter-devel/4239da15-83ff-4ca4-939d-faef283471bb@gmail.com/ Fixes: 1397af5bfd7d ("netfilter: conntrack: remove the percpu dying list") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-16Bluetooth: hci_dev: replace 'quirks' integer by 'quirk_flags' bitmapChristian Eggers
The 'quirks' member already ran out of bits on some platforms some time ago. Replace the integer member by a bitmap in order to have enough bits in future. Replace raw bit operations by accessor macros. Fixes: ff26b2dd6568 ("Bluetooth: Add quirk for broken READ_VOICE_SETTING") Fixes: 127881334eaa ("Bluetooth: Add quirk for broken READ_PAGE_SCAN_TYPE") Suggested-by: Pauli Virtanen <pav@iki.fi> Tested-by: Ivan Pravdin <ipravdin.official@gmail.com> Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: hci_core: add missing braces when using macro parametersChristian Eggers
Macro parameters should always be put into braces when accessing it. Fixes: 4fc9857ab8c6 ("Bluetooth: hci_sync: Add check simultaneous roles support") Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: hci_core: fix typos in macrosChristian Eggers
The provided macro parameter is named 'dev' (rather than 'hdev', which may be a variable on the stack where the macro is used). Fixes: a9a830a676a9 ("Bluetooth: hci_event: Fix sending HCI_OP_READ_ENC_KEY_SIZE") Fixes: 6126ffabba6b ("Bluetooth: Introduce HCI_CONN_FLAG_DEVICE_PRIVACY device flag") Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-15bnxt: move bnxt_hsi.h to include/linux/bnxt/hsi.hAndy Gospodarek
This moves bnxt_hsi.h contents to a common location so it can be properly referenced by bnxt_en, bnxt_re, and bnge. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250714170202.39688-1-gospo@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15net: mctp: Allow limiting binds to a peer addressMatt Johnston
Prior to calling bind() a program may call connect() on a socket to restrict to a remote peer address. Using connect() is the normal mechanism to specify a remote network peer, so we use that here. In MCTP connect() is only used for bound sockets - send() is not available for MCTP since a tag must be provided for each message. The smctp_type must match between connect() and bind() calls. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Link: https://patch.msgid.link/20250710-mctp-bind-v4-6-8ec2f6460c56@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15net: mctp: Use hashtable for bindsMatt Johnston
Ensure that a specific EID (remote or local) bind will match in preference to a MCTP_ADDR_ANY bind. This adds infrastructure for binding a socket to receive messages from a specific remote peer address, a future commit will expose an API for this. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Link: https://patch.msgid.link/20250710-mctp-bind-v4-5-8ec2f6460c56@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-15wifi: cfg80211: parse attribute to update unsolicited probe response templateYuvarani V
At present, the updated unsolicited broadcast probe response template is not processed during userspace commands such as channel switch or color change. This leads to an issue where older incorrect unsolicited probe response is still used during these events. Add support to parse the netlink attribute and store it so that mac80211/drivers can use it to set the BSS_CHANGED_UNSOL_BCAST_PROBE_RESP flag in order to send the updated unsolicited broadcast probe response templates during these events. Signed-off-by: Yuvarani V <quic_yuvarani@quicinc.com> Signed-off-by: Aditya Kumar Singh <aditya.kumar.singh@oss.qualcomm.com> Link: https://patch.msgid.link/20250710-update_unsol_bcast_probe_resp-v2-1-31aca39d3b30@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-15wifi: cfg80211: Fix interface type validationIlan Peer
Fix a condition that verified valid values of interface types. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250709233537.7ad199ca5939.I0ac1ff74798bf59a87a57f2e18f2153c308b119b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-15wifi: cfg80211: remove scan request n_channels counted_byJohannes Berg
This reverts commit e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by"). This really has been a completely failed experiment. There were no actual bugs found, and yet at this point we already have four "fixes" to it, with nothing to show for but code churn, and it never even made the code any safer. In all of the cases that ended up getting "fixed", the structure is also internally inconsistent after the n_channels setting as the channel list isn't actually filled yet. You cannot scan with such a structure, that's just wrong. In mac80211, the struct is also reused multiple times, so initializing it once is no good. Some previous "fixes" (e.g. one in brcm80211) are also just setting n_channels before accessing the array, under the assumption that the code is correct and the array can be accessed, further showing that the whole thing is just pointless when the allocation count and use count are not separate. If we really wanted to fix it, we'd need to separately track the number of channels allocated and the number of channels currently used, but given that no bugs were found despite the numerous syzbot reports, that'd just be a waste of time. Remove the __counted_by() annotation. We really should also remove a number of the n_channels settings that are setting up a structure that's inconsistent, but that can wait. Reported-by: syzbot+e834e757bd9b3d3e1251@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e834e757bd9b3d3e1251 Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by") Link: https://patch.msgid.link/20250714142130.9b0bbb7e1f07.I09112ccde72d445e11348fc2bef68942cb2ffc94@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-14tcp: add const to tcp_try_rmem_schedule() and sk_rmem_schedule() skbEric Dumazet
These functions to not modify the skb, add a const qualifier. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250711114006.480026-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14tcp: add LINUX_MIB_BEYOND_WINDOWEric Dumazet
Add a new SNMP MIB : LINUX_MIB_BEYOND_WINDOW Incremented when an incoming packet is received beyond the receiver window. nstat -az | grep TcpExtBeyondWindow Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250711114006.480026-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14tcp: do not accept packets beyond windowEric Dumazet
Currently, TCP accepts incoming packets which might go beyond the offered RWIN. Add to tcp_sequence() the validation of packet end sequence. Add the corresponding check in the fast path. We relax this new constraint if the receive queue is empty, to not freeze flows from buggy peers. Add a new drop reason : SKB_DROP_REASON_TCP_INVALID_END_SEQUENCE. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250711114006.480026-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14Add support to set NAPI threaded for individual NAPISamiullah Khawaja
A net device has a threaded sysctl that can be used to enable threaded NAPI polling on all of the NAPI contexts under that device. Allow enabling threaded NAPI polling at individual NAPI level using netlink. Extend the netlink operation `napi-set` and allow setting the threaded attribute of a NAPI. This will enable the threaded polling on a NAPI context. Add a test in `nl_netdev.py` that verifies various cases of threaded NAPI being set at NAPI and at device level. Tested ./tools/testing/selftests/net/nl_netdev.py TAP version 13 1..7 ok 1 nl_netdev.empty_check ok 2 nl_netdev.lo_check ok 3 nl_netdev.page_pool_check ok 4 nl_netdev.napi_list_check ok 5 nl_netdev.dev_set_threaded ok 6 nl_netdev.napi_set_threaded ok 7 nl_netdev.nsim_rxq_reset_down # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250710211203.3979655-1-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14Merge branch 'mlx5-next' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next updates 2025-07-14 * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: IFC updates for disabled host PF net/mlx5: Expose disciplined_fr_counter through HCA capabilities in mlx5_ifc RDMA/mlx5: Fix UMR modifying of mkey page size net/mlx5: Expose HCA capability bits for mkey max page size ==================== Link: https://patch.msgid.link/1752481357-34780-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14net/x25: Remove unused x25_terminate_link()Dr. David Alan Gilbert
x25_terminate_link() has been unused since the last use was removed in 2020 by: commit 7eed751b3b2a ("net/x25: handle additional netdev events") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Martin Schiller <ms@dev.tdt.de> Link: https://patch.msgid.link/20250712205759.278777-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14dev: Pass netdevice_tracker to dev_get_by_flags_rcu().Kuniyuki Iwashima
This is a follow-up for commit eb1ac9ff6c4a5 ("ipv6: anycast: Don't hold RTNL for IPV6_JOIN_ANYCAST."). We should not add a new device lookup API without netdevice_tracker. Let's pass netdevice_tracker to dev_get_by_flags_rcu() and rename it with netdev_ prefix to match other newer APIs. Note that we always use GFP_ATOMIC for netdev_hold() as it's expected to be called under RCU. Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/netdev/20250708184053.102109f6@kernel.org/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250711051120.2866855-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14idpf: implement get LAN MMIO memory regionsJoshua Hay
The RDMA driver needs to map its own MMIO regions for the sake of performance, meaning the IDPF needs to avoid mapping portions of the BAR space. However, to be HW agnostic, the IDPF cannot assume where these are and must avoid mapping hard coded regions as much as possible. The IDPF maps the bare minimum to load and communicate with the control plane, i.e., the mailbox registers and the reset state registers. Because of how and when mailbox register offsets are initialized, it is easier to adjust the existing defines to be relative to the mailbox region starting address. Use a specific mailbox register write function that uses these relative offsets. The reset state register addresses are calculated the same way as for other registers, described below. The IDPF then calls a new virtchnl op to fetch a list of MMIO regions that it should map. The addresses for the registers in these regions are calculated by determining what region the register resides in, adjusting the offset to be relative to that region, and then adding the register's offset to that region's mapped address. If the new virtchnl op is not supported, the IDPF will fallback to mapping the whole bar. However, it will still map them as separate regions outside the mailbox and reset state registers. This way we can use the same logic in both cases to access the MMIO space. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: implement RDMA vport auxiliary dev create, init, and destroyJoshua Hay
Implement the functions to create, initialize, and destroy an RDMA vport auxiliary device. The vport aux dev creation is dependent on the core aux device to call idpf_idc_vport_dev_ctrl to signal that it is ready for vport aux devices. Implement that core callback to either create and initialize the vport aux dev or deinitialize. RDMA vport aux dev creation is also dependent on the control plane to tell us the vport is RDMA enabled. Add a flag in the create vport message to signal individual vport RDMA capabilities. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14idpf: implement core RDMA auxiliary dev create, init, and destroyJoshua Hay
Add the initial idpf_idc.c file with the functions to kick off the IDC initialization, create and initialize a core RDMA auxiliary device, and destroy said device. The RDMA core has a dependency on the vports being created by the control plane before it can be initialized. Therefore, once all the vports are up after a hard reset (either during driver load a function level reset), the core RDMA device info will be created. It is populated with the function type (as distinguished by the IDC initialization function pointer), the core idc_ops function points (just stubs for now), the reserved RDMA MSIX table, and various other info the core RDMA auxiliary driver will need. It is then plugged on to the bus. During a function level reset or driver unload, the device will be unplugged from the bus and destroyed. Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-07-14Revert "netfilter: nf_tables: Add notifications for hook changes"Phil Sutter
This reverts commit 465b9ee0ee7bc268d7f261356afd6c4262e48d82. Such notifications fit better into core or nfnetlink_hook code, following the NFNL_MSG_HOOK_GET message format. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-13Merge tag 'irq_urgent_for_v6.16_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix a case of recursive locking in the MSI code - Fix a randconfig build failure in armada-370-xp irqchip * tag 'irq_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/irq-msi-lib: Fix build with PCI disabled PCI/MSI: Prevent recursive locking in pci_msix_write_tph_tag()
2025-07-13net/mlx5: IFC updates for disabled host PFDaniel Jurgens
The port 2 host PF can be disabled, this bit reflects that setting. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1752064867-16874-3-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-13net/mlx5: Expose disciplined_fr_counter through HCA capabilities in mlx5_ifcCarolina Jubran
Introduce the `disciplined_fr_counter` capability bit to indicate that the device’s free-running cycle counter is disciplined to real-time. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1752064867-16874-2-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-13RDMA/mlx5: Fix UMR modifying of mkey page sizeEdward Srouji
When changing the page size on an mkey, the driver needs to set the appropriate bits in the mkey mask to indicate which fields are being modified. The 6th bit of a page size in mlx5 driver is considered an extension, and this bit has a dedicated capability and mask bits. Previously, the driver was not setting this mask in the mkey mask when performing page size changes, regardless of its hardware support, potentially leading to an incorrect page size updates. This fixes the issue by setting the relevant bit in the mkey mask when performing page size changes on an mkey and the 6th bit of this field is supported by the hardware. Fixes: cef7dde8836a ("net/mlx5: Expand mkey page size to support 6 bits") Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/9f43a9c73bf2db6085a99dc836f7137e76579f09.1751979184.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-13net/mlx5: Expose HCA capability bits for mkey max page sizeMichael Guralnik
Expose the HCA capability for maximal page size that can be configured for an mkey. Used for enforcing capabilities when working with highly contiguous memory and using large page sizes. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/3e4d3fda37934430f65f72601519e22bf396fd05.1751979184.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-12Merge tag 'mm-hotfixes-stable-2025-07-11-16-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "19 hotfixes. A whopping 16 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 14 are for MM. Three gdb-script fixes and a kallsyms build fix" * tag 'mm-hotfixes-stable-2025-07-11-16-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Revert "sched/numa: add statistics of numa balance task" mm: fix the inaccurate memory statistics issue for users mm/damon: fix divide by zero in damon_get_intervals_score() samples/damon: fix damon sample mtier for start failure samples/damon: fix damon sample wsse for start failure samples/damon: fix damon sample prcl for start failure kasan: remove kasan_find_vm_area() to prevent possible deadlock scripts: gdb: vfs: support external dentry names mm/migrate: fix do_pages_stat in compat mode mm/damon/core: handle damon_call_control as normal under kdmond deactivation mm/rmap: fix potential out-of-bounds page table access during batched unmap mm/hugetlb: don't crash when allocating a folio if there are no resv scripts/gdb: de-reference per-CPU MCE interrupts scripts/gdb: fix interrupts.py after maple tree conversion maple_tree: fix mt_destroy_walk() on root leaf node mm/vmalloc: leave lazy MMU mode on PTE mapping error scripts/gdb: fix interrupts display after MCP on x86 lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() kallsyms: fix build without execinfo
2025-07-11Merge tag 'drm-fixes-2025-07-12' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Simona Vetter: "Cross-subsystem Changes: - agp/amd64 binding dmesg noise regression fix Core Changes: - fix race in gem_handle_create_tail - fixup handle_count fb refcount regression from -rc5, popular with reports ... - call rust dtor for drm_device release Driver Changes: - nouveau: magic 50ms suspend fix, acpi leak fix - tegra: dma api error in nvdec - pvr: fix device reset - habanalbs maintainer update - intel display: fix some dsi mipi sequences - xe fixes: SRIOV fixes, small GuC fixes, disable indirect ring due to issues, compression fix for fragmented BO, doc update * tag 'drm-fixes-2025-07-12' of https://gitlab.freedesktop.org/drm/kernel: (22 commits) drm/xe/guc: Default log level to non-verbose drm/xe/bmg: Don't use WA 16023588340 and 22019338487 on VF drm/xe/guc: Recommend GuC v70.46.2 for BMG, LNL, DG2 drm/xe/pm: Correct comment of xe_pm_set_vram_threshold() drm/xe: Release runtime pm for error path of xe_devcoredump_read() drm/xe/pm: Restore display pm if there is error after display suspend drm/i915/bios: Apply vlv_fixup_mipi_sequences() to v2 mipi-sequences too drm/gem: Fix race in drm_gem_handle_create_tail() drm/framebuffer: Acquire internal references on GEM handles agp/amd64: Check AGP Capability before binding to unsupported devices drm/xe/bmg: fix compressed VRAM handling Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2" drm/xe: Allocate PF queue size on pow2 boundary drm/xe/pf: Clear all LMTT pages on alloc drm/nouveau/gsp: fix potential leak of memory used during acpi init rust: drm: remove unnecessary imports MAINTAINERS: Change habanalabs maintainer drm/imagination: Fix kernel crash when hard resetting the GPU drm/tegra: nvdec: Fix dma_alloc_coherent error check rust: drm: device: drop_in_place() the drm::Device in release() ...
2025-07-11net_sched: act_skbedit: use RCU in tcf_skbedit_dump()Eric Dumazet
Also storing tcf_action into struct tcf_skbedit_params makes sure there is no discrepancy in tcf_skbedit_act(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250709090204.797558-12-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-11net_sched: act_police: use RCU in tcf_police_dump()Eric Dumazet
Also storing tcf_action into struct tcf_police_params makes sure there is no discrepancy in tcf_police_act(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250709090204.797558-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-11net_sched: act_pedit: use RCU in tcf_pedit_dump()Eric Dumazet
Also storing tcf_action into struct tcf_pedit_params makes sure there is no discrepancy in tcf_pedit_act(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250709090204.797558-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-11net_sched: act_nat: use RCU in tcf_nat_dump()Eric Dumazet
Also storing tcf_action into struct tcf_nat_params makes sure there is no discrepancy in tcf_nat_act(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250709090204.797558-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-11net_sched: act_mpls: use RCU in tcf_mpls_dump()Eric Dumazet
Also storing tcf_action into struct tcf_mpls_params makes sure there is no discrepancy in tcf_mpls_act(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250709090204.797558-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-11net_sched: act_ctinfo: use RCU in tcf_ctinfo_dump()Eric Dumazet
Also storing tcf_action into struct tcf_ctinfo_params makes sure there is no discrepancy in tcf_ctinfo_act(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250709090204.797558-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-11net_sched: act_ctinfo: use atomic64_t for three countersEric Dumazet
Commit 21c167aa0ba9 ("net/sched: act_ctinfo: use percpu stats") missed that stats_dscp_set, stats_dscp_error and stats_cpmark_set might be written (and read) locklessly. Use atomic64_t for these three fields, I doubt act_ctinfo is used heavily on big SMP hosts anyway. Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pedro Tammela <pctammela@mojatatu.com> Link: https://patch.msgid.link/20250709090204.797558-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>