| Age | Commit message (Collapse) | Author |
|
Convert struct proto pre_connect(), connect(), bind(), and bind_add()
callback function prototypes from struct sockaddr to struct sockaddr_unsized.
This does not change per-implementation use of sockaddr for passing around
an arbitrarily sized sockaddr struct. Those will be addressed in future
patches.
Additionally removes the no longer referenced struct sockaddr from
include/net/inet_common.h.
No binary changes expected.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-5-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update all struct proto_ops connect() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.
No binary changes expected.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update all struct proto_ops bind() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.
No binary changes expected.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This adds a dlm_release_lockspace() flag to request that node-failure
recovery be performed for the node leaving the lockspace.
The implementation of this flag requires coordination with userland
clustering components. It's been requested for use by GFS2"
* tag 'dlm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: check for undefined release_option values
dlm: handle release_option as unsigned
dlm: move to rinfo for all middle conversion cases
dlm: handle invalid lockspace member remove
dlm: add new flag DLM_RELEASE_RECOVER for dlm_lockspace_release
dlm: add new configfs entry release_recover for lockspace members
dlm: add new RELEASE_RECOVER uevent attribute for release_lockspace
dlm: use defines for force values in dlm_release_lockspace
dlm: check for defined force value in dlm_lockspace_release
|
|
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This patch adds a new WQ_PERCPU flag to all the fs subsystem users to
explicitly request the use of the per-CPU behavior. Both flags coexist
for one release cycle to allow callers to transition their calls.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
All existing users have been updated accordingly.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://lore.kernel.org/20250916082906.77439-4-marco.crivellari@suse.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Checking on all undefined release_option values to return -EINVAL in
case a user is providing them to dlm_release_lockspace().
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Future patches will introduce a invalid argument check for undefined
values. All values for release_option are positive integer values to not
check on negative values as well we just change the parameter to
unsigned int.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Since commit f74dacb4c8116 ("dlm: fix recovery of middle conversions")
we introduced additional debugging information if we hit the middle
conversion by using log_limit(). The DLM log_limit() functionality
requires a DLM debug option being enabled. As this case is so rarely and
excempt any potential introduced new issue with recovery we switching it
to log_rinfo() ad this is ratelimited under normal DLM loglevel.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Since commit de7b4869b4ecf ("dlm: add new configfs entry release_recover
for lockspace members") we are moving lockspace members into a gone list
before removing them to get additional removing attributes from
configfs. There is still a very unlikely possibility when
find_config_node() returns NULL, then for some reason the node wasn't
marked as gone but it was removed. We will just handle this case and drop
an error to observe if this case can ever happen.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/gfs2/aJ2Ssuh8xlsTutrA@stanley.mountain/T/#u
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
When dlm_lockspace_release() is passed DLM_RELEASE_RECOVER, it
tells the dlm to handle the release/leave as if the node had failed,
i.e. perform recovery steps for a failed node, like recover_slot().
When DLM_RELEASE_RECOVER is set:
- dlm_release_lockspace() includes RELEASE_RECOVER=1 in the OFFLINE
uevent sent to userspace.
- userspace/dlm_controld sends a message to all lockspace members
indicating that the subsequent node removal should be handled as
if the node had failed.
- when dlm_controld on all nodes receives the new message, it sets
the release_recover configfs entry to 1 for the node.
- when the dlm/kernel next performs recovery and removes the node,
it will see that release_recover has been set, and will perform
recovery steps for the node as if it had failed, e.g. the
recover_slot() callback is called to notify the fs.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
A new configfs entry is added for a lockspace member:
/config/dlm/<cluster>/spaces/<space>/nodes/<node>/release_recover
release_recover can be set to 1 by userspace (dlm_controld process)
prior to removing the lockspace member (rmdir of the <node>).
This tells the kernel to handle the removed member as if it had failed,
i.e. recovery steps for a failed node should be perfomed, as opposed
to the recovery steps for a node doing a controlled leave.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
RELEASE_RECOVER=0|1 is passed to user space as part of the OFFLINE
uevent for leaving a lockspace, and used by subsequent patches.
RELEASE_RECOVER=1 tells user space that the release_lockspace/leave
should be handled by the remaining lockspace members as if the
leaving node had failed.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Clarify the use of the force parameter by renaming it to
"release_option" and adding defines (with descriptions) for
each of the accepted values.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Force values over 3 are undefined, so don't treat them as 3.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
|
|
DLM does not use any exported SCTP function. SCTP is registered
dynamically as protocol to the kernel and can be used over the right
protocol identifiers on the socket api. We drop the SCTP dependency as
DLM can also be used with TCP only.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Reviewed-by: Heming zhao <heming.zhao@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Reject SCTP dlm configuration if the kernel was never build with SCTP.
Currently the only one known user space tool "dlm_controld" will drop an
error in the logs and getting stuck. This behaviour should be fixed to
deliver an error to the user or fallback to TCP.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Reviewed-by: Heming zhao <heming.zhao@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Currently SCTP shutdown() call gets stuck because there is no incoming
EOF indicator on its socket. On the peer side the EOF indicator as
recvmsg() returns 0 will be triggered as mechanism to flush the socket
queue on the receive side. In SCTP recvmsg() function sctp_recvmsg() we
can see that only if sk_shutdown has the bit RCV_SHUTDOWN set SCTP will
recvmsg() will return EOF. The RCV_SHUTDOWN bit will only be set when
shutdown with SHUT_RD is called. We use now SHUT_RDWR to also get a EOF
indicator from recvmsg() call on the shutdown() initiator.
SCTP does not support half closed sockets and the semantic of SHUT_WR is
different here, it seems that calling SHUT_WR on sctp sockets keeps the
socket open to have the possibility to do some specific SCTP operations on
it that we don't do here.
There exists still a difference in the limitations of TCP vs SCTP in
case if we are required to have a half closed socket functionality. This
was tried to archieve with DLM protocol changes in the past and
hopefully we really don't require half closed socket functionality.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Tested-by: Heming zhao <heming.zhao@suse.com>
Reviewed-by: Heming zhao <heming.zhao@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
The sk->sk_shutdown value is flag value so use masking to check if
RCV_SHUTDOWN is set as other possible values like SEND_SHUTDOWN can set
as well.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Tested-by: Heming zhao <heming.zhao@suse.com>
Reviewed-by: Heming zhao <heming.zhao@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch bypasses multi-link errors in TCP mode, allowing dlm
to operate on the first tcp link.
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
If an active rsb is not hashed anymore and this could occur because we
releases and acquired locks we need to signal the followed code that
the lookup failed. Since the lookup was successful, but it isn't part of
the rsb hash anymore we need to signal it by setting error to -EBADR as
dlm_search_rsb_tree() does it.
Cc: stable@vger.kernel.org
Fixes: 5be323b0c64d ("dlm: move dlm_search_rsb_tree() out of lock")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
If an inactive rsb is not hashed anymore and this could occur because we
releases and acquired locks we need to signal the followed code that the
lookup failed. Since the lookup was successful, but it isn't part of the
rsb hash anymore we need to signal it by setting error to -EBADR as
dlm_search_rsb_tree() does it.
Cc: stable@vger.kernel.org
Fixes: 01fdeca1cc2d ("dlm: use rcu to avoid an extra rsb struct lookup")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
do_uevent returns the value written to event_done. In case it is a
positive value, new_lockspace would undo all the work, and lockspace
would not be set. __dlm_new_lockspace, however, would treat that
positive value as a success due to commit 8511a2728ab8 ("dlm: fix use
count with multiple joins").
Down the line, device_create_lockspace would pass that NULL lockspace to
dlm_find_lockspace_local, leading to a NULL pointer dereference.
Treating such positive values as successes prevents the problem. Given
this has been broken for so long, this is unlikely to break userspace
expectations.
Fixes: 8511a2728ab8 ("dlm: fix use count with multiple joins")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch increases the maximum number of links that can be
used with corosync3/knet. The majority of the changes are in
user space dlm_tools/dlm_controld.
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Currently if no comm can be found dlm_comm_seq() returns -EEXIST which
means entry already exists for a lookup it makes no sense to return
-EEXIST. We change it to -ENOENT. There is no user that will evaluate
the return value on a specific value so this should be fine.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
The return type of srcu_read_lock() is int and not bool. Whereas we
using the ret variable only to evaluate a bool type of
dlm_lowcomms_con_has_addr() to check if an address is already being set.
Fixes: 6f0b0b5d7ae7 ("fs: dlm: remove dlm_node_addrs lookup list")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
An rsb struct was not being removed in the case where it
was both the master and the dir record. This case (master
and dir node) was missed in the condition for doing add_scan()
from deactivate_rsb(). Fixing this triggers a related WARN_ON
that needs to be fixed, and requires adjusting where two
del_scan() calls are made.
Fixes: c217adfc8caa ("dlm: fix add_scan and del_scan usage")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
If dlm_recover_members() fails we don't drop the references of the
previous created root_list that holds and keep all rsbs alive during the
recovery. It might be not an unlikely event because ping_members() could
run into an -EINTR if another recovery progress was triggered again.
Fixes: 3a747f4a2ee8 ("dlm: move rsb root_list to ls_recover() stack")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
In one special case, recovery is unable to reliably rebuild
lock state by simply recreating lkb structs as sent from the
lock holders. That case is when the lkb's include conversions
between PR and CW modes.
The recovery code has always recognized this special case,
but the implemention has always been broken, and would set
invalid modes in recovered lkb's. Unpredictable or bogus
errors could then be returned for further locking calls on
these locks.
This bug has gone unnoticed for so long due to some
combination of:
- applications never or infrequently converting between PR/CW
- recovery not occuring during these conversions
- if the recovery bug does occur, the caller may not notice,
depending on what further locking calls are made, e.g. if
the lock is simply unlocked it may go unnoticed
However, a core analysis from a recent gfs2 bug report points
to this broken code.
PR = Protected Read
CW = Concurrent Write
PR and CW are incompatible
PR and PR are compatible
CW and CW are compatible
Example 1
node C, resource R
granted: PR node A
granted: PR node B
granted: NL node C
granted: NL node D
- A sends convert PR->CW to C
- C fails before A gets a reply
- recovery occurs
At this point, A does not know if it still holds
the lock in PR, or if its conversion to CW was granted:
- If A's conversion to CW was granted, then another
node's CW lock may also have been granted.
- If A's conversion to CW was not granted, it still
holds a PR lock, and other nodes may also hold PR locks.
So, the new master of R cannot simply recreate the lock
from A using granted mode PR and requested mode CW.
The new master must look at all the recovered locks to
determine the correct granted modes, and ensure that all
the recovered locks are recreated in compatible states.
The correct lock recovery steps in this example are:
- node D becomes the new master of R
- node B sends D its lkb, granted PR
- node A sends D its lkb, convert PR->CW
- D determines the correct lock state is:
granted: PR node B
convert: PR->CW node A
The lkb sent by each node was recreated without
any change on the new master node.
Example 2
node C, resource R
granted: PR node A
granted: NL node C
granted: NL node D
waiting: CW node B
- A sends convert PR->CW to C
- C grants the conversion to CW for A
- C grants the waiting request for CW to B
- C sends granted message to B, but fails
before it can send the granted message to A
- B receives the granted message from C
At this point:
- A believes it is converting PR->CW
- B believes it is holding a CW lock
The correct lock recovery steps in this example are:
- node D becomes the new master of R
- node A sends D its lkb, convert PR->CW
- node B sends D its lkb, granted CW
- D determins the correct lock state is:
granted: CW node B
granted: CW node A
The lkb sent by B is recreated without change,
but the lkb sent by A is changed because the
granted mode was not compatible.
Fixes to make this work correctly:
recover_convert_waiter: should not make any changes
to a converting lkb that is still waiting for a reply
message. It was previously setting grmode to IV, which
is invalid state, so the lkb would not be handled
correctly by other code.
receive_rcom_lock_args: was checking the wrong lkb field
(wait_type instead of status) to determine if the lkb is
being converted, and in need of inspection for this special
recovery. It was also setting grmode to IV in the lkb,
causing it to be mishandled by other code.
Now, this function just puts the lkb, directly as sent,
onto the convert queue of the resource being recovered,
and corrects it in recover_conversion() later, if needed.
recover_conversion: the job of this function is to detect
and correct lkb states for the special PR/CW conversions.
The new code now checks for recovered lkbs on the granted
queue with grmode PR or CW, and takes the real grmode from
that. Then it looks for lkbs on the convert queue with an
incompatible grmode (i.e. grmode PR when the real grmode is
CW, or v.v.) These converting lkbs need to be fixed.
They are fixed by temporarily setting their grmode to NL,
so that grmodes are not incompatible and won't confuse other
locking code. The converting lkb will then be granted at
the end of recovery, replacing the temporary NL grmode.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
If add_to_waiters() fails we have a problem because the previous called
functions such as validate_lock_args() or validate_unlock_args() sets
specific lkb values that are set for a request, there exists no way back
to revert those changes. When there is a pending lock request the
original request arguments will be overwritten with unknown
consequences.
The good news are that I believe those cases that we fail in
add_to_waiters() can't happen or very unlikely to happen (only if the DLM
user does stupid API things), but if so we have the above mentioned
problem.
There are two conditions that will be removed here. The first one is the
-EINVAL case which contains is_overlap_unlock() or (is_overlap_cancel()
and mstype == DLM_MSG_CANCEL).
The is_overlap_unlock() is missing for the normal UNLOCK case which is
moved to validate_unlock_args(). The is_overlap_cancel() already happens
in validate_unlock_args() when DLM_LKF_CANCEL is set. In case of
validate_lock_args() we check on is_overlap() when it is not a new request,
on a new request the lkb is always new and does not have those values set.
The -EBUSY check can't happen in case as for non new lock requests (when
DLM_LKF_CONVERT is set) we already check in validate_lock_args() for
lkb_wait_type and is_overlap(). Then there is only
validate_unlock_args() that will never hit the default case because
dlm_unlock() will produce DLM_MSG_UNLOCK and DLM_MSG_CANCEL messages.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
We are using kstrtouint() to parse common integer fields. This patch
will switch to use unsigned int instead of int as we are parsing
unsigned integer values.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch removes the configfs storage fields from the dlm_cluster
structure to store per cluster values. Those fields also exists for the
dlm_config global variable and get stored in both when setting configfs
values. To read values it will always be read out from the dlm_cluster
configfs structure but this patch changes it to only use the global
dlm_config variable. Storing them in two places makes no sense as both
are able to be changed under certain conditions during DLM runtime.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch handles the DLM listen port setting internally as byte order
as it is a value that is used as network byte on the wire. The user
space still sets this value as host byte order for configfs as we don't
break UAPI here.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
The DLM configfs path has usually a nodeid in it's directory path and
again a file to store the nodeid again in a separate storage. It is
forced that the user space will set both (the directory name and nodeid
file) storage to the same value if it doesn't do that we run in some
kind of broken state.
This patch will simply represent the file storage to it's upper
directory nodeid name. It will force the user now to use a valid
unsigned int as nodeid directory name and will ignore all nodeid writes
in the nodeid file storage as this will now always represent the upper
nodeid directory name.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch fixes a possible null pointer dereference when this function is
called from request_lock() as lkb->lkb_resource is not assigned yet,
only after validate_lock_args() by calling attach_lkb(). Another issue
is that a resource name could be a non printable bytearray and we cannot
assume to be ASCII coded.
The log functionality is probably never being hit when DLM is used in
normal way and no debug logging is enabled. The null pointer dereference
can only occur on a new created lkb that does not have the resource
assigned yet, it probably never hits the null pointer dereference but we
should be sure that other changes might not change this behaviour and we
actually can hit the mentioned null pointer dereference.
In this patch we just drop the printout of the resource name, the lkb id
is enough to make a possible connection to a resource name if this
exists.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
The arguments got swapped by commit 986ae3c2a8df ("dlm: fix race between
final callback and remove") fixing this now.
Fixes: 986ae3c2a8df ("dlm: fix race between final callback and remove")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
no_llseek had been defined to NULL two years ago, in commit 868941b14441
("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch sets an missing -ENOMEM as error return value when the
allocation of the dlm workqueue fails.
Fixes: 94e180d6255f ("dlm: async freeing of lockspace resources")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202408110800.OsoP8TB9-lkp@intel.com/
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
To avoid -EINPROGRESS cases on connect that just ends in a retry we just
call connect in a synchronized way to wait until its done. Since commit
dbb751ffab0b ("fs: dlm: parallelize lowcomms socket handling") we have a
non ordered workqueue running for serving the DLM sockets that allows us
to call send/recv for each DLM socket connection in parallel. Before
each worker needed to wait until the previous worker was done and
probably the reason why connect() was called in an asynchronous way to
not block other workers. This is however not necessary anymore as other
socket handling workers don't need to wait.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch moves the xarray lookup functionality for the lkb out of the
ls_lkbxa_lock read lock handling. We can do that as the xarray should be
possible to access lockless in case of reader like xa_load(). We confirm
under ls_lkbxa_lock that the lkb is still part of the data structure and
take a reference when its still part of ls_lkbxa to avoid being freed
after doing the lookup. To do a check if the lkb is still part of the
ls_lkbxa data structure we use a kref_read() as the last put will remove
it from the ls_lkbxa data structure and any reference taken means it is
still part of ls_lkbxa.
A similar approach was done with the DLM rsb rhashtable just with a flag
instead of the refcounter because the refcounter has a slightly
different meaning.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
The rhashtable structure is lockless for readers such as
rhashtable_lookup_fast(). It should be save to call this lookup
functionality out of holding ls_rsbtbl_lock to get the rsb pointer out
of the hash. This reduce the contention time of ls_rsbtbl_lock in some
cases. We still need to check if the rsb is part of the check as this
state can be changed while ls_rsbtbl_lock is not held. If its part of
the rhashtable data structure we take a reference to be sure it will not
be freed after we drop the ls_rsbtbl_lock read lock.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Since commit 01fdeca1cc2d ("dlm: use rcu to avoid an extra rsb struct
lookup") _dlm_master_lookup() is called under rcu lock that prevents
that the rsb structure is being freed. There was a missing change to
avoid an additional lookup and just check that the rsb is still part of
the ls_rsbtbl structure. This patch is doing such check instead of
lookup the rsb structure again.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch handles freeing of lockspace resources asynchronously besides
the release_lockspace() context. The release_lockspace() context is
sometimes called in a time critical context, e.g. umount syscall. Most
every user space init system will timeout if it takes too long. To
reduce the potential waiting time we deregister in release_lockspace()
the lockspace from the DLM subsystem and do the actual releasing of
lockspace resource in a worker of a workqueue following recommendation
of:
https://lore.kernel.org/all/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp/T/#u
as flushing of system workqueues are not allowed. The most time to
release the DLM resources are spent to release the data structures
"ls->ls_lkbxa" and "ls->ls_rsbtbl" as they iterate over each entries and
those data structures can contain millions of entries. This patch handles
for now only freeing of those data structures as those operations are
the most reason why release_lockspace() blocking of being returned.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch removes the releasing of the "struct dlm ls" resource out of
the kobject handling. Instead we run kfree() after kobject_put() of the
lockspace kobject structure that should always being the last put call.
This prepares to split the releasing of all lockspace resources
asynchronously in the background and just deregister everything in
release_lockspace().
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch adds a warn on if is_master() and dlm_is_removed() checks on
invalid nodeid states that are probably not what the caller wants to do
here. The is_master() function checking on r->res_nodeid is invalid when
it is set to -1, whereas the dlm_is_removed() has a different meaning
as "nodeid member" and also 0 is invalid.
We run into these cases and this patch changes those cases as we never
will run into them. There should be no functional changes as the
condition should return the same result. However this patch signals now
on caller level that there might be an "extra" case to handle here.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch will remote the return of an invalid nodeid value when
local_comm is not set. This case should never happen as the DLM stack
tries to compare valid nodeids with an invalid nodeid returned by
dlm_our_nodeid(). Instead we let it crash to getting at least recognized
if we running into such state.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch removes unnecessary refcounts that are obviously not
necessary because either when the pointer is passed as parameter or it
is part of a list we should already hold a reference to it.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch removes a unnecessary parameter from DLM memory allocation
helpers and reduce some functions by just directly reply the pointer
address of the allocated memory.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
In the case we trigger dlm_free_rsb() that does a call_rcu() and the
responding kfree() of res_lvbptr and a kmem_cache_free() of the rsb
pointer we need to wait until this pending operation is done before
calling kmem_cache_destroy(). We doing that by using rcu_barrier() that
waits until all pending call_rcu() are done. This avoids that
kmem_cache_destroy() complains about active objects around that are not
being freed yet by call_rcu().
There is currently more discussions about to make this behaviour better,
see:
https://lore.kernel.org/netdev/20240609082726.32742-1-Julia.Lawall@inria.fr/
However this is only for call_rcu() if the callback calls
kmem_cache_destroy() only to replace it by kfree_rcu() call which has
currently some issue. This isn't our case because we also free the
res_lvbptr if being set.
For our case, to avoid the above race rcu_barrier() should be used before
calling kmem_cache_destroy() to be sure that there are no active objects
around. This is exactly what net/batman-adv is also doing before calling their
kmem_cache_destroy() in module unloading.
Fixes: 01fdeca1cc2d ("dlm: use rcu to avoid an extra rsb struct lookup")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
The DLM rcom handling has a check that all exflags are the same for the
whole lockspace membership nodes. There are some flags that requires
such handling, however DLM_LSFL_SOFTIRQ does not require this handling
and it should be backwards compatibility with other lockspaces that does
not set this flag.
Fixes: f328a26eeb53 ("dlm: introduce DLM_LSFL_SOFTIRQ_SAFE")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|