diff options
| author | Philipp Stanner <phasta@kernel.org> | 2025-08-13 10:56:55 +0200 |
|---|---|---|
| committer | Philipp Stanner <phasta@kernel.org> | 2025-08-28 10:27:18 +0200 |
| commit | f4c75f975cf50fa2e1fd96c5aafe5aa62e55fbe4 (patch) | |
| tree | 494f25b990bcdc696d9eded97a8aaf818b0d89a0 | |
| parent | 77a62e557f54ecf6305e7ed6fb05d02e8748bffb (diff) | |
drm/sched: Document race condition in drm_sched_fini()
In drm_sched_fini() all entities are marked as stopped - without taking
the appropriate lock, because that would deadlock. That means that
drm_sched_fini() and drm_sched_entity_push_job() can race against each
other.
This should most likely be fixed by establishing the rule that all
entities associated with a scheduler must be torn down first. Then,
however, the locking should be removed from drm_sched_fini() alltogether
with an appropriate comment.
Reported-by: James Flowers <bold.zone2373@fastmail.com>
Link: https://lore.kernel.org/dri-devel/20250720235748.2798-1-bold.zone2373@fastmail.com/
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250813085654.102504-2-phasta@kernel.org
| -rw-r--r-- | drivers/gpu/drm/scheduler/sched_main.c | 16 |
1 files changed, 16 insertions, 0 deletions
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 5a550fd76bf0..46119aacb809 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -1424,6 +1424,22 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) * Prevents reinsertion and marks job_queue as idle, * it will be removed from the rq in drm_sched_entity_fini() * eventually + * + * FIXME: + * This lacks the proper spin_lock(&s_entity->lock) and + * is, therefore, a race condition. Most notably, it + * can race with drm_sched_entity_push_job(). The lock + * cannot be taken here, however, because this would + * lead to lock inversion -> deadlock. + * + * The best solution probably is to enforce the life + * time rule of all entities having to be torn down + * before their scheduler. Then, however, locking could + * be dropped alltogether from this function. + * + * For now, this remains a potential race in all + * drivers that keep entities alive for longer than + * the scheduler. */ s_entity->stopped = true; spin_unlock(&rq->lock); |