Delay and sleep mechanisms
This document seeks to answer the common question: “What is the RightWay (TM) to insert a delay?”
This question is most often faced by driver writers who have to deal with hardware delays and who may not be intimately familiar with the inner workings of the Linux kernel.
The following table gives a rough overview of the existing function ‘families’ and their limitations. This overview table is no substitute for reading the function descriptions before use!
| | *delay() | usleep_range*() | *sleep() | fsleep() |
|---|---|---|---|---|
| underlying mechanism | busy-wait loop | hrtimers based | timer list timers based | combines the others |
| usage in atomic context | yes | no | no | no |
| precise on “short intervals” | yes | yes | depends | yes |
| precise on “long intervals” | Do not use! | yes | max 12.5% slack | yes |
| interruptible variant | no | yes | yes | no |
Generic advice for non-atomic contexts could be:
- Use fsleep() whenever unsure (as it combines all the advantages of the others)
- Use *sleep() whenever possible
- Use usleep_range*() whenever accuracy of *sleep() is not sufficient
- Use *delay() for very, very short delays
More detailed information about the function ‘families’ follows in the next sections.
*delay() family of functions
These functions use the jiffy estimation of clock speed and will busy wait for enough loop cycles to achieve the desired delay. udelay() is the basic implementation, and ndelay() as well as mdelay() are variants.
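As a rough illustration of the "jiffy estimation" above, the following userspace sketch converts a requested delay into a number of calibrated loop iterations. It is a simplified model of the loop-based approach, not any architecture's actual implementation: `model_udelay_loops()` is a hypothetical helper, and the `loops_per_jiffy` and `hz` values are supplied by the caller instead of coming from boot-time calibration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: number of busy-wait loop iterations needed for a
 * delay of 'usec' microseconds, given a calibrated loops_per_jiffy value
 * and the tick rate 'hz' (jiffies per second).
 *   loops per second = loops_per_jiffy * hz
 *   loops            = usec * loops_per_jiffy * hz / 1000000
 */
static uint64_t model_udelay_loops(uint64_t usec, uint64_t loops_per_jiffy,
				   uint64_t hz)
{
	assert(hz > 0);
	/* 64-bit arithmetic keeps the product from overflowing for realistic inputs */
	return usec * loops_per_jiffy * hz / 1000000u;
}
```

With loops_per_jiffy = 4000000 and HZ = 250 (10^9 iterations per second), a 10 microsecond delay corresponds to 10000 iterations. A calibrated loops_per_jiffy that is too low shrinks the loop count proportionally, which is why such a delay can return early.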
These functions are mainly used to add a delay in atomic context. Before adding a delay in atomic context, please ask yourself: is this really required?
void udelay(unsigned long usec)
Inserting a delay based on microseconds with busy waiting
Parameters
unsigned long usec
requested delay in microseconds
Description
When delaying in an atomic context, ndelay(), udelay() and mdelay() are the only valid variants of delaying/sleeping to go with.
When inserting delays in non-atomic context which are shorter than the time required to queue e.g. an hrtimer and then enter the scheduler, it is also valuable to use udelay(). However, it is not simple to specify a generic threshold for this that fits all systems. As an approximation, use udelay() for all delays up to 10 microseconds.
When a delay is larger than the architecture-specific MAX_UDELAY_MS value, please make sure mdelay() is used. Otherwise there is a risk of overflow.
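The following userspace sketch illustrates how a millisecond busy-wait can stay within that limit: the request is either passed to a single microsecond delay or split into 1 ms chunks, so no single microsecond delay exceeds MAX_UDELAY_MS. `MODEL_MAX_UDELAY_MS` (set to an assumed value of 5), `model_udelay()` and `model_mdelay()` are hypothetical names; the stub only records the requests instead of busy waiting, and the real limit is architecture specific.

```c
#include <assert.h>

/* Assumed limit for this model; the real MAX_UDELAY_MS is architecture specific. */
#define MODEL_MAX_UDELAY_MS 5UL

struct delay_trace {
	unsigned long calls;       /* number of microsecond-delay calls made */
	unsigned long total_usec;  /* sum of all requested microseconds */
};

/* Stand-in for udelay(): records the request instead of busy waiting. */
static void model_udelay(struct delay_trace *t, unsigned long usec)
{
	t->calls++;
	t->total_usec += usec;
}

/*
 * Millisecond delay built on the microsecond stub: short delays are
 * handed over in one call, longer ones are split into 1 ms chunks so
 * that no single call exceeds the modeled limit.
 */
static void model_mdelay(struct delay_trace *t, unsigned long ms)
{
	if (ms <= MODEL_MAX_UDELAY_MS) {
		model_udelay(t, ms * 1000);
		return;
	}
	while (ms--)
		model_udelay(t, 1000);
}
```

A 3 ms request results in one 3000 microsecond call, while a 20 ms request results in twenty 1000 microsecond calls.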
Please note that ndelay(), udelay() and mdelay() may return early for several reasons (https://lists.openwall.net/linux-kernel/2011/01/09/56):
- computed loops_per_jiffy too low (due to the time taken to execute the timer interrupt)
- cache behaviour affecting the time it takes to execute the loop function
- CPU clock rate changes
void ndelay(unsigned long nsec)
Inserting a delay based on nanoseconds with busy waiting
Parameters
unsigned long nsec
requested delay in nanoseconds
Description
See udelay() for basic information about ndelay() and its variants.
mdelay(n)
Inserting a delay based on milliseconds with busy waiting
usleep_range*() and *sleep() family of functions
These functions use hrtimers or timer list timers to provide the requested sleeping duration. To decide which function is the right one to use, take the following basic information into account:
- hrtimers are more expensive as they use an rb-tree (instead of hashing)
- hrtimers are more expensive when the requested sleeping duration is the first timer, which means real hardware has to be programmed
- timer list timers always provide some sort of slack as they are jiffy based
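The jiffy-based slack can be illustrated with a small userspace sketch that rounds a requested duration up to whole jiffies. The helper names are hypothetical, HZ is assumed to divide 1000 evenly, and the additional slack of the higher timer wheel levels (described for msleep()) is not modeled.

```c
#include <assert.h>

/*
 * Hypothetical helper: round a millisecond duration up to whole jiffies,
 * assuming HZ divides 1000 evenly (e.g. 100, 250, 1000).
 */
static unsigned long model_msecs_to_jiffies(unsigned long msecs, unsigned long hz)
{
	unsigned long ms_per_tick;

	assert(hz > 0 && 1000 % hz == 0);
	ms_per_tick = 1000 / hz;
	return (msecs + ms_per_tick - 1) / ms_per_tick;
}

/* Effective duration once the request is expressed in whole jiffies. */
static unsigned long model_effective_msecs(unsigned long msecs, unsigned long hz)
{
	return model_msecs_to_jiffies(msecs, hz) * (1000 / hz);
}
```

With HZ=250, a 1 ms request becomes one 4 ms tick and a 5 ms request becomes two ticks, i.e. 8 ms; with HZ=1000 the same 5 ms request stays at 5 ms.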
The generic advice is repeated here:
- Use fsleep() whenever unsure (as it combines all the advantages of the others)
- Use *sleep() whenever possible
- Use usleep_range*() whenever accuracy of *sleep() is not sufficient
First check the fsleep() function description; to learn more about accuracy, please check the msleep() function description.
usleep_range*()
void usleep_range(unsigned long min, unsigned long max)
Sleep for an approximate time
Parameters
unsigned long min
Minimum time in microseconds to sleep
unsigned long max
Maximum time in microseconds to sleep
Description
For basic information, please refer to usleep_range_state().
The task will be in the state TASK_UNINTERRUPTIBLE during the sleep.
void usleep_range_idle(unsigned long min, unsigned long max)
Sleep for an approximate time with idle time accounting
Parameters
unsigned long min
Minimum time in microseconds to sleep
unsigned long max
Maximum time in microseconds to sleep
Description
For basic information, please refer to usleep_range_state().
The sleeping task has the state TASK_IDLE during the sleep so that it does not contribute to the load average.
void usleep_range_state(unsigned long min, unsigned long max, unsigned int state)
Sleep for an approximate time in a given state
Parameters
unsigned long min
Minimum time in usecs to sleep
unsigned long max
Maximum time in usecs to sleep
unsigned int state
State the current task will be in while sleeping
Description
usleep_range_state() sleeps at least for the minimum specified time but not longer than the maximum specified amount of time. The range might reduce power usage by allowing hrtimers to coalesce an already scheduled interrupt with this hrtimer. In the worst case, an interrupt is scheduled for the upper bound.
The sleeping task is set to the specified state before starting the sleep.
In non-atomic context where the exact wakeup time is flexible, use usleep_range() or its variants instead of udelay(). The sleep improves responsiveness by avoiding the CPU-hogging busy-wait of udelay().
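The benefit of a range can be pictured with a deliberately simplified userspace model: if an interrupt is already scheduled inside [min, max], the sleeper can wake up with it; otherwise the worst case applies and a wakeup is needed at the upper bound. `model_range_wakeup()` is a hypothetical helper and does not reflect how the hrtimer code actually manages expiry times or programs the hardware.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of range-based wakeups: given already scheduled
 * interrupt times (microseconds from now), return when a sleeper with
 * the range [min, max] wakes up. The earliest existing interrupt inside
 * the range is reused (coalesced); otherwise the upper bound is used,
 * which is the worst case.
 */
static unsigned long model_range_wakeup(unsigned long min, unsigned long max,
					const unsigned long *events, size_t n)
{
	unsigned long best = max;
	size_t i;

	assert(min <= max);
	for (i = 0; i < n; i++) {
		if (events[i] >= min && events[i] < best)
			best = events[i];
	}
	return best;
}
```

For example, with interrupts already due at 50, 120 and 300 microseconds, a range of [100, 200] coalesces with the interrupt at 120, while a range of [130, 200] needs its own wakeup at 200. A wider range therefore raises the chance of sharing an interrupt.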
*sleep()
void msleep(unsigned int msecs)
sleep safely even with waitqueue interruptions
Parameters
unsigned int msecs
Requested sleep duration in milliseconds
Description
msleep() uses jiffy based timeouts for the sleep duration. Because of the design of the timer wheel, the maximum additional percentage delay (slack) is 12.5%. This is only valid for timers which will end up in level 1 or a higher level of the timer wheel. For an explanation of this 12.5%, please check the detailed description of the basics of the timer wheel.
The slack of timers which will end up in level 0 depends on the sleep duration (msecs) and the HZ configuration and can be calculated in the following way (with the timer wheel design restriction that the slack is not less than 12.5%):
slack = MSECS_PER_TICK / msecs
When the allowed slack of the callsite is known, the calculation can be inverted to find the minimal sleep duration that meets the constraints. For example:
- HZ=1000 with slack=25%: MSECS_PER_TICK / slack = 1 / (1/4) = 4: all sleep durations greater than or equal to 4ms will meet the constraints.
- HZ=1000 with slack=12.5%: MSECS_PER_TICK / slack = 1 / (1/8) = 8: all sleep durations greater than or equal to 8ms will meet the constraints.
- HZ=250 with slack=25%: MSECS_PER_TICK / slack = 4 / (1/4) = 16: all sleep durations greater than or equal to 16ms will meet the constraints.
- HZ=250 with slack=12.5%: MSECS_PER_TICK / slack = 4 / (1/8) = 32: all sleep durations greater than or equal to 32ms will meet the constraints.
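The formula and its inversion can be checked with a short userspace sketch. The helpers are hypothetical, HZ is assumed to divide 1000 evenly, and the allowed slack is passed as a fraction 1/slack_div (4 for 25%, 8 for 12.5%). The sketch evaluates the formula as written and does not model the timer wheel levels.

```c
#include <assert.h>

/* Milliseconds per tick, assuming HZ divides 1000 evenly. */
static unsigned long model_msecs_per_tick(unsigned long hz)
{
	assert(hz > 0 && 1000 % hz == 0);
	return 1000 / hz;
}

/* Level 0 slack for a sleep duration: MSECS_PER_TICK / msecs (0.25 == 25%). */
static double model_level0_slack(unsigned long msecs, unsigned long hz)
{
	assert(msecs > 0);
	return (double)model_msecs_per_tick(hz) / (double)msecs;
}

/*
 * Minimal sleep duration whose slack stays within 1/slack_div:
 * MSECS_PER_TICK / (1 / slack_div) == MSECS_PER_TICK * slack_div
 */
static unsigned long model_min_msecs(unsigned long hz, unsigned long slack_div)
{
	return model_msecs_per_tick(hz) * slack_div;
}
```

The inverted calculation reproduces the four listed minimums of 4, 8, 16 and 32 ms, and a 16 ms sleep at HZ=250 evaluates to exactly 25% slack.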
See also the signal-aware variant msleep_interruptible().
unsigned long msleep_interruptible(unsigned int msecs)
sleep waiting for signals
Parameters
unsigned int msecs
Requested sleep duration in milliseconds
Description
See msleep() for some basic information.
The difference between msleep() and msleep_interruptible() is that the sleep can be interrupted by signal delivery, in which case it returns early.
Return
The remaining time of the sleep duration converted to msecs (see schedule_timeout() for details).
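The value returned by msleep_interruptible() is expressed in milliseconds, while the remaining timeout itself is counted in jiffies. The sketch below shows that conversion under the assumption that HZ divides 1000 evenly, so a plain multiplication is exact; `model_jiffies_to_msecs()` is a hypothetical helper, not the kernel's jiffies_to_msecs(). Under that assumption the returned value is always a multiple of the tick length.

```c
#include <assert.h>

/*
 * Hypothetical model: convert remaining jiffies back to milliseconds,
 * assuming HZ divides 1000 evenly.
 */
static unsigned long model_jiffies_to_msecs(unsigned long jiffies, unsigned long hz)
{
	assert(hz > 0 && 1000 % hz == 0);
	return jiffies * (1000 / hz);
}
```

With HZ=250, a sleep interrupted with 3 jiffies left would report 12 ms; a sleep that ran to completion leaves 0 jiffies and reports 0.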
void ssleep(unsigned int seconds)
wrapper for seconds around msleep
Parameters
unsigned int seconds
Requested sleep duration in seconds
Description
Please refer to msleep() for detailed information.
void fsleep(unsigned long usecs)
flexible sleep which autoselects the best mechanism
Parameters
unsigned long usecs
requested sleep duration in microseconds
Description
fsleep() selects the best mechanism that will provide at most 25% slack to the requested sleep duration. Therefore it uses:
- a udelay() loop for sleep durations <= 10 microseconds, to avoid hrtimer overhead for really short sleep durations.
- usleep_range() for sleep durations for which msleep() would lead to a slack larger than 25%. This depends on the granularity of jiffies.
- msleep() for all other sleep durations.
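The selection rules above can be sketched as a userspace model. The 10 microsecond threshold comes from the text; the boundary between usleep_range() and msleep() is derived from the msleep() slack formula: a level 0 slack of at most 25% requires a sleep duration of at least four ticks (16 ms at HZ=250, matching the msleep() example). The enum, the helper names and the assumption that HZ divides 1000000 evenly are hypothetical; this is not the kernel's fsleep() source.

```c
#include <assert.h>

/* Hypothetical labels for the mechanism a flexible sleep would pick. */
enum model_mechanism { MODEL_UDELAY, MODEL_USLEEP_RANGE, MODEL_MSLEEP };

/*
 * Model of the fsleep() selection rules, assuming HZ divides 1000000
 * evenly. msleep() keeps the level 0 slack (tick / duration) at or
 * below 25% only for durations of at least four ticks.
 */
static enum model_mechanism model_fsleep_choice(unsigned long usecs,
						unsigned long hz)
{
	unsigned long usecs_per_tick;

	assert(hz > 0 && 1000000UL % hz == 0);
	usecs_per_tick = 1000000UL / hz;

	if (usecs <= 10)
		return MODEL_UDELAY;		/* avoid hrtimer overhead */
	if (usecs < 4 * usecs_per_tick)
		return MODEL_USLEEP_RANGE;	/* msleep() slack would exceed 25% */
	return MODEL_MSLEEP;
}

/* Upper bound for the usleep_range() case: at most 25% above the request. */
static unsigned long model_usleep_max(unsigned long usecs)
{
	return usecs + usecs / 4;
}
```

At HZ=250, requests of 11 to 15999 microseconds would use usleep_range() with a 25% range (e.g. 100 to 125 microseconds), and requests of 16000 microseconds or more would use msleep().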
Note
When CONFIG_HIGH_RES_TIMERS is not set, all sleeps are processed with the granularity of jiffies and the slack might exceed 25%, especially for short sleep durations.