Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


rjw at sisk

Jun 30, 2009, 3:30 PM

Post #1 of 14 (300 views)
Permalink
[RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6))

On Tuesday 30 June 2009, Alan Stern wrote:
...
> That's enough to give you the general idea. I think this design is
> a lot cleaner than the current one.

Well, I'm not really happy with starting over, but if you think we should do
that, then let's do it.

I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
and ->runtime_resume() make sense. Now, the role of the framework, IMO, is to
provide a mechanism by which it is possible:
(1) to schedule a delayed execution of ->runtime_suspend(), possibly from
interrupt context,
(2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
from interrupt context,
(3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
synchronous way (I'm not sure about ->runtime_idle())
_and_ to ensure that these callbacks will be executed when it makes sense.
There's no other point, because the core has no information to make choices,
it can only prevent wrong things from happening, if possible.

I think you will agree that the users of the framework should be able to
prevent ->runtime_suspend() from being called and that's what the usage counter
is for. Also, IMO it should be impossible to execute ->runtime_idle(), via the
framework, when the usage counter is nonzero.

BTW, I don't think resume_count is the best name; it used to be in the version
of my patch where it was automatically incremented when ->runtime_resume() was
about to called. usage_count is probably better.

Next, I think that the framework should refuse to call ->runtime_suspend() and
->runtime_idle() if the children of the device are not suspended and the
"ignore children" flag is unset. The counter of unsuspended children is used
for that. I think the rule should be that it is decremented for the parent
whenever ->runtime_suspend() is called for a child and it is incremented
for the parent whenever ->runtime_resume() is called for a child.

Now, the question is what rules should apply to the ordering and possible
simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
->runtime_resume(). I think the following rules make sense:

* It is forbidden to run ->runtime_suspend() twice in a row.

* It is forbidden to run ->runtime_suspend() in parallel with another instance
of ->runtime_suspend().

* It is forbidden to run ->runtime_resume() twice in a row.

* It is forbidden to run ->runtime_resume() in parallel with another instance
of ->runtime_resume().

* It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
->runtime_idle(), but the latter case is preferred.

* It is allowed to run ->runtime_resume() after ->runtime_suspend().

* It is forbidden to run ->runtime_resume() after ->runtime_idle().

* It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
parallel with each other.

* It is forbidden to run ->runtime_idle() twice in a row.

* It is forbidden to run ->runtime_idle() in parallel with another instance
of ->runtime_idle().

* It is forbidden to run ->runtime_idle() after ->runtime_suspend().

* It is allowed to run ->runtime_idle() after ->runtime_resume().

* It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
->runtime_idle() is running. In particular, it is allowed to (indirectly)
call ->runtime_suspend() from within ->runtime_idle().

* It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
->runtime_suspend() is running.

* If ->runtime_resume() is about to be called immediately after
->runtime_suspend(), the execution of ->runtime_suspend() should be
prevented from happening, if possible, in which case the execution of
->runtime_resume() shouldn't happen.

* If ->runtime_suspend() is about to be called immediately after
->runtime_resume(), the execution of ->runtime_resume() should be
prevented from happening, if possible, in which case the execution of
->runtime_suspend() shouldn't happen.

[.Are there any more rules related to these callbacks we should take into
account?]

Next, if we agree about the rules above, the question is what helper functions
should be provided by the core allowing these rules to be followed
automatically and what error codes should be returned by them in case it
wasn't possible to proceed without breaking the rules.

IMO, it is reasonable to provide:

* pm_schedule_suspend(dev, delay) - schedule the execution of
->runtime_suspend(dev) after delay.

* pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.

* pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.

* pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
into the run-time PM workqueue.

* pm_runtime_get(dev) - increment the device's usage counter.

* pm_runtime_put(dev) - decrement the device's usage counter.

* pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
counter is zero and all of the device's children are suspended (or the
"ignore children" flag is set).

* pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
into the run-time PM workqueue. The usage counter and children will be
checked immediately before executing ->runtime_idle(dev).

I'm not sure if it is really necessary to combine pm_runtime_idle() or
pm_request_idle() with pm_runtime_put(). At least right now I don't see any
real value of that.

I also am not sure what error codes should be returned by the above helper
functions and in what conditions.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Jul 1, 2009, 8:35 AM

Post #2 of 14 (285 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:

> On Tuesday 30 June 2009, Alan Stern wrote:
> ...
> > That's enough to give you the general idea. I think this design is
> > a lot cleaner than the current one.
>
> Well, I'm not really happy with starting over, but if you think we should do
> that, then let's do it.

It's not a complete restart. Much of the existing interface and quite
a bit of code would remain the same.

> I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
> and ->runtime_resume() make sense. Now, the role of the framework, IMO, is to
> provide a mechanism by which it is possible:
> (1) to schedule a delayed execution of ->runtime_suspend(), possibly from
> interrupt context,
> (2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
> from interrupt context,
> (3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
> synchronous way (I'm not sure about ->runtime_idle())

Yes, runtime_idle also, for drivers that require minimal overhead.

> _and_ to ensure that these callbacks will be executed when it makes sense.

Thus if the situation changes before the callback can be made, so that
it no longer makes sense, the framework should cancel the callback.

> There's no other point, because the core has no information to make choices,
> it can only prevent wrong things from happening, if possible.

Exactly.

> I think you will agree that the users of the framework should be able to
> prevent ->runtime_suspend() from being called and that's what the usage counter
> is for. Also, IMO it should be impossible to execute ->runtime_idle(), via the
> framework, when the usage counter is nonzero.

Right, because then by definition the device is in use so it can't be
idle.

> BTW, I don't think resume_count is the best name; it used to be in the version
> of my patch where it was automatically incremented when ->runtime_resume() was
> about to called. usage_count is probably better.

Fine.

> Next, I think that the framework should refuse to call ->runtime_suspend() and
> ->runtime_idle() if the children of the device are not suspended and the
> "ignore children" flag is unset.

Yes; this is part of the "makes sense" requirement.

> The counter of unsuspended children is used
> for that. I think the rule should be that it is decremented for the parent
> whenever ->runtime_suspend() is called for a child and it is incremented
> for the parent whenever ->runtime_resume() is called for a child.

Of course. (Minor change: decremented when runtime_suspend _succeeds_
for a child.)

> Now, the question is what rules should apply to the ordering and possible
> simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
> ->runtime_resume(). I think the following rules make sense:

Oh dear. I wouldn't attempt to make a complete list of all possible
interactions. It's too hard to know whether you have really covered
all the cases.

> * It is forbidden to run ->runtime_suspend() twice in a row.
>
> * It is forbidden to run ->runtime_suspend() in parallel with another instance
> of ->runtime_suspend().
>
> * It is forbidden to run ->runtime_resume() twice in a row.
>
> * It is forbidden to run ->runtime_resume() in parallel with another instance
> of ->runtime_resume().
>
> * It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
> ->runtime_idle(), but the latter case is preferred.
>
> * It is allowed to run ->runtime_resume() after ->runtime_suspend().
>
> * It is forbidden to run ->runtime_resume() after ->runtime_idle().
>
> * It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
> parallel with each other.
>
> * It is forbidden to run ->runtime_idle() twice in a row.
>
> * It is forbidden to run ->runtime_idle() in parallel with another instance
> of ->runtime_idle().
>
> * It is forbidden to run ->runtime_idle() after ->runtime_suspend().
>
> * It is allowed to run ->runtime_idle() after ->runtime_resume().
>
> * It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
> ->runtime_idle() is running. In particular, it is allowed to (indirectly)
> call ->runtime_suspend() from within ->runtime_idle().
>
> * It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
> ->runtime_suspend() is running.

We can summarize these rules as follows:

Never allow more than one callback at a time, except that
runtime_suspend may be invoked while runtime_idle is running.

Don't call runtime_resume while the device is active.

Don't call runtime_suspend or runtime_idle while the device
is suspended.

Don't invoke any callbacks if the device state is unknown
(RPM_ERROR).

Implicit is the notion that the device is suspended when
runtime_suspend returns successfully, it is active when runtime_resume
returns successfully, and it is unknown when either returns an error.

> * If ->runtime_resume() is about to be called immediately after
> ->runtime_suspend(), the execution of ->runtime_suspend() should be
> prevented from happening, if possible, in which case the execution of
> ->runtime_resume() shouldn't happen.
>
> * If ->runtime_suspend() is about to be called immediately after
> ->runtime_resume(), the execution of ->runtime_resume() should be
> prevented from happening, if possible, in which case the execution of
> ->runtime_suspend() shouldn't happen.

These could be considered optional optimizations. Or if you prefer,
they could be covered by a "New requests override previous requests"
rule.

> [.Are there any more rules related to these callbacks we should take into
> account?]

Runtime PM callbacks are mutually exclusive with other driver
core callbacks (probe, remove, dev_pm_ops, etc.).

If a callback occurs asynchronously then it will be invoked
in process context. If it occurs as part of a synchronous
request then it is invoked in the caller's context.

Related to this is the requirement that pm_runtime_idle,
pm_runtime_suspend, and pm_runtime_resume must always be called in
process context whereas pm_runtime_idle_atomic,
pm_runtime_suspend_atomic, and pm_runtime_resume_atomic may be called
in any context.

> Next, if we agree about the rules above, the question is what helper functions
> should be provided by the core allowing these rules to be followed
> automatically and what error codes should be returned by them in case it
> wasn't possible to proceed without breaking the rules.
>
> IMO, it is reasonable to provide:
>
> * pm_schedule_suspend(dev, delay) - schedule the execution of
> ->runtime_suspend(dev) after delay.
>
> * pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.
>
> * pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.
>
> * pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
> into the run-time PM workqueue.
>
> * pm_runtime_get(dev) - increment the device's usage counter.
>
> * pm_runtime_put(dev) - decrement the device's usage counter.
>
> * pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
> counter is zero and all of the device's children are suspended (or the
> "ignore children" flag is set).
>
> * pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
> into the run-time PM workqueue. The usage counter and children will be
> checked immediately before executing ->runtime_idle(dev).

Should the counters also be checked when the request is submitted?
And should the same go for pm_schedule_suspend? These are nontrivial
questions; good arguments can be made both ways.

> I'm not sure if it is really necessary to combine pm_runtime_idle() or
> pm_request_idle() with pm_runtime_put(). At least right now I don't see any
> real value of that.

Likewise combining pm_runtime_get with pm_runtime_resume. The only
value is to make things easier for drivers, because these will be very
common idioms.

> I also am not sure what error codes should be returned by the above helper
> functions and in what conditions.

The error codes you have been using seem okay to me, in general.

However, some of those requests would violate the rules in a trivial
way. For these we might return a positive value rather than a negative
error code. For example, calling pm_runtime_resume while the device is
already active shouldn't be considered an error. But it can't be
considered a complete success either, because it won't invoke the
runtime_resume method.

To be determined: How runtime PM will interact with system sleep.


About all I can add is the "New requests override previous requests"
policy. This would apply to all the non-synchronous requests, whether
they are delayed or added directly to the workqueue. If a new request
(synchronous or not) is received before the old one has started to run,
the old one will be cancelled. This holds even if the new request is
redundant, like a resume request received while the device is active.

There is one exception to this rule: An idle_notify request does not
cancel a delayed or queued suspend request.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


rjw at sisk

Jul 1, 2009, 3:19 PM

Post #3 of 14 (286 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Wednesday 01 July 2009, Alan Stern wrote:
> On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:
>
> > On Tuesday 30 June 2009, Alan Stern wrote:
> > ...
> > > That's enough to give you the general idea. I think this design is
> > > a lot cleaner than the current one.
> >
> > Well, I'm not really happy with starting over, but if you think we should do
> > that, then let's do it.
>
> It's not a complete restart. Much of the existing interface and quite
> a bit of code would remain the same.
>
> > I think we both agree that the callbacks, ->runtime_idle(), ->runtime_suspend()
> > and ->runtime_resume() make sense. Now, the role of the framework, IMO, is to
> > provide a mechanism by which it is possible:
> > (1) to schedule a delayed execution of ->runtime_suspend(), possibly from
> > interrupt context,
> > (2) to schedule execution of ->runtime_resume() or ->runtime_idle(), possibly
> > from interrupt context,
> > (3) to execute ->runtime_suspend() or ->runtime_resume() directly in a
> > synchronous way (I'm not sure about ->runtime_idle())
>
> Yes, runtime_idle also, for drivers that require minimal overhead.
>
> > _and_ to ensure that these callbacks will be executed when it makes sense.
>
> Thus if the situation changes before the callback can be made, so that
> it no longer makes sense, the framework should cancel the callback.

Yes, but there's one thing to consider. Suppose a remote wake-up causes a
resume request to be queued up and pm_runtime_resume() is called synchronously
exactly at the time the request's work function is started. There are two
attempts to resume in progress, but only one of them can call
->runtime_resume(), so what's the other one supposed to do? The asynchronous
one can just return error code, but the the caller of the synchronous
pm_runtime_resume() must know whether or not the resume was successful.
So, perhaps, if the synchronous resume happens to lose the race, it should
wait for the other one to complete, check the device's status and return 0 if
it's active? That wouldn't cause the workqueue thread to wait.

> > There's no other point, because the core has no information to make choices,
> > it can only prevent wrong things from happening, if possible.
>
> Exactly.
>
> > I think you will agree that the users of the framework should be able to
> > prevent ->runtime_suspend() from being called and that's what the usage counter
> > is for. Also, IMO it should be impossible to execute ->runtime_idle(), via the
> > framework, when the usage counter is nonzero.
>
> Right, because then by definition the device is in use so it can't be
> idle.
>
> > BTW, I don't think resume_count is the best name; it used to be in the version
> > of my patch where it was automatically incremented when ->runtime_resume() was
> > about to called. usage_count is probably better.
>
> Fine.
>
> > Next, I think that the framework should refuse to call ->runtime_suspend() and
> > ->runtime_idle() if the children of the device are not suspended and the
> > "ignore children" flag is unset.
>
> Yes; this is part of the "makes sense" requirement.
>
> > The counter of unsuspended children is used
> > for that. I think the rule should be that it is decremented for the parent
> > whenever ->runtime_suspend() is called for a child and it is incremented
> > for the parent whenever ->runtime_resume() is called for a child.
>
> Of course. (Minor change: decremented when runtime_suspend _succeeds_
> for a child.)
>
> > Now, the question is what rules should apply to the ordering and possible
> > simultaneous execution of ->runtime_idle(), ->runtime_suspend() and
> > ->runtime_resume(). I think the following rules make sense:
>
> Oh dear. I wouldn't attempt to make a complete list of all possible
> interactions. It's too hard to know whether you have really covered
> all the cases.
>
> > * It is forbidden to run ->runtime_suspend() twice in a row.
> >
> > * It is forbidden to run ->runtime_suspend() in parallel with another instance
> > of ->runtime_suspend().
> >
> > * It is forbidden to run ->runtime_resume() twice in a row.
> >
> > * It is forbidden to run ->runtime_resume() in parallel with another instance
> > of ->runtime_resume().
> >
> > * It is allowed to run ->runtime_suspend() after ->runtime_resume() or after
> > ->runtime_idle(), but the latter case is preferred.
> >
> > * It is allowed to run ->runtime_resume() after ->runtime_suspend().
> >
> > * It is forbidden to run ->runtime_resume() after ->runtime_idle().
> >
> > * It is forbidden to run ->runtime_suspend() and ->runtime_resume() in
> > parallel with each other.
> >
> > * It is forbidden to run ->runtime_idle() twice in a row.
> >
> > * It is forbidden to run ->runtime_idle() in parallel with another instance
> > of ->runtime_idle().
> >
> > * It is forbidden to run ->runtime_idle() after ->runtime_suspend().
> >
> > * It is allowed to run ->runtime_idle() after ->runtime_resume().
> >
> > * It is allowed to execute ->runtime_suspend() or ->runtime_resume() when
> > ->runtime_idle() is running. In particular, it is allowed to (indirectly)
> > call ->runtime_suspend() from within ->runtime_idle().
> >
> > * It is forbidden to execute ->runtime_idle() when ->runtime_resume() or
> > ->runtime_suspend() is running.
>
> We can summarize these rules as follows:
>
> Never allow more than one callback at a time, except that
> runtime_suspend may be invoked while runtime_idle is running.

Caution here. If ->runtime_idle() runs ->runtime_suspend() and immediately
after that resume is requested by remote wake-up, ->runtime_resume() may also
be run while ->runtime_idle() is still running.

OTOH, we need to know when ->runtime_idle() has completed, because we have to
ensure it won't still be running after run-time PM has been disabled for the
device.

IMO, we need two flags, one indicating that either ->runtime_suspend(), or
->runtime_resume() is being executed (they are mutually exclusive) and the
the other one indicating that ->runtime_idle() is being executed. For the
purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
RPM_IN_TRANSITION.

With this notation, the above rule may be translated as:

Don't run any of the callbacks if RPM_IN_TRANSITION is set. Don't run
->runtime_idle() if RPM_IDLE_RUNNING is set.

Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.

> Don't call runtime_resume while the device is active.
>
> Don't call runtime_suspend or runtime_idle while the device
> is suspended.
>
> Don't invoke any callbacks if the device state is unknown
> (RPM_ERROR).
>
> Implicit is the notion that the device is suspended when
> runtime_suspend returns successfully, it is active when runtime_resume
> returns successfully, and it is unknown when either returns an error.

Yes.

There are two possible "final" states, so I'd use one flag to indicate the
current status. Let's call it RPM_SUSPENDED for now (which means that the
device is suspended when it's set and active otherwise) and I think we can make
the rule that this flag is only changed after successful execution of
->runtime_suspend() or ->runtime_resume().

Whether the device is suspending or resuming follows from the values of
RPM_SUSPENDED and RPM_IN_TRANSITION.

> > * If ->runtime_resume() is about to be called immediately after
> > ->runtime_suspend(), the execution of ->runtime_suspend() should be
> > prevented from happening, if possible, in which case the execution of
> > ->runtime_resume() shouldn't happen.
> >
> > * If ->runtime_suspend() is about to be called immediately after
> > ->runtime_resume(), the execution of ->runtime_resume() should be
> > prevented from happening, if possible, in which case the execution of
> > ->runtime_suspend() shouldn't happen.
>
> These could be considered optional optimizations. Or if you prefer,
> they could be covered by a "New requests override previous requests"
> rule.

I'm not sure if I agree with this rule yet.

> > [.Are there any more rules related to these callbacks we should take into
> > account?]
>
> Runtime PM callbacks are mutually exclusive with other driver
> core callbacks (probe, remove, dev_pm_ops, etc.).

OK

> If a callback occurs asynchronously then it will be invoked
> in process context. If it occurs as part of a synchronous
> request then it is invoked in the caller's context.
>
> Related to this is the requirement that pm_runtime_idle,
> pm_runtime_suspend, and pm_runtime_resume must always be called in
> process context whereas pm_runtime_idle_atomic,
> pm_runtime_suspend_atomic, and pm_runtime_resume_atomic may be called
> in any context.

OK

> > Next, if we agree about the rules above, the question is what helper functions
> > should be provided by the core allowing these rules to be followed
> > automatically and what error codes should be returned by them in case it
> > wasn't possible to proceed without breaking the rules.
> >
> > IMO, it is reasonable to provide:
> >
> > * pm_schedule_suspend(dev, delay) - schedule the execution of
> > ->runtime_suspend(dev) after delay.
> >
> > * pm_runtime_suspend(dev) - execute ->runtime_suspend(dev) right now.
> >
> > * pm_runtime_resume(dev) - execute ->runtime_resume(dev) right now.
> >
> > * pm_request_resume(dev) - put a request to execute ->runtime_resume(dev)
> > into the run-time PM workqueue.
> >
> > * pm_runtime_get(dev) - increment the device's usage counter.
> >
> > * pm_runtime_put(dev) - decrement the device's usage counter.
> >
> > * pm_runtime_idle(dev) - execute ->runtime_idle(dev) right now if the usage
> > counter is zero and all of the device's children are suspended (or the
> > "ignore children" flag is set).
> >
> > * pm_request_idle(dev) - put a request to execute ->runtime_idle(dev)
> > into the run-time PM workqueue. The usage counter and children will be
> > checked immediately before executing ->runtime_idle(dev).
>
> Should the counters also be checked when the request is submitted?
> And should the same go for pm_schedule_suspend? These are nontrivial
> questions; good arguments can be made both ways.

That's the difficult part. :-)

First, I think a delayed suspend should be treated in a special way, because
it's not really a request to suspend. Namely, as long as the timer hasn't
triggered yet, nothing happens and there's nothing against the rules above.
A request to suspend is queued up after the timer has triggered and the timer
function is where the rules come into play. IOW, it consists of two
operations, setting up a timer and queuing up a request to suspend when the
timer triggers. IMO the first of them can be done at any time, while the other
one may be affected by the rules.

It implies that we should really introduce a timer and a timer function that
will queue up suspend requests, instead of using struct delayed_work.

Second, I think it may be a good idea to use the usage counter to block further
requests while submitting a resume request.

Namely, suppose that pm_request_resume() increments usage_count and returns 0,
if the resume was not necessary and the caller can do the I/O by itself, or
error code, which means that it was necessary to queue up a resume request.
If 0 is returned, the caller is supposed to do the I/O and call
pm_runtime_put() when done. Otherwise it just quits and ->runtime_resume() is
supposed to take care of the I/O, in which case the request's work function
should call pm_runtime_put() when done. [.If it was impossible to queue up a
request, error code is returned, but the usage counter is decremented by
pm_request_resume(), so that the caller need not handle that special case,
hopefully rare.]

This implies that it may be a good idea to check usage_count when submitting
idle notification and suspend requests (where in case of suspend a request is
submitted by the timer function, when the timer has already triggered, so
there's no need to check the counter while setting up the timer).

The counter of unsuspended children may change after a request has been
submitted and before its work function has a chance to run, so I don't see much
point checking it when submitting requests.

So, if the above idea is adopted, idle notification and suspend requests
won't be queued up when a resume request is pending (there's the question what
the timer function attempting to queue up a suspend request is supposed to do
in such a case) and in the other cases we can use the following rules:

Any pending request takes precedence over a new idle notification request.

If a new request is not an idle notification request, it takes precedence
over the pending one, so it cancels it with the help of cancel_work().

[.In the latter case, if a suspend request is canceled, we may want to set up the
timer for another one.] For that, we're going to need a single flag, say
RPM_PENDING, which is set whenever a request is queued up.

> > I'm not sure if it is really necessary to combine pm_runtime_idle() or
> > pm_request_idle() with pm_runtime_put(). At least right now I don't see any
> > real value of that.
>
> Likewise combining pm_runtime_get with pm_runtime_resume. The only
> value is to make things easier for drivers, because these will be very
> common idioms.
>
> > I also am not sure what error codes should be returned by the above helper
> > functions and in what conditions.
>
> The error codes you have been using seem okay to me, in general.
>
> However, some of those requests would violate the rules in a trivial
> way. For these we might return a positive value rather than a negative
> error code. For example, calling pm_runtime_resume while the device is
> already active shouldn't be considered an error. But it can't be
> considered a complete success either, because it won't invoke the
> runtime_resume method.

That need not matter from the caller's point of view, though. In the case of
pm_runtime_resume() the caller will probably be mostly interested whether or
not it can do I/O after the function has returned.

> To be determined: How runtime PM will interact with system sleep.

Yes. My first idea was to disable run-time PM before entering a system sleep
state, but that would involve canceling all of the pending requests.

> About all I can add is the "New requests override previous requests"
> policy. This would apply to all the non-synchronous requests, whether
> they are delayed or added directly to the workqueue. If a new request
> (synchronous or not) is received before the old one has started to run,
> the old one will be cancelled. This holds even if the new request is
> redundant, like a resume request received while the device is active.
>
> There is one exception to this rule: An idle_notify request does not
> cancel a delayed or queued suspend request.

I'm not sure if such a rigid rule will be really useful.

Also, as I said above, I think we shouldn't regard setting up the suspend
timer as queuing up a request, but as a totally separate operation.

Best,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


rjw at sisk

Jul 2, 2009, 8:42 AM

Post #4 of 14 (273 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Thursday 02 July 2009, Rafael J. Wysocki wrote:
> On Wednesday 01 July 2009, Alan Stern wrote:
> > On Wed, 1 Jul 2009, Rafael J. Wysocki wrote:
...
> > Should the counters also be checked when the request is submitted?
> > And should the same go for pm_schedule_suspend? These are nontrivial
> > questions; good arguments can be made both ways.
>
> That's the difficult part. :-)
>
> First, I think a delayed suspend should be treated in a special way, because
> it's not really a request to suspend. Namely, as long as the timer hasn't
> triggered yet, nothing happens and there's nothing against the rules above.
> A request to suspend is queued up after the timer has triggered and the timer
> function is where the rules come into play. IOW, it consists of two
> operations, setting up a timer and queuing up a request to suspend when the
> timer triggers. IMO the first of them can be done at any time, while the other
> one may be affected by the rules.
>
> It implies that we should really introduce a timer and a timer function that
> will queue up suspend requests, instead of using struct delayed_work.
>
> Second, I think it may be a good idea to use the usage counter to block further
> requests while submitting a resume request.
>
> Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> if the resume was not necessary and the caller can do the I/O by itself, or
> error code, which means that it was necessary to queue up a resume request.
> If 0 is returned, the caller is supposed to do the I/O and call
> pm_runtime_put() when done. Otherwise it just quits and ->runtime_resume() is
> supposed to take care of the I/O, in which case the request's work function
> should call pm_runtime_put() when done. [.If it was impossible to queue up a
> request, error code is returned, but the usage counter is decremented by
> pm_request_resume(), so that the caller need not handle that special case,
> hopefully rare.]
>
> This implies that it may be a good idea to check usage_count when submitting
> idle notification and suspend requests (where in case of suspend a request is
> submitted by the timer function, when the timer has already triggered, so
> there's no need to check the counter while setting up the timer).
>
> The counter of unsuspended children may change after a request has been
> submitted and before its work function has a chance to run, so I don't see much
> point checking it when submitting requests.
>
> So, if the above idea is adopted, idle notification and suspend requests
> won't be queued up when a resume request is pending (there's the question what
> the timer function attempting to queue up a suspend request is supposed to do
> in such a case) and in the other cases we can use the following rules:
>
> Any pending request takes precedence over a new idle notification request.
>
> If a new request is not an idle notification request, it takes precedence
> over the pending one, so it cancels it with the help of cancel_work().
>
> [.In the latter case, if a suspend request is canceled, we may want to set up the
> timer for another one.] For that, we're going to need a single flag, say
> RPM_PENDING, which is set whenever a request is queued up.

After some reconsideration I'd like to change that in the following way:

Any pending request takes precedence over a new idle notification request.

A pending request takes precedence over a new request of the same type.

If the new request is not an idle notification request and is not of the
same type as the pending one, it takes precedence over the pending one, so
it cancels the pending request with the help of cancel_work().

So, instead of a single flag, I'd like to use a 2-bit field to store
information about pending requests, where the 4 values are RPM_REQ_NONE,
RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME.

Also, IMO it makes sense to queue up an idle notification or suspend request
regardless of the current status of the device, as long as the usage counter is
greater than 0, because the status can always change after the request has been
submitted and before its work function is executed.

So, I think we can use something like this:

struct dev_pm_info {
pm_message_t power_state;
unsigned can_wakeup:1;
unsigned should_wakeup:1;
enum dpm_state status; /* Owned by the PM core */
#ifdef CONFIG_PM_SLEEP
struct list_head entry;
#endif
#ifdef CONFIG_PM_RUNTIME
struct timer_list suspend_timer;
wait_queue_head_t wait_queue;
struct work_struct work;
spinlock_t lock;
atomic_t usage_count;
atomic_t child_count;
unsigned int ignore_children:1;
unsigned int enabled:1; /* 'true' if run-time PM is enabled */
unsigned int idle_notification:1; /* 'true' if ->runtime_idle() is running */
unsigned int in_transition:1; /* 'true' if ->runtime_[suspend|resume]() is running */
unsigned int suspended:1; /* 'true' if current status is 'suspended' */
unsigned int pending_request:2; /* RPM_REQ_NONE, RPM_REQ_IDLE, etc. */
unsigned int runtime_error:1; /* 'ture' if the last transition failed */
int error_code; /* Error code returned by the last executed callback */
#endif
};

with the following rules regarding the (most important) helper functions:

pm_schedule_suspend(dev, delay) is always successful. It adds a new timer
with pm_request_suspend() as the timer function, dev as the data and
jiffies + delay as the expiration time. If the timer is pending when this
function is called, the timer is deactivated using del_timer() and replaced
by the new timer.

pm_request_suspend() checks if 'usage_count' is zero and returns -EAGAIN
if not. Next, it checks if 'pending_request' is RPM_REQ_SUSPEND and returns
-EALREADY in that case. Next, if 'pending_request' is RPM_REQ_IDLE,
the request is cancelled. Finally, a new suspend request is submitted.

pm_runtime_suspend() checks if 'usage_count' is zero and returns -EAGAIN
if not. Next, it checks 'in_transition' and 'suspended' and returns 0 if the
former is unset and the latter is set. If 'in_transition' is set and 'suspended'
is not set (the device is currently suspending), the behavior depends on
whether or not the function was called synchronously, in which case it waits
for the other suspend to finish. If it was called via the workqueue,
-EINPROGRESS is returned. Next, 'in_transition' is set, ->runtime_suspend()
is executed amd 'in_transition' is unset. If ->runtime_suspend() returned 0,
'suspended' is set and 0 is returned. Otherwise, if the error code was
-EAGAIN or -EBUSY, 'suspended' is not set and the error code is returned.
Otherwise, 'runtime_error' is set and the error code is returned ('suspended'
is not set).

pm_request_resume() increments 'usage_count' and checks 'suspended' and
'in_transition'. If both 'suspended' and 'in_transition" are not set, 0 is
returned and the caller is supposed to decrement 'usage_count', with the
help of pm_runtime_put(). Otherwise, the function checks if
'pending_request' is different from zero, in which case the pending request
is canceled. Finally, a new resume request is submitted and -EBUSY is
returned. In that case, 'usage_count' will be decremented by the request's
work function (not by pm_runtime_resume(), but by the wrapper function that
calls it).

pm_runtime_resume() increments 'usage_count' and checks 'in_transition' and
'suspended'. If both are unset, 0 is returned. If both are set (the device
is resuming) the behavior depends on whether or not the function was called
synchronously, in which case it waits for the concurrent resume to complete,
while it immediately returns -EINPROGRESS in the other case. If 'suspended'
is not set, but 'in_transision' is set (the device is suspending), the
function waits for the suspend to complete and starts over. Next,
'in_transition' is set, ->runtime_resume() is executed and 'in_transition'
is unset. If ->runtime_resume() returned 0, 'suspended' is unset and 0 is
returned. Otherwise, 'runtime_error' is set and the error code from
->runtime_resume() is returned ('suspended' is not unset). 'usage_count' is
always decremented before return, regardless of the return value.

pm_request_idle() checks 'usage_count' and returns -EAGAIN if it's greater
than 0. Next, it checks 'pending_request' and immediately returns -EBUSY, if
it's different from RPM_REQ_NONE and RPM_REQ_IDLE, or -EALREADY, if it's
equal to RPM_REQ_IDLE. Finally, new idle notification request is submitted.

pm_runtime_idle() checks 'usage_count' and returns -EAGAIN if it's greater
than 0. Next, it checks 'suspended' and 'in_transition' and returns -EBUSY
if any of them is set. Next, it checks 'idle_notification' and returns
-EINPROGRESS is it's set. Finally, 'idle_notification' is set,
->runtime_idle() is executed and 'idle_notification' is unset.

Additionally, all of the helper functions return -EINVAL immediately if
'runtime_error' is set.

Best,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Jul 2, 2009, 8:55 AM

Post #5 of 14 (273 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:

> > > _and_ to ensure that these callbacks will be executed when it makes sense.
> >
> > Thus if the situation changes before the callback can be made, so that
> > it no longer makes sense, the framework should cancel the callback.
>
> Yes, but there's one thing to consider. Suppose a remote wake-up causes a
> resume request to be queued up and pm_runtime_resume() is called synchronously
> exactly at the time the request's work function is started. There are two
> attempts to resume in progress, but only one of them can call
> ->runtime_resume(), so what's the other one supposed to do? The asynchronous
> one can just return error code, but the the caller of the synchronous
> pm_runtime_resume() must know whether or not the resume was successful.
> So, perhaps, if the synchronous resume happens to lose the race, it should
> wait for the other one to complete, check the device's status and return 0 if
> it's active? That wouldn't cause the workqueue thread to wait.

I didn't address this explicitly in the previous message, but yes.
This is no different from the way your current version works.

Similarly, if a synchronous resume call occurs while a suspend is in
progress, it should wait until the suspend finishes and then carry out
a resume.

> > We can summarize these rules as follows:
> >
> > Never allow more than one callback at a time, except that
> > runtime_suspend may be invoked while runtime_idle is running.
>
> Caution here. If ->runtime_idle() runs ->runtime_suspend() and immediately
> after that resume is requested by remote wake-up, ->runtime_resume() may also
> be run while ->runtime_idle() is still running.

Yes, I didn't think of that case. We have to allow either of the other
two to be invoked while runtime_idle is running. But we can rule out
calling runtime_idle recursively.

> OTOH, we need to know when ->runtime_idle() has completed, because we have to
> ensure it won't still be running after run-time PM has been disabled for the
> device.
>
> IMO, we need two flags, one indicating that either ->runtime_suspend(), or
> ->runtime_resume() is being executed (they are mutually exclusive) and the
> the other one indicating that ->runtime_idle() is being executed. For the
> purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
> RPM_IN_TRANSITION.

The RPM_IN_TRANSITION flag is unnecessary. It would always be equal to
(status == RPM_SUSPENDING || status == RPM_RESUMING).

> With this notation, the above rule may be translated as:
>
> Don't run any of the callbacks if RPM_IN_TRANSITION is set. Don't run
> ->runtime_idle() if RPM_IDLE_RUNNING is set.
>
> Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
> set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.

That is equivalent to my conclusion above.

> There are two possible "final" states, so I'd use one flag to indicate the
> current status. Let's call it RPM_SUSPENDED for now (which means that the
> device is suspended when it's set and active otherwise) and I think we can make
> the rule that this flag is only changed after successful execution of
> ->runtime_suspend() or ->runtime_resume().
>
> Whether the device is suspending or resuming follows from the values of
> RPM_SUSPENDED and RPM_IN_TRANSITION.

You can use two single-bit flags (SUSPEND and IN_TRANSITION) or a
single two-bit state value (ACTIVE, SUSPENDING, SUSPENDED, RESUMING).
It doesn't make much difference which you choose.


> > Should the counters also be checked when the request is submitted?
> > And should the same go for pm_schedule_suspend? These are nontrivial
> > questions; good arguments can be made both ways.
>
> That's the difficult part. :-)
>
> First, I think a delayed suspend should be treated in a special way, because
> it's not really a request to suspend. Namely, as long as the timer hasn't
> triggered yet, nothing happens and there's nothing against the rules above.
> A request to suspend is queued up after the timer has triggered and the timer
> function is where the rules come into play. IOW, it consists of two
> operations, setting up a timer and queuing up a request to suspend when the
> timer triggers. IMO the first of them can be done at any time, while the other
> one may be affected by the rules.

I don't agree. For example, suppose the device has an active child
when the driver says: Suspend it in 30 seconds. If the child is then
removed after only 10 seconds, does it make sense to go ahead with
suspending the parent 20 seconds later? No -- if the parent is going
to be suspended, the decision as to when should be made at the time the
child is removed, not beforehand.

(Even more concretely, suppose there is a 30-second inactivity timeout
for autosuspend. Removing the child counts as activity and so should
restart the timer.)

To put it another way, suppose you accept a delayed request under
inappropriate conditions. If the conditions don't change, the whole
thing was a waste of effort. And if the conditions do change, then the
whole delayed request should be reconsidered anyhow. So why accept it?

> It implies that we should really introduce a timer and a timer function that
> will queue up suspend requests, instead of using struct delayed_work.

Yes, this was part of my proposal.

> Second, I think it may be a good idea to use the usage counter to block further
> requests while submitting a resume request.
>
> Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> if the resume was not necessary and the caller can do the I/O by itself, or
> error code, which means that it was necessary to queue up a resume request.
> If 0 is returned, the caller is supposed to do the I/O and call
> pm_runtime_put() when done. Otherwise it just quits and ->runtime_resume() is
> supposed to take care of the I/O, in which case the request's work function
> should call pm_runtime_put() when done. [.If it was impossible to queue up a
> request, error code is returned, but the usage counter is decremented by
> pm_request_resume(), so that the caller need not handle that special case,
> hopefully rare.]

Trying to keep track of reasons for incrementing and decrementing
usage_count is very difficult to do in the core. What happens if
pm_request_resume increments the count but then the driver calls
pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work
routine can run?

It's better to make the driver responsible for maintaining the counter
value. Forcing the driver to do pm_runtime_get, pm_request_resume is
better than having the core automatically change the counter.

> This implies that it may be a good idea to check usage_count when submitting
> idle notification and suspend requests (where in case of suspend a request is
> submitted by the timer function, when the timer has already triggered, so
> there's no need to check the counter while setting up the timer).
>
> The counter of unsuspended children may change after a request has been
> submitted and before its work function has a chance to run, so I don't see much
> point checking it when submitting requests.

As I said above, if the counters don't change then the submission was
unnecessary, and if they do change then the submission should be
reconsidered. Therefore they _should_ be checked in submissions.

> So, if the above idea is adopted, idle notification and suspend requests
> won't be queued up when a resume request is pending (there's the question what
> the timer function attempting to queue up a suspend request is supposed to do
> in such a case) and in the other cases we can use the following rules:
>
> Any pending request takes precedence over a new idle notification request.

For pending resume requests this rule is unnecessary; it's invalid to
submit an idle notification request while a resume request is pending
(since resume requests can be pending only in the RPM_SUSPENDING and
RPM_SUSPENDED states while idle notification requests are accepted only
in RPM_RESUMING and RPM_ACTIVE).

For pending suspends, I think we should allow synchronous idle
notifications while the suspend is pending. The runtime_idle callback
might then start its own suspend before the workqueue can get around to
it. You're right about async idle requests though; that was the
exception I noted below.

> If a new request is not an idle notification request, it takes precedence
> over the pending one, so it cancels it with the help of cancel_work().
>
> [.In the latter case, if a suspend request is canceled, we may want to set up the
> timer for another one.] For that, we're going to need a single flag, say
> RPM_PENDING, which is set whenever a request is queued up.

That's what I called work_pending in my proposal.

> > The error codes you have been using seem okay to me, in general.
> >
> > However, some of those requests would violate the rules in a trivial
> > way. For these we might return a positive value rather than a negative
> > error code. For example, calling pm_runtime_resume while the device is
> > already active shouldn't be considered an error. But it can't be
> > considered a complete success either, because it won't invoke the
> > runtime_resume method.
>
> That need not matter from the caller's point of view, though. In the case of
> pm_runtime_resume() the caller will probably be mostly interested whether or
> not it can do I/O after the function has returned.

Yes. But the driver might depend on something happening inside the
runtime_resume method, so it would need to know if a successful
pm_runtime_resume wasn't going to invoke the callback.

> > To be determined: How runtime PM will interact with system sleep.
>
> Yes. My first idea was to disable run-time PM before entering a system sleep
> state, but that would involve canceling all of the pending requests.

Or simply freezing the workqueue.

> > About all I can add is the "New requests override previous requests"
> > policy. This would apply to all the non-synchronous requests, whether
> > they are delayed or added directly to the workqueue. If a new request
> > (synchronous or not) is received before the old one has started to run,
> > the old one will be cancelled. This holds even if the new request is
> > redundant, like a resume request received while the device is active.
> >
> > There is one exception to this rule: An idle_notify request does not
> > cancel a delayed or queued suspend request.
>
> I'm not sure if such a rigid rule will be really useful.

A rigid rule is easier to understand and apply than one with a large
number of special cases. However, in the statement of the rule above,
I forgot to mention that this applies only if the new request is valid,
i.e., if it's not forbidden by the current status or the counter
values.

> Also, as I said above, I think we shouldn't regard setting up the suspend
> timer as queuing up a request, but as a totally separate operation.

Well, there can't be any pending resume requests when the suspend timer
is set up, so we have to consider only pending idle notifications or
pending suspends. I agree, we would want to allow an idle notification
to remain pending when the suspend timer is set up. As for pending
suspends, we _should_ allow the new request to override the old one.
This will come up whenever the timeout value is changed.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


rjw at sisk

Jul 2, 2009, 10:50 AM

Post #6 of 14 (272 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Thursday 02 July 2009, Alan Stern wrote:
> On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:
>
> > > > _and_ to ensure that these callbacks will be executed when it makes sense.
> > >
> > > Thus if the situation changes before the callback can be made, so that
> > > it no longer makes sense, the framework should cancel the callback.
> >
> > Yes, but there's one thing to consider. Suppose a remote wake-up causes a
> > resume request to be queued up and pm_runtime_resume() is called synchronously
> > exactly at the time the request's work function is started. There are two
> > attempts to resume in progress, but only one of them can call
> > ->runtime_resume(), so what's the other one supposed to do? The asynchronous
> > one can just return error code, but the the caller of the synchronous
> > pm_runtime_resume() must know whether or not the resume was successful.
> > So, perhaps, if the synchronous resume happens to lose the race, it should
> > wait for the other one to complete, check the device's status and return 0 if
> > it's active? That wouldn't cause the workqueue thread to wait.
>
> I didn't address this explicitly in the previous message, but yes.
> This is no different from the way your current version works.
>
> Similarly, if a synchronous resume call occurs while a suspend is in
> progress, it should wait until the suspend finishes and then carry out
> a resume.

Agreed.

> > > We can summarize these rules as follows:
> > >
> > > Never allow more than one callback at a time, except that
> > > runtime_suspend may be invoked while runtime_idle is running.
> >
> > Caution here. If ->runtime_idle() runs ->runtime_suspend() and immediately
> > after that resume is requested by remote wake-up, ->runtime_resume() may also
> > be run while ->runtime_idle() is still running.
>
> Yes, I didn't think of that case. We have to allow either of the other
> two to be invoked while runtime_idle is running. But we can rule out
> calling runtime_idle recursively.
>
> > OTOH, we need to know when ->runtime_idle() has completed, because we have to
> > ensure it won't still be running after run-time PM has been disabled for the
> > device.
> >
> > IMO, we need two flags, one indicating that either ->runtime_suspend(), or
> > ->runtime_resume() is being executed (they are mutually exclusive) and the
> > the other one indicating that ->runtime_idle() is being executed. For the
> > purpose of further discussion below I'll call them RPM_IDLE_RUNNING and
> > RPM_IN_TRANSITION.
>
> The RPM_IN_TRANSITION flag is unnecessary. It would always be equal to
> (status == RPM_SUSPENDING || status == RPM_RESUMING).

I thought of replacing the old flags with RPM_IN_TRANSITION, actually.

> > With this notation, the above rule may be translated as:
> >
> > Don't run any of the callbacks if RPM_IN_TRANSITION is set. Don't run
> > ->runtime_idle() if RPM_IDLE_RUNNING is set.
> >
> > Which implies that RPM_IDLE_RUNNING cannot be set when RPM_IN_TRANSITION is
> > set, but it is valid to set RPM_IN_TRANSITION when RPM_IDLE_RUNNING is set.
>
> That is equivalent to my conclusion above.
>
> > There are two possible "final" states, so I'd use one flag to indicate the
> > current status. Let's call it RPM_SUSPENDED for now (which means that the
> > device is suspended when it's set and active otherwise) and I think we can make
> > the rule that this flag is only changed after successful execution of
> > ->runtime_suspend() or ->runtime_resume().
> >
> > Whether the device is suspending or resuming follows from the values of
> > RPM_SUSPENDED and RPM_IN_TRANSITION.
>
> You can use two single-bit flags (SUSPEND and IN_TRANSITION) or a
> single two-bit state value (ACTIVE, SUSPENDING, SUSPENDED, RESUMING).
> It doesn't make much difference which you choose.

No, it doesn't.

Still, the additional flag for 'idle notification is in progress' is still
necessary for the following two reasons:

(1) Idle notifications cannot be run (synchronously) when one is already in
progress, so we need a means to determine whether or not this is the case.

(2) If run-time PM is to be disabled, the function doing that must guarantee
that ->runtime_idle() won't be running after it's returned, so it needs to
know how to check that.

> > > Should the counters also be checked when the request is submitted?
> > > And should the same go for pm_schedule_suspend? These are nontrivial
> > > questions; good arguments can be made both ways.
> >
> > That's the difficult part. :-)
> >
> > First, I think a delayed suspend should be treated in a special way, because
> > it's not really a request to suspend. Namely, as long as the timer hasn't
> > triggered yet, nothing happens and there's nothing against the rules above.
> > A request to suspend is queued up after the timer has triggered and the timer
> > function is where the rules come into play. IOW, it consists of two
> > operations, setting up a timer and queuing up a request to suspend when the
> > timer triggers. IMO the first of them can be done at any time, while the other
> > one may be affected by the rules.
>
> I don't agree. For example, suppose the device has an active child
> when the driver says: Suspend it in 30 seconds. If the child is then
> removed after only 10 seconds, does it make sense to go ahead with
> suspending the parent 20 seconds later? No -- if the parent is going
> to be suspended, the decision as to when should be made at the time the
> child is removed, not beforehand.

There are two functions, on that sets up the timer and the other that queues
up the request. This is the second one that makes the decision if the request
is still worth queuing up.

> (Even more concretely, suppose there is a 30-second inactivity timeout
> for autosuspend. Removing the child counts as activity and so should
> restart the timer.)
>
> To put it another way, suppose you accept a delayed request under
> inappropriate conditions. If the conditions don't change, the whole
> thing was a waste of effort. And if the conditions do change, then the
> whole delayed request should be reconsidered anyhow.

The problem is, even if you always accept a delayed request under appropriate
conditions, you still have to reconsider it before putting it into the work
queue, because the conditions might have changed. So, you'd like to do this:

(1) Check if the conditions are appropriate, set up a timer.
(2) Check if the conditions are appropriate, queue up a suspend request.

while I think it will be simpler to do this:

(1) Set up a timer.
(2) Check if the conditions are appropriate, queue up a suspend request.

In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
between (1) and (2), so I don't really see a practical difference.

> So why accept it?

Beacuse that simplifies things?

For example, suppose ->runtime_resume() has been called as
a result of a remote wake-up (ie. after pm_request_resume()) and it has some
I/O to process, but it is known beforehand that the device will most likely be
inactive after the I/O is done. So, it's tempting to call
pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
inappropriate (the device is not regarded as suspended). However, calling
pm_schedule_suspend() with a long enough delay doesn't break any rules related
to the ->runtime_*() callbacks, so why should it be forbidden?

Next, suppose pm_schedule_suspend() is called, but it fails because the
conditions are inappropriate. What's the caller supposed to do? Wait for the
conditions to change and repeat? But why should it bother if the conditions
may still change before ->runtime_suspend() is actually called?

IMO, it's the caller's problem whether or not what it does is useful or
efficient. The core's problem is to ensure that it doesn't break things.

> > It implies that we should really introduce a timer and a timer function that
> > will queue up suspend requests, instead of using struct delayed_work.
>
> Yes, this was part of my proposal.
>
> > Second, I think it may be a good idea to use the usage counter to block further
> > requests while submitting a resume request.
> >
> > Namely, suppose that pm_request_resume() increments usage_count and returns 0,
> > if the resume was not necessary and the caller can do the I/O by itself, or
> > error code, which means that it was necessary to queue up a resume request.
> > If 0 is returned, the caller is supposed to do the I/O and call
> > pm_runtime_put() when done. Otherwise it just quits and ->runtime_resume() is
> > supposed to take care of the I/O, in which case the request's work function
> > should call pm_runtime_put() when done. [.If it was impossible to queue up a
> > request, error code is returned, but the usage counter is decremented by
> > pm_request_resume(), so that the caller need not handle that special case,
> > hopefully rare.]
>
> Trying to keep track of reasons for incrementing and decrementing
> usage_count is very difficult to do in the core. What happens if
> pm_request_resume increments the count but then the driver calls
> pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work
> routine can run?

Nothing wrong, as long as the increments and decrements are balanced (if they
aren't balanced, there is a bug in the driver anyway). In fact, for this to
work we need the rule that a new request of the same type doesn't replace an
existing one. Then, the submitted resume request cannot be canceled, so the
work function will run and drop the usage counter.

> It's better to make the driver responsible for maintaining the counter
> value. Forcing the driver to do pm_runtime_get, pm_request_resume is
> better than having the core automatically change the counter.

So the caller will do:

pm_runtime_get(dev);
error = pm_request_resume(dev);
if (error)
goto out;
<process I/O>
pm_runtime_put();

but how is it supposed to ensure that pm_runtime_put() will be called after
executing the 'goto out' thing?

Anyway, we don't need to use the usage counter for that (although it's cheap).
Instead, we can make pm_request_suspend() and pm_request_idle() check if a
resume request is pending and fail if that's the case.

> > This implies that it may be a good idea to check usage_count when submitting
> > idle notification and suspend requests (where in case of suspend a request is
> > submitted by the timer function, when the timer has already triggered, so
> > there's no need to check the counter while setting up the timer).
> >
> > The counter of unsuspended children may change after a request has been
> > submitted and before its work function has a chance to run, so I don't see much
> > point checking it when submitting requests.
>
> As I said above, if the counters don't change then the submission was
> unnecessary, and if they do change then the submission should be
> reconsidered. Therefore they _should_ be checked in submissions.

Let's put it another way. What's the practical benefit to the caller if we
always check the counters in submissions?

> > So, if the above idea is adopted, idle notification and suspend requests
> > won't be queued up when a resume request is pending (there's the question what
> > the timer function attempting to queue up a suspend request is supposed to do
> > in such a case) and in the other cases we can use the following rules:
> >
> > Any pending request takes precedence over a new idle notification request.
>
> For pending resume requests this rule is unnecessary; it's invalid to
> submit an idle notification request while a resume request is pending
> (since resume requests can be pending only in the RPM_SUSPENDING and
> RPM_SUSPENDED states while idle notification requests are accepted only
> in RPM_RESUMING and RPM_ACTIVE).

It is correct nevertheless. :-)

> For pending suspends, I think we should allow synchronous idle
> notifications while the suspend is pending.

Sure, I was talking only about requests here, where by 'request' I understood
a work item put into the workqueue.

> The runtime_idle callback might then start its own suspend before the
> workqueue can get around to it. You're right about async idle requests
> though; that was the exception I noted below.
>
> > If a new request is not an idle notification request, it takes precedence
> > over the pending one, so it cancels it with the help of cancel_work().
> >
> > [.In the latter case, if a suspend request is canceled, we may want to set up the
> > timer for another one.] For that, we're going to need a single flag, say
> > RPM_PENDING, which is set whenever a request is queued up.
>
> That's what I called work_pending in my proposal.

Well, after some reconsideration I think it's not enough (as I wrote in my last
message), becuase it generally makes sense to make the following rule:

A pending request always takes precedence over a new request of the same
type.

So, for example, if pm_request_resume() is called and there's a resume request
pending already, the new pm_request_resume() should just let the pending
request alone and quit.

Thus, it seems reasonable to remember what type of a request is pending
(i don't think we can figure it out from the status fields in 100% of the
cases).

> > > The error codes you have been using seem okay to me, in general.
> > >
> > > However, some of those requests would violate the rules in a trivial
> > > way. For these we might return a positive value rather than a negative
> > > error code. For example, calling pm_runtime_resume while the device is
> > > already active shouldn't be considered an error. But it can't be
> > > considered a complete success either, because it won't invoke the
> > > runtime_resume method.
> >
> > That need not matter from the caller's point of view, though. In the case of
> > pm_runtime_resume() the caller will probably be mostly interested whether or
> > not it can do I/O after the function has returned.
>
> Yes. But the driver might depend on something happening inside the
> runtime_resume method, so it would need to know if a successful
> pm_runtime_resume wasn't going to invoke the callback.

Hmm. That would require the driver to know that the device was suspended,
but in that case pm_runtime_resume() returning 0 would mean that _someone_
ran ->runtime_resume() for it in any case.

If the driver doesn't know if the device was suspended beforehand, it cannot
depend on the execution of ->runtime_resume().

> > > To be determined: How runtime PM will interact with system sleep.
> >
> > Yes. My first idea was to disable run-time PM before entering a system sleep
> > state, but that would involve canceling all of the pending requests.
>
> Or simply freezing the workqueue.

Well, what about the synchronous calls? How are we going to prevent them
from happening after freezing the workqueue?

> > > About all I can add is the "New requests override previous requests"
> > > policy. This would apply to all the non-synchronous requests, whether
> > > they are delayed or added directly to the workqueue. If a new request
> > > (synchronous or not) is received before the old one has started to run,
> > > the old one will be cancelled. This holds even if the new request is
> > > redundant, like a resume request received while the device is active.
> > >
> > > There is one exception to this rule: An idle_notify request does not
> > > cancel a delayed or queued suspend request.
> >
> > I'm not sure if such a rigid rule will be really useful.
>
> A rigid rule is easier to understand and apply than one with a large
> number of special cases. However, in the statement of the rule above,
> I forgot to mention that this applies only if the new request is valid,
> i.e., if it's not forbidden by the current status or the counter
> values.

Ah, OK. I'd also like to add the rule about requests of the same type
(if there's one pending already, the new one is discareded).

> > Also, as I said above, I think we shouldn't regard setting up the suspend
> > timer as queuing up a request, but as a totally separate operation.
>
> Well, there can't be any pending resume requests when the suspend timer
> is set up, so we have to consider only pending idle notifications or
> pending suspends. I agree, we would want to allow an idle notification
> to remain pending when the suspend timer is set up. As for pending
> suspends, we _should_ allow the new request to override the old one.
> This will come up whenever the timeout value is changed.

Now there's a point in which allowing to set up the suspend timer at any time
simplifies things quite a bit. Namely, in that case, if pm_schedule_suspend()
is called and it sees a timer pending, it deactivates the timer with
del_timer() and sets up a new one with add_timer(). It doesn't need to worry
about whether the suspend request has been queued up already or
pm_runtime_suspend() is running or something. Things will work themselves out
anyway eventually.

Otherwise, after calling del_timer() we'll need to check if the timer was pending
and if it wasn't, then if the suspend requests has been queued up already, and
if it has, then if pm_runtime_suspend() is running (the current status is
RPM_SUSPENDING) etc. That doesn't look particularly clean.

Best,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Jul 2, 2009, 12:53 PM

Post #7 of 14 (270 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:

> > The RPM_IN_TRANSITION flag is unnecessary. It would always be equal to
> > (status == RPM_SUSPENDING || status == RPM_RESUMING).
>
> I thought of replacing the old flags with RPM_IN_TRANSITION, actually.

Okay, but hopefully you won't mind if I continue to use the old state
names in conversation.

> Still, the additional flag for 'idle notification is in progress' is still
> necessary for the following two reasons:
>
> (1) Idle notifications cannot be run (synchronously) when one is already in
> progress, so we need a means to determine whether or not this is the case.
>
> (2) If run-time PM is to be disabled, the function doing that must guarantee
> that ->runtime_idle() won't be running after it's returned, so it needs to
> know how to check that.

Agreed.


> > I don't agree. For example, suppose the device has an active child
> > when the driver says: Suspend it in 30 seconds. If the child is then
> > removed after only 10 seconds, does it make sense to go ahead with
> > suspending the parent 20 seconds later? No -- if the parent is going
> > to be suspended, the decision as to when should be made at the time the
> > child is removed, not beforehand.
>
> There are two functions, on that sets up the timer and the other that queues
> up the request. This is the second one that makes the decision if the request
> is still worth queuing up.
>
> > (Even more concretely, suppose there is a 30-second inactivity timeout
> > for autosuspend. Removing the child counts as activity and so should
> > restart the timer.)
> >
> > To put it another way, suppose you accept a delayed request under
> > inappropriate conditions. If the conditions don't change, the whole
> > thing was a waste of effort. And if the conditions do change, then the
> > whole delayed request should be reconsidered anyhow.
>
> The problem is, even if you always accept a delayed request under appropriate
> conditions, you still have to reconsider it before putting it into the work
> queue, because the conditions might have changed. So, you'd like to do this:
>
> (1) Check if the conditions are appropriate, set up a timer.
> (2) Check if the conditions are appropriate, queue up a suspend request.
>
> while I think it will be simpler to do this:
>
> (1) Set up a timer.
> (2) Check if the conditions are appropriate, queue up a suspend request.
>
> In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
> between (1) and (2), so I don't really see a practical difference.

A cycle like that would cancel the timer anyway. Maybe that's what you
meant...

Hmm. What sort of conditions are we talking about? One possiblity is
that we are in the wrong state, i.e., in SUSPENDING or SUSPENDED. It's
completely useless to start a timer then; if the state changes the
timer will be cancelled, and if it doesn't change then the request
won't be queued when the timer expires.

The other possiblity is that either the children or usage counter is
positive. If the counter decrements to 0 so that a suspend is feasible
then we would send an idle notification. At that point the driver
could decide what to do; the most likely response would be to
reschedule the suspend. In fact, it's hard to think of a situation
where the driver would want to just let the timer keep on running.

> For example, suppose ->runtime_resume() has been called as
> a result of a remote wake-up (ie. after pm_request_resume()) and it has some
> I/O to process, but it is known beforehand that the device will most likely be
> inactive after the I/O is done. So, it's tempting to call
> pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
> inappropriate (the device is not regarded as suspended).

?? Conditions are perfectly appropriate, since suspend requests are
allowed in the RESUMING state.

Unless the driver also did a pm_runtime_get, of course. But in that
case it would have to do a pm_runtime_put eventually, at which point it
could schedule the suspend.

> However, calling
> pm_schedule_suspend() with a long enough delay doesn't break any rules related
> to the ->runtime_*() callbacks, so why should it be forbidden?

It isn't.

> Next, suppose pm_schedule_suspend() is called, but it fails because the
> conditions are inappropriate. What's the caller supposed to do? Wait for the
> conditions to change and repeat?

In a manner of speaking. More precisely, whatever code is responsible
for changing the conditions should call pm_schedule_suspend. Or set up
an idle notification, leading indirectly to pm_schedule_suspend.

> But why should it bother if the conditions
> may still change before ->runtime_suspend() is actually called?

It should bother because conditions might _not_ change, in which case
the suspend would occur. But for what you are proposing, if the
conditions don't change then the suspend will not occur.

> IMO, it's the caller's problem whether or not what it does is useful or
> efficient. The core's problem is to ensure that it doesn't break things.

But what's the drawback? The extra overhead of checking whether two
counters are positive is minuscule compared to the effort of setting up
a timer. And it's even better when you consider that the mostly likely
outcome of letting the timer run is that the timer handler would fail
to queue a suspend request (because the counters are unchanged).


> > Trying to keep track of reasons for incrementing and decrementing
> > usage_count is very difficult to do in the core. What happens if
> > pm_request_resume increments the count but then the driver calls
> > pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work
> > routine can run?
>
> Nothing wrong, as long as the increments and decrements are balanced (if they
> aren't balanced, there is a bug in the driver anyway).

That's my point -- in this situation it's very difficult for the driver
to balance them. There would be no decrement to balance
pm_request_resume's automatic increment, because the work routine would
never run.

> In fact, for this to
> work we need the rule that a new request of the same type doesn't replace an
> existing one. Then, the submitted resume request cannot be canceled, so the
> work function will run and drop the usage counter.

A new pm_schedule_suspend _should_ replace an existing one. For
idle_notify and resume requests, this rule is more or less a no-op.

> > It's better to make the driver responsible for maintaining the counter
> > value. Forcing the driver to do pm_runtime_get, pm_request_resume is
> > better than having the core automatically change the counter.
>
> So the caller will do:
>
> pm_runtime_get(dev);
> error = pm_request_resume(dev);
> if (error)
> goto out;
> <process I/O>
> pm_runtime_put();

[."error" isn't a good name. The return value would be 0 to indicate
the request was accepted and queued, or 1 to indicate the device is
already active. Or perhaps vice versa.]

> but how is it supposed to ensure that pm_runtime_put() will be called after
> executing the 'goto out' thing?

The same way it knows that the runtime_resume method has to process the
pending I/O. That is, the presence of I/O to process means that once
the processing is over, the driver should call pm_runtime_put.

> Anyway, we don't need to use the usage counter for that (although it's cheap).
> Instead, we can make pm_request_suspend() and pm_request_idle() check if a
> resume request is pending and fail if that's the case.

But what about pm_runtime_suspend? I think we need to use the counter.
Besides, the states in which suspend requests and idle requests are
valid are disjoint from the states in which resume requests are valid.

> Let's put it another way. What's the practical benefit to the caller if we
> always check the counters in submissions?

It saves the overhead of setting up and running a useless timer. It
avoids a race between the timer routine and pm_runtime_put.


> > > Any pending request takes precedence over a new idle notification request.
> >
> > For pending resume requests this rule is unnecessary; it's invalid to
> > submit an idle notification request while a resume request is pending
> > (since resume requests can be pending only in the RPM_SUSPENDING and
> > RPM_SUSPENDED states while idle notification requests are accepted only
> > in RPM_RESUMING and RPM_ACTIVE).
>
> It is correct nevertheless. :-)

Okay, if you want. Provided you agree that "pending request" doesn't
include unexpired suspend timers.

> Well, after some reconsideration I think it's not enough (as I wrote in my last
> message), becuase it generally makes sense to make the following rule:
>
> A pending request always takes precedence over a new request of the same
> type.
>
> So, for example, if pm_request_resume() is called and there's a resume request
> pending already, the new pm_request_resume() should just let the pending
> request alone and quit.

Do you mean we shouldn't cancel the work item and then requeue it? I
agree. In fact I'd go even farther: If the timer routine find an idle
request pending, it shouldn't cancel it -- instead it should simply
change async_action to ASYNC_SUSPEND. That's a simple optimization.
Regardless, the effect isn't visible to drivers.

> Thus, it seems reasonable to remember what type of a request is pending
> (i don't think we can figure it out from the status fields in 100% of the
> cases).

That's what the async_action field in my proposal is for.


> > Yes. But the driver might depend on something happening inside the
> > runtime_resume method, so it would need to know if a successful
> > pm_runtime_resume wasn't going to invoke the callback.
>
> Hmm. That would require the driver to know that the device was suspended,
> but in that case pm_runtime_resume() returning 0 would mean that _someone_
> ran ->runtime_resume() for it in any case.
>
> If the driver doesn't know if the device was suspended beforehand, it cannot
> depend on the execution of ->runtime_resume().

Exactly. Therefore it needs to be told if pm_runtime_resume isn't
going to call the runtime_resume method, so that it can take
appropriate remedial action.


> > > > To be determined: How runtime PM will interact with system sleep.
> > >
> > > Yes. My first idea was to disable run-time PM before entering a system sleep
> > > state, but that would involve canceling all of the pending requests.
> >
> > Or simply freezing the workqueue.
>
> Well, what about the synchronous calls? How are we going to prevent them
> from happening after freezing the workqueue?

How about your "rpm_disabled" flag?


> Now there's a point in which allowing to set up the suspend timer at any time
> simplifies things quite a bit. Namely, in that case, if pm_schedule_suspend()
> is called and it sees a timer pending, it deactivates the timer with
> del_timer() and sets up a new one with add_timer(). It doesn't need to worry
> about whether the suspend request has been queued up already or
> pm_runtime_suspend() is running or something. Things will work themselves out
> anyway eventually.
>
> Otherwise, after calling del_timer() we'll need to check if the timer was pending
> and if it wasn't, then if the suspend requests has been queued up already, and
> if it has, then if pm_runtime_suspend() is running (the current status is
> RPM_SUSPENDING) etc. That doesn't look particularly clean.

It's not as bad as you think. In pseudo code:

ret = suspend_allowed(dev);
if (ret)
return ret;
if (dev->power.timer_expiration) {
del_timer(&dev->power.timer);
dev->power.timer_expiration = 0;
}
if (dev->power.work_pending) {
cancel_work(&dev->power.work);
dev->power.work_pending = 0;
dev->power.async_action = 0;
}
dev->power.timer_expiration = max(jiffies + delay, 1UL);
mod_timer(&dev->power.timer, delay);

The middle section could usefully be put in a subroutine.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


rjw at sisk

Jul 2, 2009, 4:05 PM

Post #8 of 14 (269 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Thursday 02 July 2009, Alan Stern wrote:
> On Thu, 2 Jul 2009, Rafael J. Wysocki wrote:
>
> > > The RPM_IN_TRANSITION flag is unnecessary. It would always be equal to
> > > (status == RPM_SUSPENDING || status == RPM_RESUMING).
> >
> > I thought of replacing the old flags with RPM_IN_TRANSITION, actually.
>
> Okay, but hopefully you won't mind if I continue to use the old state
> names in conversation.

Sure.

> > Still, the additional flag for 'idle notification is in progress' is still
> > necessary for the following two reasons:
> >
> > (1) Idle notifications cannot be run (synchronously) when one is already in
> > progress, so we need a means to determine whether or not this is the case.
> >
> > (2) If run-time PM is to be disabled, the function doing that must guarantee
> > that ->runtime_idle() won't be running after it's returned, so it needs to
> > know how to check that.
>
> Agreed.
>
>
> > > I don't agree. For example, suppose the device has an active child
> > > when the driver says: Suspend it in 30 seconds. If the child is then
> > > removed after only 10 seconds, does it make sense to go ahead with
> > > suspending the parent 20 seconds later? No -- if the parent is going
> > > to be suspended, the decision as to when should be made at the time the
> > > child is removed, not beforehand.
> >
> > There are two functions, on that sets up the timer and the other that queues
> > up the request. This is the second one that makes the decision if the request
> > is still worth queuing up.
> >
> > > (Even more concretely, suppose there is a 30-second inactivity timeout
> > > for autosuspend. Removing the child counts as activity and so should
> > > restart the timer.)
> > >
> > > To put it another way, suppose you accept a delayed request under
> > > inappropriate conditions. If the conditions don't change, the whole
> > > thing was a waste of effort. And if the conditions do change, then the
> > > whole delayed request should be reconsidered anyhow.
> >
> > The problem is, even if you always accept a delayed request under appropriate
> > conditions, you still have to reconsider it before putting it into the work
> > queue, because the conditions might have changed. So, you'd like to do this:
> >
> > (1) Check if the conditions are appropriate, set up a timer.
> > (2) Check if the conditions are appropriate, queue up a suspend request.
> >
> > while I think it will be simpler to do this:
> >
> > (1) Set up a timer.
> > (2) Check if the conditions are appropriate, queue up a suspend request.
> >
> > In any case you can have a pm_runtime_suspend() / pm_runtime_resume() cycle
> > between (1) and (2), so I don't really see a practical difference.
>
> A cycle like that would cancel the timer anyway. Maybe that's what you
> meant...

Yes.

> Hmm. What sort of conditions are we talking about? One possiblity is
> that we are in the wrong state, i.e., in SUSPENDING or SUSPENDED. It's
> completely useless to start a timer then; if the state changes the
> timer will be cancelled, and if it doesn't change then the request
> won't be queued when the timer expires.

OK

> The other possiblity is that either the children or usage counter is
> positive. If the counter decrements to 0 so that a suspend is feasible
> then we would send an idle notification. At that point the driver
> could decide what to do; the most likely response would be to
> reschedule the suspend. In fact, it's hard to think of a situation
> where the driver would want to just let the timer keep on running.

OK

> > For example, suppose ->runtime_resume() has been called as
> > a result of a remote wake-up (ie. after pm_request_resume()) and it has some
> > I/O to process, but it is known beforehand that the device will most likely be
> > inactive after the I/O is done. So, it's tempting to call
> > pm_schedule_suspend() from within ->runtime_resume(), but the conditions are
> > inappropriate (the device is not regarded as suspended).
>
> ?? Conditions are perfectly appropriate, since suspend requests are
> allowed in the RESUMING state.

OK

> Unless the driver also did a pm_runtime_get, of course. But in that
> case it would have to do a pm_runtime_put eventually, at which point it
> could schedule the suspend.
>
> > However, calling
> > pm_schedule_suspend() with a long enough delay doesn't break any rules related
> > to the ->runtime_*() callbacks, so why should it be forbidden?
>
> It isn't.
>
> > Next, suppose pm_schedule_suspend() is called, but it fails because the
> > conditions are inappropriate. What's the caller supposed to do? Wait for the
> > conditions to change and repeat?
>
> In a manner of speaking. More precisely, whatever code is responsible
> for changing the conditions should call pm_schedule_suspend. Or set up
> an idle notification, leading indirectly to pm_schedule_suspend.
>
> > But why should it bother if the conditions
> > may still change before ->runtime_suspend() is actually called?
>
> It should bother because conditions might _not_ change, in which case
> the suspend would occur. But for what you are proposing, if the
> conditions don't change then the suspend will not occur.
>
> > IMO, it's the caller's problem whether or not what it does is useful or
> > efficient. The core's problem is to ensure that it doesn't break things.
>
> But what's the drawback? The extra overhead of checking whether two
> counters are positive is minuscule compared to the effort of setting up
> a timer. And it's even better when you consider that the mostly likely
> outcome of letting the timer run is that the timer handler would fail
> to queue a suspend request (because the counters are unchanged).
>
>
> > > Trying to keep track of reasons for incrementing and decrementing
> > > usage_count is very difficult to do in the core. What happens if
> > > pm_request_resume increments the count but then the driver calls
> > > pm_runtime_get, pm_runtime_resume, pm_runtime_put all before the work
> > > routine can run?
> >
> > Nothing wrong, as long as the increments and decrements are balanced (if they
> > aren't balanced, there is a bug in the driver anyway).
>
> That's my point -- in this situation it's very difficult for the driver
> to balance them. There would be no decrement to balance
> pm_request_resume's automatic increment, because the work routine would
> never run.
>
> > In fact, for this to
> > work we need the rule that a new request of the same type doesn't replace an
> > existing one. Then, the submitted resume request cannot be canceled, so the
> > work function will run and drop the usage counter.
>
> A new pm_schedule_suspend _should_ replace an existing one. For
> idle_notify and resume requests, this rule is more or less a no-op.
>
> > > It's better to make the driver responsible for maintaining the counter
> > > value. Forcing the driver to do pm_runtime_get, pm_request_resume is
> > > better than having the core automatically change the counter.
> >
> > So the caller will do:
> >
> > pm_runtime_get(dev);
> > error = pm_request_resume(dev);
> > if (error)
> > goto out;
> > <process I/O>
> > pm_runtime_put();
>
> [."error" isn't a good name. The return value would be 0 to indicate
> the request was accepted and queued, or 1 to indicate the device is
> already active. Or perhaps vice versa.]

Why do you insist on using positive values? Also, there are other situations
possible (like run-time PM is disabled etc.).

> > but how is it supposed to ensure that pm_runtime_put() will be called after
> > executing the 'goto out' thing?
>
> The same way it knows that the runtime_resume method has to process the
> pending I/O. That is, the presence of I/O to process means that once
> the processing is over, the driver should call pm_runtime_put.

I overlooked the fact that if pm_request_resume() returns a value indicating
that the request has been queued up, the status is such that it won't allow any
other requests to be queued up and only pm_runtime_resume() can. The status
will still remain this way until ->runtime_resume() has returned, so the caller
can just call pm_runtime_put() right after pm_request_resume() in that case
(unless it wants to process I/O after ->runtime_resume() has returned, but
then it can increment the usage counter in ->runtime_resume()).

> > Anyway, we don't need to use the usage counter for that (although it's cheap).
> > Instead, we can make pm_request_suspend() and pm_request_idle() check if a
> > resume request is pending and fail if that's the case.
>
> But what about pm_runtime_suspend? I think we need to use the counter.
> Besides, the states in which suspend requests and idle requests are
> valid are disjoint from the states in which resume requests are valid.

That's correct. pm_runtime_suspend() should check the counter IMO, but it
shouldn't change it.

Also, it looks like the status bits are sufficient to prevent suspend requests
or synchronous suspends from happening at wrong times, from the core's point
of view, so scratch the idea of using the usage counter to block them.

> > Let's put it another way. What's the practical benefit to the caller if we
> > always check the counters in submissions?
>
> It saves the overhead of setting up and running a useless timer. It
> avoids a race between the timer routine and pm_runtime_put.

OK

> > > > Any pending request takes precedence over a new idle notification request.
> > >
> > > For pending resume requests this rule is unnecessary; it's invalid to
> > > submit an idle notification request while a resume request is pending
> > > (since resume requests can be pending only in the RPM_SUSPENDING and
> > > RPM_SUSPENDED states while idle notification requests are accepted only
> > > in RPM_RESUMING and RPM_ACTIVE).
> >
> > It is correct nevertheless. :-)
>
> Okay, if you want. Provided you agree that "pending request" doesn't
> include unexpired suspend timers.

Sure.

> > Well, after some reconsideration I think it's not enough (as I wrote in my last
> > message), becuase it generally makes sense to make the following rule:
> >
> > A pending request always takes precedence over a new request of the same
> > type.
> >
> > So, for example, if pm_request_resume() is called and there's a resume request
> > pending already, the new pm_request_resume() should just let the pending
> > request alone and quit.
>
> Do you mean we shouldn't cancel the work item and then requeue it? I
> agree. In fact I'd go even farther: If the timer routine find an idle
> request pending, it shouldn't cancel it -- instead it should simply
> change async_action to ASYNC_SUSPEND. That's a simple optimization.
> Regardless, the effect isn't visible to drivers.

I don't really like the async_action idea, as you might have noticed.

> > Thus, it seems reasonable to remember what type of a request is pending
> > (i don't think we can figure it out from the status fields in 100% of the
> > cases).
>
> That's what the async_action field in my proposal is for.

Ah. Why don't we just use a request type field instead?

In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
(RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).

Additionally, we'll need an "idle notification is running" flag as we've aleady
agreed, but that's independent on the status and request type (except that, I
think, it should be forbidden to set the request type to RPM_REQ_IDLE if
this flag is set).

That would pretty much suffice to represent all of the possibilities.

I'd also add a "disabled" flag indicating that run-time PM of the device is
disabled, an "error" flag indicating that one of the
->runtime_[suspend/resume]() callbacks has failed to do its job and
and an int field to store the error code returned by the failing callback (in
case the failure happened in an asynchronous routine).

> > > Yes. But the driver might depend on something happening inside the
> > > runtime_resume method, so it would need to know if a successful
> > > pm_runtime_resume wasn't going to invoke the callback.
> >
> > Hmm. That would require the driver to know that the device was suspended,
> > but in that case pm_runtime_resume() returning 0 would mean that _someone_
> > ran ->runtime_resume() for it in any case.
> >
> > If the driver doesn't know if the device was suspended beforehand, it cannot
> > depend on the execution of ->runtime_resume().
>
> Exactly. Therefore it needs to be told if pm_runtime_resume isn't
> going to call the runtime_resume method, so that it can take
> appropriate remedial action.

OK, it can return 1 if the status was already RPM_ACTIVE.

> > > > > To be determined: How runtime PM will interact with system sleep.
> > > >
> > > > Yes. My first idea was to disable run-time PM before entering a system sleep
> > > > state, but that would involve canceling all of the pending requests.
> > >
> > > Or simply freezing the workqueue.
> >
> > Well, what about the synchronous calls? How are we going to prevent them
> > from happening after freezing the workqueue?
>
> How about your "rpm_disabled" flag?

That's fine, we'd also need to wait for running callbacks to finish too. And
I'm still not convinced if we should preserve requests queued up before the
system sleep. Or keep the suspend timer running for that matter.

> > Now there's a point in which allowing to set up the suspend timer at any time
> > simplifies things quite a bit. Namely, in that case, if pm_schedule_suspend()
> > is called and it sees a timer pending, it deactivates the timer with
> > del_timer() and sets up a new one with add_timer(). It doesn't need to worry
> > about whether the suspend request has been queued up already or
> > pm_runtime_suspend() is running or something. Things will work themselves out
> > anyway eventually.
> >
> > Otherwise, after calling del_timer() we'll need to check if the timer was pending
> > and if it wasn't, then if the suspend requests has been queued up already, and
> > if it has, then if pm_runtime_suspend() is running (the current status is
> > RPM_SUSPENDING) etc. That doesn't look particularly clean.
>
> It's not as bad as you think. In pseudo code:
>
> ret = suspend_allowed(dev);
> if (ret)
> return ret;
> if (dev->power.timer_expiration) {
> del_timer(&dev->power.timer);
> dev->power.timer_expiration = 0;
> }
> if (dev->power.work_pending) {
> cancel_work(&dev->power.work);
> dev->power.work_pending = 0;
> dev->power.async_action = 0;
> }
> dev->power.timer_expiration = max(jiffies + delay, 1UL);
> mod_timer(&dev->power.timer, delay);
>
> The middle section could usefully be put in a subroutine.

Could you please remind me what timer_expiration is for?

So, at a high level, the pm_request_* and pm_schedule_* functions would work
like this (I'm omitting acquiring and releasing locks):

pm_request_idle()
* return -EINVAL if 'disabled' is set or 'runtime_error' is set
* return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0
* return -EALREADY if 'request type' is RPM_REQ_IDLE
* return -EINPROGRESS if 'idle notification in progress' is set
* change 'request type' to RPM_REQ_IDLE and queue up a request to execute
->runtime_idle() or ->runtime_suspend() (which one will be executed depends
on 'request type' at the time when the work function is run)
* return 0

pm_schedule_suspend()
* return -EINVAL if 'disabled' is set or 'runtime_error' is set
* return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
* return -EALREADY if 'runtime status' is RPM_SUSPENDED
* return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
* if suspend timer is pending, deactivate it
* if 'request type' is not RPM_REQ_NONE, cancel the work
* set up a timer to execute pm_request_suspend()
* return 0

pm_request_suspend()
* return if 'disabled' is set or 'runtime_error' is set
* return if 'usage_count' > 0 or 'child_count' > 0
* return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
* if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
* change 'request type' to RPM_REQ_SUSPEND and queue up a request to
execute ->runtime_suspend()

pm_request_resume()
* return -EINVAL if 'disabled' is set or 'runtime_error' is set
* return -EINPROGRESS if 'runtime status' is RPM_RESUMING
* return -EALREADY if 'request type' is RPM_REQ_RESUME
* if suspend timer is pending, deactivate it
* if 'request type' is not RPM_REQ_NONE, cancel the work
* return 1 if 'runtime status' is RPM_ACTIVE
* change 'request type' to RPM_REQ_RESUME and queue up a request to
execute ->runtime_resume()
* return 0

Or did I miss anything?

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Jul 3, 2009, 1:58 PM

Post #9 of 14 (266 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Fri, 3 Jul 2009, Rafael J. Wysocki wrote:

> > [."error" isn't a good name. The return value would be 0 to indicate
> > the request was accepted and queued, or 1 to indicate the device is
> > already active. Or perhaps vice versa.]
>
> Why do you insist on using positive values? Also, there are other situations
> possible (like run-time PM is disabled etc.).

I think we should use positive values to indicate situations that
aren't the "nominal" case but also aren't errors. This simplifies
error checking in drivers. For example, you wouldn't want to print a
debugging or warning message just because the device happened to be
active already when you called pm_runtime_resume.


> I don't really like the async_action idea, as you might have noticed.

Do you mean that you don't like the field, or that you don't like its name?

> > > Thus, it seems reasonable to remember what type of a request is pending
> > > (i don't think we can figure it out from the status fields in 100% of the
> > > cases).
> >
> > That's what the async_action field in my proposal is for.
>
> Ah. Why don't we just use a request type field instead?

"A rose by any other name..."

> In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
> RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
> (RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).

That's the same as my 0, ASYNC_IDLE, ASYNC_SUSPEND, ASYNC_RESUME.

> Additionally, we'll need an "idle notification is running" flag as we've aleady
> agreed, but that's independent on the status and request type (except that, I
> think, it should be forbidden to set the request type to RPM_REQ_IDLE if
> this flag is set).

I don't see why; we can allow drivers to queue an idle notification
from within their runtime_idle routine (even though it might seem
pointless). What we should forbid is calling pm_runtime_idle when the
flag is set.

> That would pretty much suffice to represent all of the possibilities.
>
> I'd also add a "disabled" flag indicating that run-time PM of the device is
> disabled, an "error" flag indicating that one of the
> ->runtime_[suspend/resume]() callbacks has failed to do its job and
> and an int field to store the error code returned by the failing callback (in
> case the failure happened in an asynchronous routine).

Sure -- those are all things in the current design which should remain.
As well as the wait_queue.


> That's fine, we'd also need to wait for running callbacks to finish too. And
> I'm still not convinced if we should preserve requests queued up before the
> system sleep. Or keep the suspend timer running for that matter.

This all does into the "to-be-determined" category. :-)


> Could you please remind me what timer_expiration is for?

It is the jiffies value for the next timer expiration, or 0 if the
timer isn't pending. Its purpose is to allow us to correctly
reschedule suspend requests.

Suppose the timer expires at about the same time as a new
pm_schedule_suspend call occurs. If the timer routine hasn't queued
the work item yet then there's nothing to cancel, so how do we prevent
a suspend request from being added to the workqueue? Answer: The timer
routine checks timer_expiration. If the value stored there is in the
future, then the routine knows it was triggered early and it shouldn't
submit the work item.

Also (a minor benefit), before calling del_timer we can check whether
timer_expiration is nonzero.

> So, at a high level, the pm_request_* and pm_schedule_* functions would work
> like this (I'm omitting acquiring and releasing locks):
>
> pm_request_idle()
> * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> * return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
> RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0

We should allow the status to be RPM_RESUMING.

> * return -EALREADY if 'request type' is RPM_REQ_IDLE

No, return 0.

> * return -EINPROGRESS if 'idle notification in progress' is set

No, go ahead and schedule another idle notification.

> * change 'request type' to RPM_REQ_IDLE and queue up a request to execute
> ->runtime_idle() or ->runtime_suspend() (which one will be executed depends
> on 'request type' at the time when the work function is run)

More simply, just queue the work item.

> * return 0
>
> pm_schedule_suspend()
> * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING

The last two aren't right. If the status is RPM_SUSPENDED or
RPM_SUSPENDING, cancel any pending work and set the type to
RPM_REQ_NONE before returning. In other words, cancel a possible
pending resume request.

> * if suspend timer is pending, deactivate it

This step isn't needed here, since you're going to restart the timer
anyway.

> * if 'request type' is not RPM_REQ_NONE, cancel the work

Set timer_expiration = jiffies + delay.

> * set up a timer to execute pm_request_suspend()
> * return 0
>
> pm_request_suspend()
> * return if 'disabled' is set or 'runtime_error' is set
> * return if 'usage_count' > 0 or 'child_count' > 0
> * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED

First cancel a possible pending resume request.

If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending
timer (and set time_expiration to 0).

> * if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
> * change 'request type' to RPM_REQ_SUSPEND and queue up a request to
> execute ->runtime_suspend()
>
> pm_request_resume()
> * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> * return -EINPROGRESS if 'runtime status' is RPM_RESUMING

Or RPM_ACTIVE.

> * return -EALREADY if 'request type' is RPM_REQ_RESUME

For these last two, first cancel a possible pending suspend request
and a possible timer. Should we leave a pending idle request in place?
And return 1, not an error code.

> * if suspend timer is pending, deactivate it

The timer can't be pending at this point.

> * if 'request type' is not RPM_REQ_NONE, cancel the work

At this point, 'request type' can only be RPM_REQ_NONE or
RPM_REQ_RESUME. In neither case do we want to cancel it.

> * return 1 if 'runtime status' is RPM_ACTIVE

See above.

> * change 'request type' to RPM_REQ_RESUME and queue up a request to
> execute ->runtime_resume()

Queue the request only if the state is RPM_SUSPENDED.

> * return 0
>
> Or did I miss anything?

I think this is pretty close. It'll be necessary to go back and reread
the old email messages to make sure this really does everything we
eventually agreed on. :-)

Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
pm_runtime_idle. There is an extra requirement: When a suspend or
resume is over, if 'request type' is set then schedule the work item.
Doing things this way allows the workqueue thread to avoid waiting
around for the suspend or resume to finish.

Also, when a resume is over we should schedule an idle notification
even if 'request type' is clear, provided the counters are 0.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


rjw at sisk

Jul 3, 2009, 4:57 PM

Post #10 of 14 (265 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Friday 03 July 2009, Alan Stern wrote:
> On Fri, 3 Jul 2009, Rafael J. Wysocki wrote:
>
> > > [."error" isn't a good name. The return value would be 0 to indicate
> > > the request was accepted and queued, or 1 to indicate the device is
> > > already active. Or perhaps vice versa.]
> >
> > Why do you insist on using positive values? Also, there are other situations
> > possible (like run-time PM is disabled etc.).
>
> I think we should use positive values to indicate situations that
> aren't the "nominal" case but also aren't errors. This simplifies
> error checking in drivers. For example, you wouldn't want to print a
> debugging or warning message just because the device happened to be
> active already when you called pm_runtime_resume.

OK

> > I don't really like the async_action idea, as you might have noticed.
>
> Do you mean that you don't like the field, or that you don't like its name?

The name, actually. That's because I'd like to use the values for something
that's not 'async' in substance (more on that later).

> > > > Thus, it seems reasonable to remember what type of a request is pending
> > > > (i don't think we can figure it out from the status fields in 100% of the
> > > > cases).
> > >
> > > That's what the async_action field in my proposal is for.
> >
> > Ah. Why don't we just use a request type field instead?
>
> "A rose by any other name..."
>
> > In fact, we can use a 2-bit status field (RPM_ACTIVE, RPM_SUSPENDING,
> > RPM_SUSPENDED, RPM_RESUMING) and a 2-bit request type field
> > (RPM_REQ_NONE, RPM_REQ_IDLE, RPM_REQ_SUSPEND, RPM_REQ_RESUME).
>
> That's the same as my 0, ASYNC_IDLE, ASYNC_SUSPEND, ASYNC_RESUME.
>
> > Additionally, we'll need an "idle notification is running" flag as we've aleady
> > agreed, but that's independent on the status and request type (except that, I
> > think, it should be forbidden to set the request type to RPM_REQ_IDLE if
> > this flag is set).
>
> I don't see why; we can allow drivers to queue an idle notification
> from within their runtime_idle routine (even though it might seem
> pointless). What we should forbid is calling pm_runtime_idle when the
> flag is set.

OK

> > That would pretty much suffice to represent all of the possibilities.
> >
> > I'd also add a "disabled" flag indicating that run-time PM of the device is
> > disabled, an "error" flag indicating that one of the
> > ->runtime_[suspend/resume]() callbacks has failed to do its job and
> > and an int field to store the error code returned by the failing callback (in
> > case the failure happened in an asynchronous routine).
>
> Sure -- those are all things in the current design which should remain.
> As well as the wait_queue.

It occured to me in the meantime that if we added a 'request_pending' (or
'work_pending' or whatever similar) flag to the above, we could avoid using
cancel_work(). Namely, if 'request_pending' indicates that there's a work item
queued up, we could change 'request type' to NONE in case we didn't want the
work function to do anything. Then, the work function would just unset
'request_pending' and quit if 'request type' is NONE.

I generally like the idea of changing 'request type' on the fly once we've
noticed that the currently pending request should be replaced by another one.
That would require us to introduce a big idle-suspend-resume function
choosing the callback to run based on 'request type', which would be quite
complicated. But that function could also be used for the 'synchronous'
operations, so perhaps it's worth trying?

Such a function can take two arguments, dev and request, where the second
one determines the callback to run. It can take the same values as 'request
type', where NONE means "you've been called from the workqueue, use 'request
type' from dev to check what to do", but your ASYNC_* names are not really
suitable here. :-)

> > That's fine, we'd also need to wait for running callbacks to finish too. And
> > I'm still not convinced if we should preserve requests queued up before the
> > system sleep. Or keep the suspend timer running for that matter.
>
> This all does into the "to-be-determined" category. :-)

Well, I'd like to choose something to start with.

> > Could you please remind me what timer_expiration is for?
>
> It is the jiffies value for the next timer expiration, or 0 if the
> timer isn't pending. Its purpose is to allow us to correctly
> reschedule suspend requests.
>
> Suppose the timer expires at about the same time as a new
> pm_schedule_suspend call occurs. If the timer routine hasn't queued
> the work item yet then there's nothing to cancel, so how do we prevent
> a suspend request from being added to the workqueue? Answer: The timer
> routine checks timer_expiration. If the value stored there is in the
> future, then the routine knows it was triggered early and it shouldn't
> submit the work item.
>
> Also (a minor benefit), before calling del_timer we can check whether
> timer_expiration is nonzero.

OK, thanks.

> > So, at a high level, the pm_request_* and pm_schedule_* functions would work
> > like this (I'm omitting acquiring and releasing locks):
> >
> > pm_request_idle()
> > * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > * return -EAGAIN if 'runtime status' is not RPM_ACTIVE or 'request type' is
> > RPM_REQ_SUSPEND or 'usage_count' > 0 or 'child_count' > 0
>
> We should allow the status to be RPM_RESUMING.

OK

> > * return -EALREADY if 'request type' is RPM_REQ_IDLE
>
> No, return 0.

OK

> > * return -EINPROGRESS if 'idle notification in progress' is set
>
> No, go ahead and schedule another idle notification.

OK

> > * change 'request type' to RPM_REQ_IDLE and queue up a request to execute
> > ->runtime_idle() or ->runtime_suspend() (which one will be executed depends
> > on 'request type' at the time when the work function is run)
>
> More simply, just queue the work item.
>
> > * return 0
> >
> > pm_schedule_suspend()
> > * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> > * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> > * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
>
> The last two aren't right. If the status is RPM_SUSPENDED or
> RPM_SUSPENDING, cancel any pending work and set the type to
> RPM_REQ_NONE before returning. In other words, cancel a possible
> pending resume request.

Wy do you think the possible pending resume request should be canceled?

I don't really agree here. Resume request really means there's data to
process, so we shouldn't cancel pending resume requests IMO.

The driver should be given a chance to process data in ->runtime_resume()
even if it doesn't use the usage counter. Otherwise, the usage counter would
always have to be used along with resume requests, so having
pm_request_resume() that doesn't increment the usage counter would really be
pointless.

> > * if suspend timer is pending, deactivate it
>
> This step isn't needed here, since you're going to restart the timer
> anyway.

OK, restart the timer.

> > * if 'request type' is not RPM_REQ_NONE, cancel the work
>
> Set timer_expiration = jiffies + delay.

OK

> > * set up a timer to execute pm_request_suspend()
> > * return 0
> >
> > pm_request_suspend()
> > * return if 'disabled' is set or 'runtime_error' is set
> > * return if 'usage_count' > 0 or 'child_count' > 0
> > * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
>
> First cancel a possible pending resume request.

I disagree.

> If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending
> timer (and set time_expiration to 0).

We're the timer function, so either the timer is not pending, or we've been
executed too early.

> > * if 'request type' is RPM_REQ_IDLE, change it to RPM_REQ_SUSPEND and return
> > * change 'request type' to RPM_REQ_SUSPEND and queue up a request to
> > execute ->runtime_suspend()
> >
> > pm_request_resume()
> > * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
>
> Or RPM_ACTIVE.

Maybe return 1 in that case?

> > * return -EALREADY if 'request type' is RPM_REQ_RESUME
>
> For these last two, first cancel a possible pending suspend request
> and a possible timer.

Possible timer only, I think. if 'request type' is RESUME, there can't be
suspend request pending.

> Should we leave a pending idle request in place?

Probably not. It's likely going to result in a suspend request that we would
cancel.

> And return 1, not an error code.
>
> > * if suspend timer is pending, deactivate it
>
> The timer can't be pending at this point.

That's if we deactivated it earlier. OK

> > * if 'request type' is not RPM_REQ_NONE, cancel the work
>
> At this point, 'request type' can only be RPM_REQ_NONE or
> RPM_REQ_RESUME. In neither case do we want to cancel it.
>
> > * return 1 if 'runtime status' is RPM_ACTIVE
>
> See above.
>
> > * change 'request type' to RPM_REQ_RESUME and queue up a request to
> > execute ->runtime_resume()
>
> Queue the request only if the state is RPM_SUSPENDED.
>
> > * return 0
> >
> > Or did I miss anything?
>
> I think this is pretty close. It'll be necessary to go back and reread
> the old email messages to make sure this really does everything we
> eventually agreed on. :-)

I think it's sufficient if we agree on the final version. :-)

> Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> pm_runtime_idle. There is an extra requirement: When a suspend or
> resume is over, if 'request type' is set then schedule the work item.
> Doing things this way allows the workqueue thread to avoid waiting
> around for the suspend or resume to finish.

I agree except that I would like suspends to just fail when the status is
RPM_RESUMING. The reason is that a sloppily written driver could enter a
busy-loop of suspending-resuming the device, without the possibility to process
data, if there's full symmetry between suspend and resume. So, I'd like to
break that symmetry and make resume operations privileged with respect to
suspend and idle notifications.

> Also, when a resume is over we should schedule an idle notification
> even if 'request type' is clear, provided the counters are 0.

Agreed.

Best,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Jul 3, 2009, 8:12 PM

Post #11 of 14 (258 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:

> > > I don't really like the async_action idea, as you might have noticed.
> >
> > Do you mean that you don't like the field, or that you don't like its name?
>
> The name, actually. That's because I'd like to use the values for something
> that's not 'async' in substance (more on that later).

Okay. I don't care about the name.

> It occured to me in the meantime that if we added a 'request_pending' (or
> 'work_pending' or whatever similar) flag to the above, we could avoid using
> cancel_work(). Namely, if 'request_pending' indicates that there's a work item
> queued up, we could change 'request type' to NONE in case we didn't want the
> work function to do anything. Then, the work function would just unset
> 'request_pending' and quit if 'request type' is NONE.

You mean use request_pending to decide whether to call cancel_work,
instead of looking at request_type? That's right.

As for whether or not we should actually call cancel_work... Which is
more expensive: Calling cancel_work when no work is pending, or letting
the work item run when it doesn't have anything to do? Probably the
latter.

> I generally like the idea of changing 'request type' on the fly once we've
> noticed that the currently pending request should be replaced by another one.

Me too.

> That would require us to introduce a big idle-suspend-resume function
> choosing the callback to run based on 'request type', which would be quite
> complicated.

It doesn't have to be very big or complicated:

spin_lock_irq(&dev->power.lock);
switch (dev->power.request_type) {
case RPM_REQ_SUSPEND:
__pm_runtime_suspend(dev, false);
break;
case RPM_REQ_RESUME:
__pm_runtime_resume(dev, false);
break;
case RPM_REQ_IDLE:
__pm_runtime_idle(dev, false);
break;
default:
}
spin_unlock_irq(&dev->power.lock);

It would be necessary to change the __pm_runtime_* routines, since they
would now have to be called with the lock held.

> But that function could also be used for the 'synchronous'
> operations, so perhaps it's worth trying?
>
> Such a function can take two arguments, dev and request, where the second
> one determines the callback to run. It can take the same values as 'request
> type', where NONE means "you've been called from the workqueue, use 'request
> type' from dev to check what to do", but your ASYNC_* names are not really
> suitable here. :-)

I don't see any advantage in that approach. The pm_runtime_* functions
already know what they want to do. Why encode it in a request argument
only to decode it again?


> > > That's fine, we'd also need to wait for running callbacks to
> > > finish too. And I'm still not convinced if we should preserve
> > > requests queued up before the system sleep. Or keep the suspend
> > > timer running for that matter.
> >
> > This all does into the "to-be-determined" category. :-)
>
> Well, I'd like to choose something to start with.

Pending suspends and the suspend timer don't matter much; we can cancel
them because they ought to get resubmitted after the system wakes up.
Pending resumes are more difficult; depending on how they are treated
they could morph into immediate wakeup requests.

Perhaps even more tricky is how to handle things like the ACPI suspend
calls when the device is already runtime-suspended. I don't know what
we should do about that.


> > > pm_schedule_suspend()
> > > * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > > * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> > > * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> > > * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
> >
> > The last two aren't right. If the status is RPM_SUSPENDED or
> > RPM_SUSPENDING, cancel any pending work and set the type to
> > RPM_REQ_NONE before returning. In other words, cancel a possible
> > pending resume request.
>
> Wy do you think the possible pending resume request should be canceled?

It's part of the "most recent request wins" approach.

> I don't really agree here. Resume request really means there's data to
> process, so we shouldn't cancel pending resume requests IMO.
>
> The driver should be given a chance to process data in ->runtime_resume()
> even if it doesn't use the usage counter. Otherwise, the usage counter would
> always have to be used along with resume requests, so having
> pm_request_resume() that doesn't increment the usage counter would really be
> pointless.

All right, I'll go along with this. So instead of "most recent request
wins", we have something like this:

Resume requests (queued or in progress) override suspend and
idle requests (sync or async).

Suspend requests (queued or in progress, but not unexpired)
override idle requests (sync or async).

But this statement might not be precise enough, and I'm too tired to
think through all the ramifications right now.


> > > pm_request_suspend()
> > > * return if 'disabled' is set or 'runtime_error' is set
> > > * return if 'usage_count' > 0 or 'child_count' > 0
> > > * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
> >
> > First cancel a possible pending resume request.
>
> I disagree.

This is the same as the above, right?

> > If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending
> > timer (and set time_expiration to 0).
>
> We're the timer function, so either the timer is not pending, or we've been
> executed too early.

Oh, okay. I thought perhaps you might have wanted to export
pm_request_suspend. But this is really supposed to be just the timer
handler.


> > > pm_request_resume()
> > > * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > > * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
> >
> > Or RPM_ACTIVE.
>
> Maybe return 1 in that case?

Yes.

> > > * return -EALREADY if 'request type' is RPM_REQ_RESUME
> >
> > For these last two, first cancel a possible pending suspend request
> > and a possible timer.
>
> Possible timer only, I think. if 'request type' is RESUME, there can't be
> suspend request pending.

But there can be if the status is RPM_ACTIVE. So okay, if the status
isn't RPM_RESUMING or RPM_ACTIVE and request_type is RPM_REQ_RESUME,
then return 0.


> > Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> > pm_runtime_idle. There is an extra requirement: When a suspend or
> > resume is over, if 'request type' is set then schedule the work item.
> > Doing things this way allows the workqueue thread to avoid waiting
> > around for the suspend or resume to finish.
>
> I agree except that I would like suspends to just fail when the status is
> RPM_RESUMING. The reason is that a sloppily written driver could enter a
> busy-loop of suspending-resuming the device, without the possibility to process
> data, if there's full symmetry between suspend and resume. So, I'd like to
> break that symmetry and make resume operations privileged with respect to
> suspend and idle notifications.

This follows from the new precedence rule.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


rjw at sisk

Jul 4, 2009, 2:27 PM

Post #12 of 14 (250 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Saturday 04 July 2009, Alan Stern wrote:
> On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:
>
> > > > I don't really like the async_action idea, as you might have noticed.
> > >
> > > Do you mean that you don't like the field, or that you don't like its name?
> >
> > The name, actually. That's because I'd like to use the values for something
> > that's not 'async' in substance (more on that later).
>
> Okay. I don't care about the name.
>
> > It occured to me in the meantime that if we added a 'request_pending' (or
> > 'work_pending' or whatever similar) flag to the above, we could avoid using
> > cancel_work(). Namely, if 'request_pending' indicates that there's a work item
> > queued up, we could change 'request type' to NONE in case we didn't want the
> > work function to do anything. Then, the work function would just unset
> > 'request_pending' and quit if 'request type' is NONE.
>
> You mean use request_pending to decide whether to call cancel_work,
> instead of looking at request_type? That's right.
>
> As for whether or not we should actually call cancel_work... Which is
> more expensive: Calling cancel_work when no work is pending, or letting
> the work item run when it doesn't have anything to do? Probably the
> latter.

Agreed, but that doesn't affect functionality. We can get the desired
functionality without the cancel_work() patch and then optimize things along
with that patch. This way it'll be easier to demontrate the benefit of it.

> > I generally like the idea of changing 'request type' on the fly once we've
> > noticed that the currently pending request should be replaced by another one.
>
> Me too.
>
> > That would require us to introduce a big idle-suspend-resume function
> > choosing the callback to run based on 'request type', which would be quite
> > complicated.
>
> It doesn't have to be very big or complicated:
>
> spin_lock_irq(&dev->power.lock);

Ah, that is the detail I overlooked, we don't need to use
spin_lock_irqsave(), because these function will always be called with
interrupts enabled.

> switch (dev->power.request_type) {
> case RPM_REQ_SUSPEND:
> __pm_runtime_suspend(dev, false);
> break;
> case RPM_REQ_RESUME:
> __pm_runtime_resume(dev, false);
> break;
> case RPM_REQ_IDLE:
> __pm_runtime_idle(dev, false);
> break;
> default:
> }
> spin_unlock_irq(&dev->power.lock);
>
> It would be necessary to change the __pm_runtime_* routines, since they
> would now have to be called with the lock held.
>
> > But that function could also be used for the 'synchronous'
> > operations, so perhaps it's worth trying?
> >
> > Such a function can take two arguments, dev and request, where the second
> > one determines the callback to run. It can take the same values as 'request
> > type', where NONE means "you've been called from the workqueue, use 'request
> > type' from dev to check what to do", but your ASYNC_* names are not really
> > suitable here. :-)
>
> I don't see any advantage in that approach. The pm_runtime_* functions
> already know what they want to do. Why encode it in a request argument
> only to decode it again?

Well, scratch that anyway, I thought it would be necessary because of the
'irqsave' locking.

> > > > That's fine, we'd also need to wait for running callbacks to
> > > > finish too. And I'm still not convinced if we should preserve
> > > > requests queued up before the system sleep. Or keep the suspend
> > > > timer running for that matter.
> > >
> > > This all does into the "to-be-determined" category. :-)
> >
> > Well, I'd like to choose something to start with.
>
> Pending suspends and the suspend timer don't matter much; we can cancel
> them because they ought to get resubmitted after the system wakes up.
> Pending resumes are more difficult; depending on how they are treated
> they could morph into immediate wakeup requests.
>
> Perhaps even more tricky is how to handle things like the ACPI suspend
> calls when the device is already runtime-suspended. I don't know what
> we should do about that.

That almost entirely depends on the bus type. For PCI and probably PNP as well
there's a notion of ACPI low power states and there are AML methods to put the
devices into these states. Unfortunately, the ACPI low power state to put the
device into depends on the target sleep state of the system, so these devices
will probably have to be put into D0 before system suspend anyway.

I think that the bus type can handle this as long as it knows the state the
device is in before system suspend. So, the only thing the core should do is
to block the execution of run-time PM framework functions during system
sleep and resume. The state it leaves the device in shouldn't matter.

So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
device and wait for all run-time PM operations on it in progress to complete.

In the 'disabled' state the bus type or driver can modify the run-time PM
status to whatever they like anyway. Perhaps we can provide a helper to
change 'request type' to RPM_REQ_NONE.

> > > > pm_schedule_suspend()
> > > > * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > > > * return -EAGAIN if 'usage_count' > 0 or 'child_count' > 0
> > > > * return -EALREADY if 'runtime status' is RPM_SUSPENDED
> > > > * return -EINPROGRESS if 'runtime status' is RPM_SUSPENDING
> > >
> > > The last two aren't right. If the status is RPM_SUSPENDED or
> > > RPM_SUSPENDING, cancel any pending work and set the type to
> > > RPM_REQ_NONE before returning. In other words, cancel a possible
> > > pending resume request.
> >
> > Wy do you think the possible pending resume request should be canceled?
>
> It's part of the "most recent request wins" approach.
>
> > I don't really agree here. Resume request really means there's data to
> > process, so we shouldn't cancel pending resume requests IMO.
> >
> > The driver should be given a chance to process data in ->runtime_resume()
> > even if it doesn't use the usage counter. Otherwise, the usage counter would
> > always have to be used along with resume requests, so having
> > pm_request_resume() that doesn't increment the usage counter would really be
> > pointless.
>
> All right, I'll go along with this. So instead of "most recent request
> wins", we have something like this:
>
> Resume requests (queued or in progress) override suspend and
> idle requests (sync or async).
>
> Suspend requests (queued or in progress, but not unexpired)
> override idle requests (sync or async).
>
> But this statement might not be precise enough, and I'm too tired to
> think through all the ramifications right now.

Fair enough. :-)

> > > > pm_request_suspend()
> > > > * return if 'disabled' is set or 'runtime_error' is set
> > > > * return if 'usage_count' > 0 or 'child_count' > 0
> > > > * return if 'runtime status' is RPM_SUSPENDING or RPM_SUSPENDED
> > >
> > > First cancel a possible pending resume request.
> >
> > I disagree.
>
> This is the same as the above, right?

Yes.

> > > If the status is RPM_RESUMING or RPM_ACTIVE, cancel a possible pending
> > > timer (and set time_expiration to 0).
> >
> > We're the timer function, so either the timer is not pending, or we've been
> > executed too early.
>
> Oh, okay. I thought perhaps you might have wanted to export
> pm_request_suspend. But this is really supposed to be just the timer
> handler.

Yes, it is.

> > > > pm_request_resume()
> > > > * return -EINVAL if 'disabled' is set or 'runtime_error' is set
> > > > * return -EINPROGRESS if 'runtime status' is RPM_RESUMING
> > >
> > > Or RPM_ACTIVE.
> >
> > Maybe return 1 in that case?
>
> Yes.
>
> > > > * return -EALREADY if 'request type' is RPM_REQ_RESUME
> > >
> > > For these last two, first cancel a possible pending suspend request
> > > and a possible timer.
> >
> > Possible timer only, I think. if 'request type' is RESUME, there can't be
> > suspend request pending.
>
> But there can be if the status is RPM_ACTIVE. So okay, if the status
> isn't RPM_RESUMING or RPM_ACTIVE and request_type is RPM_REQ_RESUME,
> then return 0.
>
>
> > > Similar outlines apply for pm_runtime_suspend, pm_runtime_resume, and
> > > pm_runtime_idle. There is an extra requirement: When a suspend or
> > > resume is over, if 'request type' is set then schedule the work item.
> > > Doing things this way allows the workqueue thread to avoid waiting
> > > around for the suspend or resume to finish.
> >
> > I agree except that I would like suspends to just fail when the status is
> > RPM_RESUMING. The reason is that a sloppily written driver could enter a
> > busy-loop of suspending-resuming the device, without the possibility to process
> > data, if there's full symmetry between suspend and resume. So, I'd like to
> > break that symmetry and make resume operations privileged with respect to
> > suspend and idle notifications.
>
> This follows from the new precedence rule.

Yes.

So, I guess we have the majority of things clarified and perhaps its time to
write a patch for further discussion. :-)

Best,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


stern at rowland

Jul 5, 2009, 7:50 AM

Post #13 of 14 (237 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:

> > As for whether or not we should actually call cancel_work... Which is
> > more expensive: Calling cancel_work when no work is pending, or letting
> > the work item run when it doesn't have anything to do? Probably the
> > latter.
>
> Agreed, but that doesn't affect functionality. We can get the desired
> functionality without the cancel_work() patch and then optimize things along
> with that patch. This way it'll be easier to demontrate the benefit of it.

Good idea.

> That almost entirely depends on the bus type. For PCI and probably PNP as well
> there's a notion of ACPI low power states and there are AML methods to put the
> devices into these states. Unfortunately, the ACPI low power state to put the
> device into depends on the target sleep state of the system, so these devices
> will probably have to be put into D0 before system suspend anyway.
>
> I think that the bus type can handle this as long as it knows the state the
> device is in before system suspend. So, the only thing the core should do is
> to block the execution of run-time PM framework functions during system
> sleep and resume. The state it leaves the device in shouldn't matter.
>
> So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
> device and wait for all run-time PM operations on it in progress to complete.
>
> In the 'disabled' state the bus type or driver can modify the run-time PM
> status to whatever they like anyway. Perhaps we can provide a helper to
> change 'request type' to RPM_REQ_NONE.

The only modification that really makes sense is like you said, going
back to full power in preparation for the platform suspend operation.
Therefore perhaps we should allow pm_runtime_resume to work even when
rpm_disabled is set. And if we're going to cancel pending suspend and
idle requests, then rpm_request would normally be RPM_REQ_NONE anyway.

Which leaves only the question of what to do when a resume request is
pending...

> So, I guess we have the majority of things clarified and perhaps its time to
> write a patch for further discussion. :-)

Go ahead!

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


rjw at sisk

Jul 5, 2009, 2:47 PM

Post #14 of 14 (229 views)
Permalink
Re: [RFC] Run-time PM framework (was: Re: [patch update] PM: Introduce core framework for run-time PM of I/O devices (rev. 6)) [In reply to]

On Sunday 05 July 2009, Alan Stern wrote:
> On Sat, 4 Jul 2009, Rafael J. Wysocki wrote:
>
> > > As for whether or not we should actually call cancel_work... Which is
> > > more expensive: Calling cancel_work when no work is pending, or letting
> > > the work item run when it doesn't have anything to do? Probably the
> > > latter.
> >
> > Agreed, but that doesn't affect functionality. We can get the desired
> > functionality without the cancel_work() patch and then optimize things along
> > with that patch. This way it'll be easier to demontrate the benefit of it.
>
> Good idea.
>
> > That almost entirely depends on the bus type. For PCI and probably PNP as well
> > there's a notion of ACPI low power states and there are AML methods to put the
> > devices into these states. Unfortunately, the ACPI low power state to put the
> > device into depends on the target sleep state of the system, so these devices
> > will probably have to be put into D0 before system suspend anyway.
> >
> > I think that the bus type can handle this as long as it knows the state the
> > device is in before system suspend. So, the only thing the core should do is
> > to block the execution of run-time PM framework functions during system
> > sleep and resume. The state it leaves the device in shouldn't matter.
> >
> > So, I think we can simply freeze the workqueue, set the 'disabled' bit for each
> > device and wait for all run-time PM operations on it in progress to complete.
> >
> > In the 'disabled' state the bus type or driver can modify the run-time PM
> > status to whatever they like anyway. Perhaps we can provide a helper to
> > change 'request type' to RPM_REQ_NONE.
>
> The only modification that really makes sense is like you said, going
> back to full power in preparation for the platform suspend operation.
> Therefore perhaps we should allow pm_runtime_resume to work even when
> rpm_disabled is set. And if we're going to cancel pending suspend and
> idle requests, then rpm_request would normally be RPM_REQ_NONE anyway.

After we've disabled run-time PM with pm_runtime_disable(), the bus type
and driver can do whatever they like with the device, we don't care. However,
they need to make sure that the state of the device will match its run-time PM
status when its run-time PM is enabled again.

> Which leaves only the question of what to do when a resume request is
> pending...

I think the pm_runtime_disable() can carry out a synchronous wake-up if it
sees a pending resume request. That would make sense in general, IMO, becuase
having a resume request pending usually means there's I/O to process and it's
better to allow the device to process that I/O before disabling the run-time
PM of it.

To put it differently, if there's a resume request pending, the run-time PM of
the device should be disabled while in the 'active' state rather than while in
the 'suspended' state.

Now, if we do that, the problem of run-time resume requests pending while
entering a system sleep state can be solved. Namely, we can make
pm_runtime_disable() return a result that will be -EBUSY if a pending resume
request is found by it and 0 otherwise. Then, that result can be used by
dpm_prepare() to decide whether to continue suspend or to terminate it if
the device is a wake-up one.

> > So, I guess we have the majority of things clarified and perhaps its time to
> > write a patch for further discussion. :-)
>
> Go ahead!

In fact I've already done that, but I need to have a final look at it to check
if there are no obvious mistakes in there.

Best,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact lists@gossamer-threads.com
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.