mbox series

[v2,0/3] thermal/debugfs: Fix handling of cdev states and mitigation episodes in progress

Message ID 3297002.44csPzL39Z@kreacher
Headers show
Series thermal/debugfs: Fix handling of cdev states and mitigation episodes in progress | expand

Message

Rafael J. Wysocki April 25, 2024, 2:01 p.m. UTC
Hi Everyone,

This is an update of

https://lore.kernel.org/linux-pm/5774279.DvuYhMxLoT@kreacher/

and the only non-trivial difference between it and the v1 is a small
rebase of the second patch (the v1 of which didn't apply).

It generally has been based on top of

https://lore.kernel.org/linux-pm/12427744.O9o76ZdvQC@kreacher/

but it should apply on top of the linux-next branch in linux-pm.git as well.

It is present in the thermal-core-next branch in that tree, along with the
above series.

Thanks!

Comments

Lukasz Luba April 25, 2024, 7:08 p.m. UTC | #1
On 4/25/24 15:03, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Because thermal_debug_cdev_state_update() only creates a duration record
> for the old state of a cooling device, if its new state is used for the
> first time, there will be no record for it and cdev_dt_seq_show() will
> not print the duration information for it even though it contains code
> to compute the duration value in that case.
> 
> Address this by making thermal_debug_cdev_state_update() create a
> duration record for the new state if there is none.
> 
> Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information")
> Reported-by: Lukasz Luba <lukasz.luba@arm.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>   drivers/thermal/thermal_debugfs.c |    8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> Index: linux-pm/drivers/thermal/thermal_debugfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_debugfs.c
> +++ linux-pm/drivers/thermal/thermal_debugfs.c
> @@ -433,6 +433,14 @@ void thermal_debug_cdev_state_update(con
>   	}
>   
>   	cdev_dbg->current_state = new_state;
> +
> +	/*
> +	 * Create a record for the new state if it is not there, so its
> +	 * duration will be printed by cdev_dt_seq_show() as expected if it
> +	 * runs before the next state transition.
> +	 */
> +	thermal_debugfs_cdev_record_get(thermal_dbg, cdev_dbg->durations, new_state);
> +
>   	transition = (old_state << 16) | new_state;
>   
>   	/*
> 
> 
> 
> 

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Lukasz Luba April 25, 2024, 8:54 p.m. UTC | #2
On 4/25/24 15:05, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> If a thermal mitigation event is in progress, its duration value has
> not been updated yet, so 0 will be printed as the event duration by
> tze_seq_show() which is confusing.
> 
> Avoid doing that by marking the beginning of the event with the
> KTIME_MIN duration value and making tze_seq_show() compute the current
> event duration on the fly, in which case '>' will be printed instead of
> '=' in the event duration value field.
> 
> Similarly, for trip points that have been crossed on the down, mark
> the end of mitigation with the KTIME_MAX timestamp value and make
> tze_seq_show() compute the current duration on the fly for the trip
> points still involved in the mitigation, in which cases the duration
> value printed by it will be prepended with a '>' character.
> 
> Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>   drivers/thermal/thermal_debugfs.c |   39 ++++++++++++++++++++++++++++++++------
>   1 file changed, 33 insertions(+), 6 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_debugfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_debugfs.c
> +++ linux-pm/drivers/thermal/thermal_debugfs.c
> @@ -552,6 +552,7 @@ static struct tz_episode *thermal_debugf
>   
>   	INIT_LIST_HEAD(&tze->node);
>   	tze->timestamp = now;
> +	tze->duration = KTIME_MIN;
>   
>   	for (i = 0; i < tz->num_trips; i++) {
>   		tze->trip_stats[i].min = INT_MAX;
> @@ -680,6 +681,9 @@ void thermal_debug_tz_trip_down(struct t
>   	tze->trip_stats[trip_id].duration =
>   		ktime_add(delta, tze->trip_stats[trip_id].duration);
>   
> +	/* Mark the end of mitigation for this trip point. */
> +	tze->trip_stats[trip_id].timestamp = KTIME_MAX;
> +
>   	/*
>   	 * This event closes the mitigation as we are crossing the
>   	 * last trip point the way down.
> @@ -754,15 +758,25 @@ static int tze_seq_show(struct seq_file
>   	struct thermal_trip_desc *td;
>   	struct tz_episode *tze;
>   	const char *type;
> +	u64 duration_ms;
>   	int trip_id;
> +	char c;
>   
>   	tze = list_entry((struct list_head *)v, struct tz_episode, node);
>   
> -	seq_printf(s, ",-Mitigation at %lluus, duration=%llums\n",
> -		   ktime_to_us(tze->timestamp),
> -		   ktime_to_ms(tze->duration));
> +	if (tze->duration == KTIME_MIN) {
> +		/* Mitigation in progress. */
> +		duration_ms = ktime_to_ms(ktime_sub(ktime_get(), tze->timestamp));
> +		c = '>';
> +	} else {
> +		duration_ms = ktime_to_ms(tze->duration);
> +		c = '=';
> +	}
> +
> +	seq_printf(s, ",-Mitigation at %lluus, duration%c%llums\n",
> +		   ktime_to_us(tze->timestamp), c, duration_ms);
>   
> -	seq_printf(s, "| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |\n");
> +	seq_printf(s, "| trip |     type | temp(°mC) | hyst(°mC) |  duration   |  avg(°mC) |  min(°mC) |  max(°mC) |\n");

So this one more space accounts for the new 'c' symbol in the rows
below that header, for the 'duration' column. Make sense.

>   
>   	for_each_trip_desc(tz, td) {
>   		const struct thermal_trip *trip = &td->trip;
> @@ -794,12 +808,25 @@ static int tze_seq_show(struct seq_file
>   		else
>   			type = "hot";
>   
> -		seq_printf(s, "| %*d | %*s | %*d | %*d | %*lld | %*d | %*d | %*d |\n",
> +		if (trip_stats->timestamp != KTIME_MAX) {
> +			/* Mitigation in progress. */
> +			ktime_t delta = ktime_sub(ktime_get(),
> +						  trip_stats->timestamp);
> +
> +			delta = ktime_add(delta, trip_stats->duration);
> +			duration_ms = ktime_to_ms(delta);
> +			c = '>';
> +		} else {
> +			duration_ms = ktime_to_ms(trip_stats->duration);
> +			c = ' ';
> +		}
> +
> +		seq_printf(s, "| %*d | %*s | %*d | %*d | %c%*lld | %*d | %*d | %*d |\n",
>   			   4 , trip_id,
>   			   8, type,
>   			   9, trip->temperature,
>   			   9, trip->hysteresis,
> -			   10, ktime_to_ms(trip_stats->duration),
> +			   c, 10, duration_ms,
>   			   9, trip_stats->avg,
>   			   9, trip_stats->min,
>   			   9, trip_stats->max);
> 
> 
> 
> 

The comments in code in this particular case helps, since treating
the KTIME_MIN/MAX values might become not obvious after a while.
That LGTM

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>