
[v5,1/2] dt-bindings: cpufreq: add virtual cpufreq device

Message ID 20240127004321.1902477-2-davidai@google.com
State New
Series [v5,1/2] dt-bindings: cpufreq: add virtual cpufreq device

Commit Message

David Dai Jan. 27, 2024, 12:43 a.m. UTC
Add bindings to represent a virtual cpufreq device.

Virtual machines may expose MMIO regions for a virtual cpufreq device
for guests to read frequency information or to request frequency
selection. The virtual cpufreq device has an individual controller for
each frequency domain. Performance points for a given domain can be
normalized across all domains to make it easier for virtual machines
to migrate between hosts.
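As a rough illustration of the normalization described above, the following C sketch converts a per-OPP performance score into the normalized value used in the guest OPP tables, where the fastest OPP across all vCPUs is listed as 1024 kHz. The helper names and the idea of a per-OPP "performance score" supplied by the VMM are assumptions for illustration only; the patch does not say how a VMM derives these numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Normalized value (in kHz) given to the fastest OPP across all vCPUs. */
#define VCPUFREQ_NORM_MAX_KHZ 1024u

/*
 * Hypothetical helper: map a host-side performance score for one OPP to the
 * normalized "frequency" (in kHz) listed in the guest's OPP table.
 *   perf:     performance score of this OPP (any consistent unit)
 *   max_perf: score of the fastest OPP among all vCPUs
 * The result is rounded to the nearest kHz.
 */
static uint32_t vcpufreq_normalize_khz(uint64_t perf, uint64_t max_perf)
{
	assert(max_perf > 0 && perf <= max_perf);
	return (uint32_t)((perf * VCPUFREQ_NORM_MAX_KHZ + max_perf / 2) / max_perf);
}

/* opp-hz in the DT is expressed in Hz, so scale the kHz value by 1000. */
static uint64_t vcpufreq_normalized_hz(uint64_t perf, uint64_t max_perf)
{
	return (uint64_t)vcpufreq_normalize_khz(perf, max_perf) * 1000u;
}
```

For example, if the fastest big-core OPP scores 1000, a little-core OPP scoring 415 normalizes to 425 kHz (opp-hz = 425000), which matches the top entry of opp-table-0 in the binding example.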

Co-developed-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: David Dai <davidai@google.com>
---
 .../cpufreq/qemu,cpufreq-virtual.yaml         | 110 ++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/cpufreq/qemu,cpufreq-virtual.yaml

Comments

Rob Herring (Arm) Feb. 2, 2024, 3:53 p.m. UTC | #1
On Wed, Jan 31, 2024 at 10:23:03AM -0800, Saravana Kannan wrote:
> On Wed, Jan 31, 2024 at 9:06 AM Rob Herring <robh@kernel.org> wrote:
> >
> > On Fri, Jan 26, 2024 at 04:43:15PM -0800, David Dai wrote:
> > > Add bindings to represent a virtual cpufreq device.
> > >
> > > Virtual machines may expose MMIO regions for a virtual cpufreq device
> > > for guests to read frequency information or to request frequency
> > > selection. The virtual cpufreq device has an individual controller for
> > > each frequency domain. Performance points for a given domain can be
> > > normalized across all domains to make it easier for virtual machines
> > > to migrate between hosts.
> > >
> > > Co-developed-by: Saravana Kannan <saravanak@google.com>
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > Signed-off-by: David Dai <davidai@google.com>
> > > ---
> > >  .../cpufreq/qemu,cpufreq-virtual.yaml         | 110 ++++++++++++++++++
> >
> > > +    const: qemu,virtual-cpufreq
> >
> > Well, the filename almost matches the compatible.
> >
> > > +
> > > +  reg:
> > > +    maxItems: 1
> > > +    description:
> > > +      Address and size of region containing frequency controls for each of the
> > > +      frequency domains. The regions for the frequency domains are placed
> > > +      contiguously and contain registers for controlling DVFS (Dynamic Voltage
> > > +      and Frequency Scaling) characteristics. The size of the region is
> > > +      proportional to the total number of frequency domains. This device also
> > > +      needs the CPUs to list their OPPs using operating-points-v2 tables. The
> > > +      OPP tables for the CPUs should use normalized "frequency" values where the
> > > +      OPP with the highest performance among all the vCPUs is listed as 1024 kHz.
> > > +      The remaining frequencies of all the vCPUs should be normalized based on
> > > +      their performance relative to that 1024 kHz OPP. This makes it much easier
> > > +      to migrate the VM across systems which might have different physical CPU
> > > +      OPPs.
> > > +
> > > +required:
> > > +  - compatible
> > > +  - reg
> > > +
> > > +additionalProperties: false
> > > +
> > > +examples:
> > > +  - |
> > > +    // This example shows a two CPU configuration with a frequency domain
> > > +    // for each CPU showing normalized performance points.
> > > +    cpus {
> > > +      #address-cells = <1>;
> > > +      #size-cells = <0>;
> > > +
> > > +      cpu@0 {
> > > +        compatible = "arm,armv8";
> > > +        device_type = "cpu";
> > > +        reg = <0x0>;
> > > +        operating-points-v2 = <&opp_table0>;
> > > +      };
> > > +
> > > +      cpu@1 {
> > > +        compatible = "arm,armv8";
> > > +        device_type = "cpu";
> > > +        reg = <0x1>;
> > > +        operating-points-v2 = <&opp_table1>;
> > > +      };
> > > +    };
> > > +
> > > +    opp_table0: opp-table-0 {
> > > +      compatible = "operating-points-v2";
> > > +
> > > +      opp64000 { opp-hz = /bits/ 64 <64000>; };
> >
> > opp-64000 is the preferred form.
> >
> > > +      opp128000 { opp-hz = /bits/ 64 <128000>; };
> > > +      opp192000 { opp-hz = /bits/ 64 <192000>; };
> > > +      opp256000 { opp-hz = /bits/ 64 <256000>; };
> > > +      opp320000 { opp-hz = /bits/ 64 <320000>; };
> > > +      opp384000 { opp-hz = /bits/ 64 <384000>; };
> > > +      opp425000 { opp-hz = /bits/ 64 <425000>; };
> > > +    };
> > > +
> > > +    opp_table1: opp-table-1 {
> > > +      compatible = "operating-points-v2";
> > > +
> > > +      opp64000 { opp-hz = /bits/ 64 <64000>; };
> > > +      opp128000 { opp-hz = /bits/ 64 <128000>; };
> > > +      opp192000 { opp-hz = /bits/ 64 <192000>; };
> > > +      opp256000 { opp-hz = /bits/ 64 <256000>; };
> > > +      opp320000 { opp-hz = /bits/ 64 <320000>; };
> > > +      opp384000 { opp-hz = /bits/ 64 <384000>; };
> > > +      opp448000 { opp-hz = /bits/ 64 <448000>; };
> > > +      opp512000 { opp-hz = /bits/ 64 <512000>; };
> > > +      opp576000 { opp-hz = /bits/ 64 <576000>; };
> > > +      opp640000 { opp-hz = /bits/ 64 <640000>; };
> > > +      opp704000 { opp-hz = /bits/ 64 <704000>; };
> > > +      opp768000 { opp-hz = /bits/ 64 <768000>; };
> > > +      opp832000 { opp-hz = /bits/ 64 <832000>; };
> > > +      opp896000 { opp-hz = /bits/ 64 <896000>; };
> > > +      opp960000 { opp-hz = /bits/ 64 <960000>; };
> > > +      opp1024000 { opp-hz = /bits/ 64 <1024000>; };
> > > +
> > > +    };
> >
> > I don't recall your prior versions having an OPP table. Maybe it was
> > incomplete. You are designing the "h/w" interface. Why don't you make it
> > discoverable or implicit (fixed for the h/w)?
> 
> We also need the OPP tables to indicate which CPUs are part of the
> same cluster, etc. Don't want to invent a new "protocol" and just use
> existing DT bindings.

Topology binding is for that.

What about when x86 and other ACPI systems need to do this too? You 
define a discoverable interface, then it works regardless of firmware. 
KVM, Virtio, VFIO, etc. are all their own protocols.

> > Do you really need it if the frequency is normalized?
> 
> Yeah, we can have little and big CPUs and want to emulate different
> performance levels. So while the Fmax on big is 1024, we still want to
> be able to say little is 425. So we definitely need frequency tables.

You need per CPU Fmax, sure. But all the frequencies? I don't follow why 
you don't just have a max available capacity and then request the 
desired capacity. Then the host maps that to an underlying OPP. Why have 
an intermediate set of fake frequencies?

As these are normalized, I guess you are normalizing for capacity as 
well? Or you are using "capacity-dmips-mhz"? 

I'm also lost how this would work when you migrate and the underlying 
CPU changes. The DT is fixed.

> > Also, we have "opp-level" for opaque values that aren't Hz.
> 
> Still want to keep it Hz to be compatible with arch_freq_scale and
> when virtualized CPU perf counters are available.

Seems like no one would want "opp-level" then. Shrug.
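Frequency invariance in the kernel (arch_freq_scale) is a per-CPU ratio of current to maximum frequency, roughly scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq with SCHED_CAPACITY_SHIFT equal to 10. The standalone sketch below reimplements that ratio (it is not the kernel's code; freq_scale() and CAPACITY_SHIFT are local names) to show that it depends only on the ratio, so normalized "kHz" values give the same scale factor as physical frequencies in the same proportion.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's capacity fixed-point shift (1 << 10 == 1024). */
#define CAPACITY_SHIFT 10

/* Scale factor for the current frequency relative to the CPU's maximum. */
static uint64_t freq_scale(uint64_t cur_freq, uint64_t max_freq)
{
	assert(max_freq > 0 && cur_freq <= max_freq);
	return (cur_freq << CAPACITY_SHIFT) / max_freq;
}
```

For instance, a little vCPU running at its 384 kHz normalized OPP with a 425 kHz maximum gets a scale of 925, the same value as 384000 Hz out of 425000 Hz.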

Anyway, if Viresh and Marc are fine with all this, I'll shut up.

Rob

Patch

diff --git a/Documentation/devicetree/bindings/cpufreq/qemu,cpufreq-virtual.yaml b/Documentation/devicetree/bindings/cpufreq/qemu,cpufreq-virtual.yaml
new file mode 100644
index 000000000000..cd617baf75e7
--- /dev/null
+++ b/Documentation/devicetree/bindings/cpufreq/qemu,cpufreq-virtual.yaml
@@ -0,0 +1,110 @@ 
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/cpufreq/qemu,cpufreq-virtual.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Virtual CPUFreq
+
+maintainers:
+  - David Dai <davidai@google.com>
+  - Saravana Kannan <saravanak@google.com>
+
+description:
+  Virtual CPUFreq is a virtualized driver in guest kernels that sends frequency
+  selection of its vCPUs as a hint to the host through MMIO regions. Each vCPU
+  is associated with a frequency domain which can be shared with other vCPUs.
+  Each frequency domain has its own set of registers for frequency controls.
+
+properties:
+  compatible:
+    const: qemu,virtual-cpufreq
+
+  reg:
+    maxItems: 1
+    description:
+      Address and size of region containing frequency controls for each of the
+      frequency domains. The regions for the frequency domains are placed
+      contiguously and contain registers for controlling DVFS (Dynamic Voltage
+      and Frequency Scaling) characteristics. The size of the region is
+      proportional to the total number of frequency domains. This device also
+      needs the CPUs to list their OPPs using operating-points-v2 tables. The
+      OPP tables for the CPUs should use normalized "frequency" values where the
+      OPP with the highest performance among all the vCPUs is listed as 1024 kHz.
+      The remaining frequencies of all the vCPUs should be normalized based on
+      their performance relative to that 1024 kHz OPP. This makes it much easier
+      to migrate the VM across systems which might have different physical CPU
+      OPPs.
+
+required:
+  - compatible
+  - reg
+
+additionalProperties: false
+
+examples:
+  - |
+    // This example shows a two CPU configuration with a frequency domain
+    // for each CPU showing normalized performance points.
+    cpus {
+      #address-cells = <1>;
+      #size-cells = <0>;
+
+      cpu@0 {
+        compatible = "arm,armv8";
+        device_type = "cpu";
+        reg = <0x0>;
+        operating-points-v2 = <&opp_table0>;
+      };
+
+      cpu@1 {
+        compatible = "arm,armv8";
+        device_type = "cpu";
+        reg = <0x1>;
+        operating-points-v2 = <&opp_table1>;
+      };
+    };
+
+    opp_table0: opp-table-0 {
+      compatible = "operating-points-v2";
+
+      opp64000 { opp-hz = /bits/ 64 <64000>; };
+      opp128000 { opp-hz = /bits/ 64 <128000>; };
+      opp192000 { opp-hz = /bits/ 64 <192000>; };
+      opp256000 { opp-hz = /bits/ 64 <256000>; };
+      opp320000 { opp-hz = /bits/ 64 <320000>; };
+      opp384000 { opp-hz = /bits/ 64 <384000>; };
+      opp425000 { opp-hz = /bits/ 64 <425000>; };
+    };
+
+    opp_table1: opp-table-1 {
+      compatible = "operating-points-v2";
+
+      opp64000 { opp-hz = /bits/ 64 <64000>; };
+      opp128000 { opp-hz = /bits/ 64 <128000>; };
+      opp192000 { opp-hz = /bits/ 64 <192000>; };
+      opp256000 { opp-hz = /bits/ 64 <256000>; };
+      opp320000 { opp-hz = /bits/ 64 <320000>; };
+      opp384000 { opp-hz = /bits/ 64 <384000>; };
+      opp448000 { opp-hz = /bits/ 64 <448000>; };
+      opp512000 { opp-hz = /bits/ 64 <512000>; };
+      opp576000 { opp-hz = /bits/ 64 <576000>; };
+      opp640000 { opp-hz = /bits/ 64 <640000>; };
+      opp704000 { opp-hz = /bits/ 64 <704000>; };
+      opp768000 { opp-hz = /bits/ 64 <768000>; };
+      opp832000 { opp-hz = /bits/ 64 <832000>; };
+      opp896000 { opp-hz = /bits/ 64 <896000>; };
+      opp960000 { opp-hz = /bits/ 64 <960000>; };
+      opp1024000 { opp-hz = /bits/ 64 <1024000>; };
+
+    };
+
+    soc {
+      #address-cells = <1>;
+      #size-cells = <1>;
+
+      cpufreq@1040000 {
+        compatible = "qemu,virtual-cpufreq";
+        reg = <0x1040000 0x10>;
+      };
+    };