Tweaking the internals of Intel Cryo Cooling – codeinsecurity

source link: https://codeinsecurity.wordpress.com/2021/09/18/tweaking-the-internals-of-intel-cryo-cooling/

Intel Cryo Cooling is an active cooling solution that uses a TEC, also known as a Peltier element, to cool the CPU. A TEC can pump heat from one side of it to the other, meaning that one side gets cold while the other gets hot. This is useful in situations where you want to cool something below the ambient temperature. Many camping fridges utilise TECs for this purpose.

While TECs have previously been investigated for the purposes of sub-ambient cooling, they were generally considered infeasible. TECs require a significant amount of power to operate (tens of watts, typically) and all of that power turns into extra heat on the hot side. If this additional thermal power cannot be dissipated, the TEC saturates and the cold side gets hot too. Since CPUs are already fairly massive heaters, adding a TEC of sufficient power to make any difference requires far greater cooling capacity than a normal cooling system would need. Another issue with TECs is that when the CPU is at idle, producing little thermal power, the cold side of the TEC may drop below the dew point of the environment, and cause condensation. It’s also massively wasteful, because the TEC consumes a ton of power.

Intel have revisited TECs with a novel approach. Instead of running the TEC flat out all the time, their Cryo solution modulates the TEC power to match the thermal output of the CPU, and monitors the TEC temperature to ensure that it does not drop below the dew point. This is a far more practical approach, because the TEC can be run at much lower power when the CPU is idle, and have its power ramped up to match the load without running into condensation issues. This alone wouldn’t have been all that useful 20 years ago, because the large thermal output of the CPU and TEC combined would have overwhelmed air coolers from that era, but these days you can buy custom water-cooling solutions that are capable of dissipating many hundreds of watts.

This solution is interesting to me from a purely technical standpoint. I have no need to cool my CPUs below ambient, as they’re Xeons that don’t support overclocking anyway. I first heard of this from a tech YouTube channel. In the video, one thing they wanted to try was running it on an AMD platform to see what happened. However, they found that the software prevented the installation on systems with unsupported CPUs, and even if they copied the installed program files across and registered the service manually, it performed a runtime CPU support check and refused to run. In the end, they connected the Intel Cryo controller hardware to a system running a supported CPU, but attached the TEC to the AMD system. This would obviously not work for any features where the TEC’s power tracks the CPU power output, but it did allow them to test in “Extreme” mode, which sets the TEC power to maximum while throttling to avoid condensation.

As mentioned, I have no use for the actual hardware. As such, I don’t own an Intel Cryo cooler. I have, however, taken a good look around the software. In order to do that, I had to get the executables out of the installer, which wasn’t happy because I don’t have a supported CPU.

What surprised me is that the list of supported CPUs, at time of writing, is actually quite minimal:

  • 11900K
  • 11900KF
  • 11700K
  • 11700KF
  • 11600K
  • 11600KF
  • 10900K
  • 10900KF
  • 10850K
  • 10700K
  • 10700KF
  • 10600K
  • 10600KF
  • 9900KS
  • 9900KF
  • 9900K
  • 9700KF
  • 9700K
  • 9600K
  • 8700K
  • 8086K

The installer executable contains VBScript scripts that perform the platform support checks. The script uses WMI to query Win32_Processor to find the name of the CPU installed in the system, trims out all of the extra terms like “Intel(R)” and “11th Gen”, then looks for one of the supported CPU names in the results. Install scripts set properties, e.g. RESULT_PROPERTY, to specify what the result of the script was. In this case the script works by first setting RESULT_PROPERTY to 0, then setting it to 1 later if a supported CPU was found. It is trivial to patch this out with a hex editor.
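As a rough illustration of that check, here is a hypothetical Python re-creation. It is not Intel’s VBScript: the WMI query is replaced by a plain string argument, and the exact set of stripped terms (NOISE_TERMS) is an assumption. The cleaned name is split into tokens and accepted if any token is one of the supported SKUs listed above.

```python
import re

# Supported SKUs, copied from the list above.
SUPPORTED_SKUS = {
    "11900K", "11900KF", "11700K", "11700KF", "11600K", "11600KF",
    "10900K", "10900KF", "10850K", "10700K", "10700KF", "10600K", "10600KF",
    "9900KS", "9900KF", "9900K", "9700KF", "9700K", "9600K", "8700K", "8086K",
}

# Hypothetical list of marketing terms to strip; the real script's list may differ.
NOISE_TERMS = ["Intel(R)", "Core(TM)", "11th Gen", "CPU"]

def is_supported(cpu_name: str) -> bool:
    """Approximate the installer check on a Win32_Processor Name string."""
    cleaned = cpu_name
    for term in NOISE_TERMS:
        cleaned = cleaned.replace(term, " ")
    # Split "i9-11900K @ 3.50GHz" style text into tokens such as "i9" and "11900K".
    tokens = re.split(r"[\s\-@]+", cleaned)
    return any(token.upper() in SUPPORTED_SKUS for token in tokens)
```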

When the installer tries to install the driver without the Intel Cryo hardware being present, the driver will not start properly, and this prevents the installation from continuing. This is also checked via a script in the installer, which fails if the IS_DRIVER_INSTALLED property is any string other than “true”. Patching this check to always succeed (e.g. checking if IS_DRIVER_INSTALLED is equal to “boop”) resolves this problem and allows the installer to perform all the initial installation steps like extracting program files and registering services. The installer will attempt to start the usermode service, which will fail, but if you just kill the installer or leave it running then it won’t roll back the installation and you’ll have everything you need to poke around.

The Intel Cryo hardware effectively consists of a variable power driver for the TEC, a microcontroller running a PID controller, and a USB interface (implemented via a SiLabs CP210x) that can be communicated with like a serial port. The PID setpoint is applied in real time by a usermode service running on the computer. The service derives the setpoint using an exponentially weighted moving average (EWMA) model, which Intel says is derived from machine learning. There are two defined models: one for CPU SKUs with a “K” suffix, and another for CPU SKUs with a “KF” suffix. The EWMA model takes as inputs the min, max, and mean CPU temperatures, package power, IA core power, integrated GPU power (on “K” suffix SKUs), and a thermal margin. Perhaps support for reading these hardware performance registers is what limits CPU support. Intel appear to be cagey about this model, because they obfuscated it in later releases. I’m not interested in poking a rather litigious bear here, so I won’t share the EWMA model coefficients.
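For readers unfamiliar with the technique, a textbook EWMA over a single input is sketched below. It is not Intel’s model: the smoothing factor alpha and the input trace are arbitrary illustrative values, and the real model combines several inputs with undisclosed coefficients.

```python
def ewma(samples, alpha):
    """Textbook EWMA: s[0] = x[0], s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    state = None
    for x in samples:
        # Recent samples get weight alpha; older history decays geometrically.
        state = x if state is None else alpha * x + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed

# Illustrative only: smoothing a made-up package power trace in watts.
smoothed_power = ewma([40.0, 120.0, 125.0, 60.0], alpha=0.3)
```

A smaller alpha reacts more slowly to load spikes, which is the kind of trade-off a setpoint model has to balance against condensation risk.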

If you do have the hardware, but your CPU isn’t supported, you might be able to get it running by patching the service executable to make it ignore the CPU compatibility check. I have no idea if it’ll just crash afterwards, or improperly choose a setpoint due to missing hardware performance information, or even brick the device somehow, so obviously there’s absolutely no warranty here.

Most of the Windows application code is .NET, making it fairly easy to reverse engineer. Newer versions of the software are obfuscated, but older ones are not, making it quite easy to reference the old code to find functionality in the new code. In all versions I’ve looked at, the CryoCoolingService.exe executable contains a class called IntelCryoCooling.ControllerService, which has a method called CheckProcessor that returns a bool representing whether or not the CPU is supported. This check can be patched by simply replacing the ldc.i4.0 opcode at the very start of the function’s IL with ldc.i4.1, which changes the result variable’s default state from false to true, effectively meaning that it always returns true. This alone is sufficient, but if you change the second instruction to ret, this will ensure that all the function does is return true no matter what changes Intel make in the future. You can make these patches using ildasm / ilasm, or one of the .NET decompiler tools that supports the Reflexil plugin.

These changes should be enough to get you going if you just want to try using the stock software on an unsupported CPU.

What’s interesting to me is that the PID values appear to be completely fixed. While there is a facility to change them on the fly, I was not able to find any location in which they were set after the initial configuration. It is possible that the PID tuning occurs on the device, but that seems unlikely. By default the tuning is simply P=100, I=1, D=0, and these values are sent to the device as 32-bit IEEE 754 floating-point numbers.
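Encoding those defaults for the wire is straightforward with Python’s struct module. This sketch assumes the 4-byte operand holds the float in little-endian order, which is typical for microcontrollers and for .NET on x86 but has not been confirmed for this device; the function names are hypothetical.

```python
import struct

# Default gains observed in the software.
DEFAULT_GAINS = {"P": 100.0, "I": 1.0, "D": 0.0}

def float_to_operand(value: float) -> bytes:
    """Pack a coefficient as an IEEE 754 single (assumed little-endian)."""
    return struct.pack("<f", value)

def operand_to_float(operand: bytes) -> float:
    """Inverse of float_to_operand, e.g. for getPCoefficient replies."""
    return struct.unpack("<f", operand)[0]

encoded = {name: float_to_operand(v) for name, v in DEFAULT_GAINS.items()}
```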

If you want to play around with the internal PID settings at runtime, using Intel’s own code, you can load CryoCoolingNotifications.exe as a dependency in a .NET Framework application and utilise the CryoCoolingServiceClient class in the IntelCryoCooling.CondensationControlServiceRef namespace to talk to the running service using WCF. This class, the service interfaces, and other types you need to make this work are all public, so you can consume them for your own use. They give you direct control over PID tuning, set points, and other hardware control variables. If Intel later release a version where these types are not public, you can simply change the visibility using ildasm/ilasm or with something like Reflexil.

If you want to write your own client to directly talk to the hardware, it’s fairly easy. The serial config is 115200 baud, 8-bit data, 1 stop bit, no parity. The protocol works using 8-byte packets, the structure of which is as follows:

  • One byte of value 0xAA
  • One byte operation code
  • Four operand bytes, padded with zeroes if the operand is smaller than that
  • A CRC16-CCITT checksum of the preceding bytes, in big-endian
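Building a command frame can be sketched in Python as follows. Two details are assumptions rather than findings: the CRC variant is taken to be CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF), which binascii.crc_hqx computes when seeded with 0xFFFF, and short operands are assumed to be zero-padded at the end, i.e. stored little-endian. If the device reports a bad CRC (heartbeat bit 5), an initial value of 0 (the XMODEM variant) is the obvious next thing to try.

```python
import binascii
import struct

START_BYTE = 0xAA

def build_packet(opcode: int, operand: bytes = b"") -> bytes:
    """Frame a command: start byte, opcode, 4 operand bytes, big-endian CRC16."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in one byte")
    if len(operand) > 4:
        raise ValueError("operand is at most 4 bytes")
    # Assumption: short operands are zero-padded at the end (little-endian values).
    body = bytes([START_BYTE, opcode]) + operand.ljust(4, b"\x00")
    # Assumption: CRC-16/CCITT-FALSE, i.e. poly 0x1021 seeded with 0xFFFF.
    crc = binascii.crc_hqx(body, 0xFFFF)
    return body + struct.pack(">H", crc)  # checksum is sent big-endian

heartbeat_request = build_packet(0)
```

With pyserial, the resulting bytes would be written to a port opened at 115200 baud, 8N1.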

Replies from the device use the same format, with 127 added to the operation code. So, if you sent a command with opcode 10, you’d get back opcode 137.
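Validating a reply can be sketched the same way. parse_reply is a hypothetical helper; it assumes the same CRC-16/CCITT-FALSE variant as for commands and checks that the reply opcode is the request opcode plus 127.

```python
import binascii
import struct

REPLY_OPCODE_OFFSET = 127  # a reply to opcode N carries opcode N + 127

def parse_reply(packet: bytes, sent_opcode: int) -> bytes:
    """Check an 8-byte reply frame and return its 4 operand bytes."""
    if len(packet) != 8 or packet[0] != 0xAA:
        raise ValueError("bad frame length or start byte")
    (crc,) = struct.unpack(">H", packet[6:8])
    # Assumption: same CRC-16/CCITT-FALSE variant as for commands.
    if binascii.crc_hqx(packet[:6], 0xFFFF) != crc:
        raise ValueError("CRC mismatch")
    if packet[1] != sent_opcode + REPLY_OPCODE_OFFSET:
        raise ValueError(f"unexpected reply opcode {packet[1]}")
    return packet[2:6]
```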

The opcodes are as follows:

  • heartbeat = 0
  • getTecTemperature = 1
  • getHumidity = 2
  • getDewPoint = 3
  • getSetPointOffset = 4
  • getPCoefficient = 5
  • getICoefficient = 6
  • getDCoefficient = 7
  • getTecPowerLevel = 8
  • getHwVersion = 9
  • getFwVersion = 10
  • setSetPointOffset = 20
  • setPCoefficient = 21
  • setICoefficient = 22
  • setDCoefficient = 23
  • setLowPowerMode = 24
  • setCPUTemp = 25
  • setNtcCoefficient = 26
  • getNtcCoefficient = 27
  • setTempSensorMode = 28
  • setTecPowerLevel = 29
  • resetBoard = 30
  • getBoardTemp = 31
  • getVoltageAndCurrent = 34
  • getTecVoltage = 35
  • getTecCurrent = 36
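For client code, the table maps naturally onto a Python IntEnum. The class name and the reply helper are hypothetical; the member names and values are copied from the list above.

```python
from enum import IntEnum

class Opcode(IntEnum):
    """Command opcodes as listed above."""
    heartbeat = 0
    getTecTemperature = 1
    getHumidity = 2
    getDewPoint = 3
    getSetPointOffset = 4
    getPCoefficient = 5
    getICoefficient = 6
    getDCoefficient = 7
    getTecPowerLevel = 8
    getHwVersion = 9
    getFwVersion = 10
    setSetPointOffset = 20
    setPCoefficient = 21
    setICoefficient = 22
    setDCoefficient = 23
    setLowPowerMode = 24
    setCPUTemp = 25
    setNtcCoefficient = 26
    getNtcCoefficient = 27
    setTempSensorMode = 28
    setTecPowerLevel = 29
    resetBoard = 30
    getBoardTemp = 31
    getVoltageAndCurrent = 34
    getTecVoltage = 35
    getTecCurrent = 36

    @property
    def reply(self) -> int:
        """Opcode the device uses in its reply (request opcode + 127)."""
        return self.value + 127
```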

Only the getVoltageAndCurrent opcode response really requires any parsing. The first byte is the voltage multiplied by 21.1 (so the voltage is the byte value divided by 21.1), giving a range of roughly 0-12V, and the second byte is the current multiplied by 4.6545, which corresponds to a maximum of about 54.8A (255 / 4.6545). The current will probably never reach that high; they’ve likely just extended the range of their current sensor to give themselves some safety margin. The software calculates both the power (voltage multiplied by current) and the internal resistance (voltage divided by current) of the TEC after receiving this response packet.
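The scaling and the derived quantities can be expressed as follows. This is a sketch: the function name is hypothetical, and it assumes the voltage and current occupy the first two operand bytes of the reply, as described above. The resistance guard avoids dividing by zero when no current flows.

```python
VOLTS_SCALE = 21.1    # raw byte = volts * 21.1
AMPS_SCALE = 4.6545   # raw byte = amps * 4.6545

def parse_voltage_and_current(operand: bytes) -> dict:
    """Decode a getVoltageAndCurrent operand (byte 0 = voltage, byte 1 = current)."""
    volts = operand[0] / VOLTS_SCALE
    amps = operand[1] / AMPS_SCALE
    watts = volts * amps
    # The software derives the TEC's resistance as V / I; guard the idle case.
    ohms = volts / amps if amps else float("inf")
    return {"volts": volts, "amps": amps, "watts": watts, "ohms": ohms}
```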

The voltage and current values returned by getTecVoltage and getTecCurrent are 32-bit integers, but these are again just scaled by 21.1 and 4.6545, so in practice their raw values only span 0-255 (8 bits) rather than the full 32-bit range. It’s not clear why there are two separate ways to get the values, because in the code the combined call is only used for heartbeats. Perhaps there is some difference in the measurement time between them, or one of them triggers a measurement whereas the other uses the last periodic measurement. I don’t have the hardware, so I can’t check.
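Decoding the separate readings needs only a 32-bit unpack followed by the same scale factors. The unsigned little-endian interpretation is an assumption, and the operands below are made-up examples rather than captured device data.

```python
import struct

VOLTS_SCALE = 21.1
AMPS_SCALE = 4.6545

def decode_scaled_u32(operand: bytes, scale: float) -> float:
    """Undo the device scaling on a 32-bit reading (assumed unsigned, little-endian)."""
    (raw,) = struct.unpack("<I", operand)
    # Per the analysis, raw values stay within 0-255 in practice.
    return raw / scale

tec_volts = decode_scaled_u32(bytes([211, 0, 0, 0]), VOLTS_SCALE)  # made-up getTecVoltage operand
tec_amps = decode_scaled_u32(bytes([93, 0, 0, 0]), AMPS_SCALE)     # made-up getTecCurrent operand
```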

The heartbeat command response returns a value full of status flags, one bit each. The meanings of the bits are as follows:

  • 0 = Board initialisation completed
  • 1 = Power supply OK
  • 2 = TEC thermistor reading in range
  • 3 = Humidity sensor reading in range
  • 4 = Last received command OK
  • 5 = Last received command had a bad CRC
  • 6 = Last received command is incomplete
  • 7 = Failsafe has been activated
  • 8 = PID constants were loaded and accepted
  • 9 = PID constants were rejected
  • 10 = Set point for the PID is out of range
  • 11 = Default set point was loaded
  • 12 = PID is running
  • 13 = Overcurrent protection has been triggered
  • 14 = Board temperature is in range
  • 15 = TEC connection OK
  • 16 = Low power mode enabled
  • 17 = Input temperature mode (0 = digital temperature sensor, 1 = NTC thermistor)

This should be enough to implement your own interface, if you have the hardware.
