Thermodynamic Computing: How to Save Machine Learning by Replacing Transistors

Written by thebojda | Published 2024/03/15
Tech Story Tags: machine-learning | artificial-intelligence | quantum-computing | moore's-law | asics | thermodynamic-computing | neural-networks | probabilistic-computing

TL;DR: We reached the physical limits of transistors, and something new is needed. Thermodynamic computing is one of the possible solutions to keep improving AI.


One of the hottest areas in technology today is machine learning. The rate of development from month to month is astonishing. Architectures are rapidly evolving, and Moore's Law can no longer keep pace. But what's even worse, Moore's Law itself is about to hit a wall.

The essence of Moore's Law is that the number of transistors on integrated circuits approximately doubles every eighteen months. This trend has held for a long time, as technological advancements have enabled the production of ever-smaller and more efficient transistors.

However, we've reached a physical limit where the laws of physics give us a reality check. Below a certain size, transistors stop working reliably because electrons simply tunnel through the barriers and leak out. The only solution is to look for alternative architectures.


One direction is the creation of specialized hardware, such as ASICs (Application-Specific Integrated Circuits). Recently, there was headline news about Groq's LPU, a specialized processor designed specifically for large language models (LLMs). It is capable of running LLMs much faster and with much less energy consumption than traditional GPUs. Although this circuit is made with conventional technology, it is optimized for the task, which accounts for its significant speedup. Such specialized hardware can exploit the reserves in current technology, but the collision of Moore's Law with its limits also poses a constraint here.

The company Extropic came forward with a bolder idea. Their solution would banish transistors and replace them with something else. Like Google, the company was founded by two guys: Guillaume Verdon and Trevor McCourt. Both worked in the field of quantum computing before founding the company, and their chip lies somewhere halfway between traditional integrated circuits and quantum computers.

As I mentioned before, the biggest limitation of traditional integrated circuits is that transistors cannot be miniaturized indefinitely. At very small scales, quantum-mechanical effects dominate the behavior of particles, making reliable transistor operation impossible. Quantum computers, in contrast, leverage these quantum-mechanical effects. Their greatest challenge is that qubits must be very well isolated from the environment: even minimal noise can disrupt the system's operation. This is why quantum computers need to be cooled to near absolute zero, to prevent the surrounding heat from interfering with their function.

Extropic's brilliant idea is to, instead of fighting against environmental noise and heat, make use of them. This is where the term "Thermodynamic computing" comes from.


In the field of machine learning, normally distributed random noise is used extensively. One example is Stable Diffusion; the same diffusion approach also powers image generators like Midjourney. Here, a bit of noise is added to the image at every step until the entire image becomes noise. The neural network is then trained to reverse this process, that is, to generate an image from noise based on the given text. If the system is trained with enough images and text, it becomes capable of generating images from random noise based on a text prompt. This is how diffusion-based generators like Midjourney operate.
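To make the idea concrete, here is a minimal sketch of the forward (noising) half of a diffusion process in plain NumPy. The function name and the fixed `beta` value are just illustrative choices; real diffusion models use a schedule of noise levels and, of course, a trained network to run the process in reverse.

```python
import numpy as np

def forward_diffusion(image, num_steps=200, beta=0.02):
    """Toy forward diffusion: gradually mix Gaussian noise into an image.

    `beta` controls how much noise is added per step; after enough steps,
    the result is essentially pure noise.
    """
    x = image.copy()
    trajectory = [x]
    for _ in range(num_steps):
        noise = np.random.normal(0.0, 1.0, size=x.shape)
        # Shrink the signal slightly and add a little noise each step.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        trajectory.append(x)
    return trajectory

# Example: a fake 64x64 grayscale "image" dissolving into noise.
img = np.random.rand(64, 64)
steps = forward_diffusion(img)
```

Training the reverse process (denoising conditioned on text) is where the heavy lifting happens, but the forward half really is this simple.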


In fact, the use of noise is even more widespread. In any neural network, the initial weights are usually initialized with random noise. During training, backpropagation shapes the final weights from this random noise. Thus, a similar process occurs in every neural network as in Stable Diffusion: the final model emerges from noise through the modification of weights during training.
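For comparison, this is roughly what that starting point looks like in code. The scaling used below is one common convention (a He-style initialization); the training loop that sculpts these weights via backpropagation is omitted.

```python
import numpy as np

def init_layer_weights(n_in, n_out, rng=np.random.default_rng(0)):
    # Start from normally distributed noise, scaled so activations
    # neither explode nor vanish as they pass through the layer.
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

# A tiny two-layer network begins its life as pure noise...
w1 = init_layer_weights(784, 128)
w2 = init_layer_weights(128, 10)
# ...and backpropagation gradually shapes this noise into a model.
```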

On current computer architectures, this kind of random distribution can only be emulated: pseudorandom numbers are produced by circuits built from deterministic transistors. In contrast, Extropic's circuit does this in an analog manner. The starting state is completely random, normally distributed thermal noise. By programming the circuit, this noise can be shaped within each component. In place of transistors there are analog, inherently noisy elements, and the result is recovered through statistical analysis of the output. The guys call this probabilistic computing.
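The contrast is easiest to see in software. The toy sketch below is not Extropic's hardware, only an analogy I put together: each "analog" evaluation is corrupted by thermal-like Gaussian noise, and the useful answer is recovered by aggregating many noisy samples rather than computing one exact deterministic value. All names and noise parameters here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_dot(x, weights, noise_std=0.5):
    # One "analog" read-out: the weighted sum is corrupted by
    # thermal-like Gaussian noise every time it is evaluated.
    return x @ weights + rng.normal(0.0, noise_std)

def probabilistic_output(x, weights, num_samples=10_000):
    # Repeat the noisy computation many times and look at the statistics;
    # the sample mean converges to the underlying value.
    samples = [noisy_dot(x, weights) for _ in range(num_samples)]
    return np.mean(samples), np.std(samples)

x = np.array([0.2, -1.0, 0.7])
w = np.array([1.5, 0.3, -2.0])
mean, spread = probabilistic_output(x, w)
print(f"noisy estimate: {mean:.3f}  (exact value: {x @ w:.3f})")
```

The design trade-off is the interesting part: each individual evaluation is cheap but unreliable, and reliability is bought back statistically, which is exactly why ambient thermal noise stops being the enemy.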


These analog circuits are much faster and consume much less energy, and since the thermal noise is not only non-disruptive but actually an essential component of the operation, they do not require the special conditions needed by quantum computers. The chips can be manufactured with existing production technology, so they could enter the commercial market within a few years.


As we have seen from the above, Extropic's technology is very promising. However, what personally piqued my interest is that it is more biologically plausible. Of course, I don't think that the neurons in artificial neural networks have anything to do with human brain neurons. These are two very different systems. However, it is clear that the human brain does not learn through gradient descent. Biological learning is something entirely different, and randomness certainly plays a significant role in it.


In fact, in biology and nature, everything operates randomly. What we see as deterministic at a high level is actually just what statistically stands out from many random events. This is how, for example, many living beings (including us humans) came to be through completely random evolution yet are built with almost engineering precision. I suspect that the human brain operates in a similar way to evolution: a multitude of random events within a suitably directed system, which we perceive from the outside as consistent thinking. This is why genetic algorithms were so intriguing to me, and now I see the same principle in Extropic's chip.

If you are interested, check the company homepage or this interview with the founder guys.

