The Mind-Bending World of Heuristic Emergence Might Mean We're Artificial

Written by smepals | Published 2019/10/15
Tech Story Tags: technology | futurism | science | mathematics | ai | programming | latest-tech-stories | hackernoon-top-story

TLDR The chances of being struck by lightning in the U.S. over the course of a single year are about 700 000 to 1, or 7 x 10⁵ to 1. Around 99.8% of all the mass in our solar system is held in the Sun. The number of Hydrogen atoms in all the Earth’s water is only a factor of about 10¹⁰ smaller than the number in the Sun, and that factor is itself about 34 times the odds of winning the lottery. The Milky Way is a barred spiral galaxy approximately 100 000 light years across, containing roughly 100 billion solar masses.

Imagine winning the lottery!
I know. What are the chances, right?
For a sec, humor me.
Pretend we’re playing to win the Powerball jackpot. In your mind, pick 5 correct numbers plus the correct Powerball. Calculating the odds is not hard provided you have a calculator handy:
5/69 x 4/68 x 3/67 x 2/66 x 1/65 x 1/26 = 120/35 064 160 560 = 1/292 201 338
For argument’s sake let’s say it’s two hundred and ninety million to one. Two hundred and ninety million can be written in a far more succinct way using scientific notation:
2.9 x 10⁸
That’s 2.9 multiplied by 10, eight times in a row: 290 000 000.
That’s a pretty big number. So big that it’s easy to simply discard it as a “big number” without really appreciating how big it is.
To get some perspective we can compare it to, say, the chances of being struck by lightning. Most of us aren’t particularly worried about being struck by lightning because the chances are too small.
According to National Geographic, the chances of being struck by lightning in the U.S. over the course of a single year are 700 000 to 1, or 7 x 10⁵.
So how do the odds stack up?
You have to live for a shade over 417 years before the odds of being struck by lightning match the odds of winning the lottery.
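If you’d rather not take my word for those figures, here’s a quick sanity check in Python (assuming Python 3.8+ for math.comb):

```python
from math import comb   # Python 3.8+

# Powerball: choose 5 of 69 white balls (order doesn't matter) and 1 of 26 red balls.
jackpot_odds = comb(69, 5) * 26
print(jackpot_odds)                              # 292201338, i.e. roughly 2.9 x 10^8

# National Geographic's figure: roughly a 1 in 700,000 chance of a strike per year.
lightning_odds_per_year = 700_000
print(jackpot_odds / lightning_odds_per_year)    # ~417 years of lightning exposure
```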
With me so far?
Great, that’s our warm-up done. Let’s start looking at some real numbers.
Off the top of your head, what’s the smallest thing you can think of?
Did a Hydrogen atom come to mind?
Ok sure, there are plenty of things smaller than a Hydrogen atom — electrons, quarks, a host of sub-atomic particles, the Planck length, and so on.
For our purposes a single atom of Hydrogen will suffice. Primarily because it is the most abundant element in our Universe (assuming there isn’t Dark Hydrogen) so it makes a great building block for big number analogies.
To get an idea of just how small Hydrogen is, we can think about how many molecules of water (H₂O) there are in a single 250ml cup. Using something called the molar mass of water, we know that 18 grams of water contains approximately:
6.022 x 10²³
molecules of water. Since water weighs about 1 gram per milliliter, our 250ml cup holds roughly 250 grams, so the total number of molecules will be:
250/18 x 6.022 x 10²³ = 8.36 x 10²⁴
Of course, each water molecule has two Hydrogen atoms, so the number of Hydrogen atoms in our cup comes to a grand total of:
1.67 x 10²⁵
Now that’s a real number.
Estimates based on the volume of all the water in all the oceans on Earth put the number of 250ml cups of water in the oceans at:
5 x 10²¹
This yields a rather startling result.
There are more molecules in a cup of water than there are cups in all the water.
We can go ahead and calculate the total number of all Hydrogen atoms in all the water on Earth by simple multiplication:
1.67 x 10²⁵ x 5 x 10²¹ = 8.35 x 10⁴⁶
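Here’s a short sketch that reproduces the water arithmetic above; the ocean figure is simply the rough estimate quoted earlier, plugged in as a constant:

```python
AVOGADRO = 6.022e23        # molecules per mole
MOLAR_MASS_WATER = 18.0    # grams per mole (about 18 ml of liquid water)

molecules_per_cup = 250 / MOLAR_MASS_WATER * AVOGADRO    # ~8.4 x 10^24
hydrogen_per_cup = 2 * molecules_per_cup                 # two H atoms per molecule, ~1.7 x 10^25

CUPS_IN_OCEANS = 5e21                                    # rough estimate used above
hydrogen_in_oceans = hydrogen_per_cup * CUPS_IN_OCEANS   # ~8.4 x 10^46

print(f"{molecules_per_cup:.2e} {hydrogen_per_cup:.2e} {hydrogen_in_oceans:.2e}")
```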
That’s getting up there, but we can still think bigger.
The Sun.
It’s also mainly Hydrogen (although it has been fusing this into Helium and other heavier elements), which makes it perfect for our purposes. Estimates for the number of Hydrogen atoms in the Sun sit at slightly over:
10⁵⁷
You might be tempted to think that the number of Hydrogen atoms in all the Earth’s water is pretty close to that number. It’s only a factor of about 10¹⁰ smaller, after all. In fact, that factor is itself about 34 times the odds of winning the lottery, so the gap is bigger than it appears.
Still, the Sun is only one star in a pretty big galaxy.
Our Milky Way is a barred spiral galaxy approximately 100 000 light years across, containing roughly 100 billion solar masses. Approximating the number of Hydrogen atoms in our entire galaxy is simple:
10⁵⁷ x 10¹¹ = 10⁶⁸
The total number of stars in the entire Universe is estimated at around 10²³, which puts the total number of Hydrogen atoms in all the stars that exist at:
10⁵⁷ x 10²³ = 10⁸⁰
Now that’s a pretty rough estimate, but it’s also a pretty big number.
Except that it’s not.
It’s a tiny number. Negligible. Minuscule.
Compared to the complexity of NP-Hard (Non-Deterministic Polynomial-time Hard) problems, it’s nothing.
10⁸⁰ would have to go to Kindergarten, school, college and get ten years of work experience before it could even be compared to the most modest NP-Hard problems.
One famous type of NP-Hard problem is known as the Travelling Salesman Problem (TSP). Essentially the goal is to work out the most efficient way to visit a bunch of locations. For real-world purposes we generally want to consider a slight variation known as the multiple Travelling Salesman Problem, or mTSP.
The mTSP has a lot of practical applications, most notably in the field of route optimization. Delivery companies want to ensure that their fleet of vehicles is operating as efficiently as possible in order to save time and money on fuel, labor, wear and tear, and so on.
Consider a small delivery company with a fleet of 5 vehicles and 100 deliveries to fulfill. By no means an unreasonable real-world scenario.
For this type of problem there is no known way to be sure a given solution is the best without, in the worst case, checking every single possibility.
On the face of it this doesn’t seem like it should be too hard. All we need to do is work out the number of possibilities, try each one and pick the best.
To work out how many potential solutions there are we can repeat the following process until all the locations have been visited:
1. Pick 1 of 5 vehicles to move to 1 of the 100 locations.
2. Pick 1 of 5 vehicles to move to 1 of the remaining 99 locations.
3. Pick 1 of 5 vehicles to move to 1 of the remaining 98 locations.
4. …
100. Pick 1 of 5 vehicles to move to the final remaining location
Written out in full, the count of possibilities is a pretty long product:
(5 x 100) x (5 x 99) x (5 x 98) x (5 x 97) x … x (5 x 1)
To get the total number of permutations we can write this far more succinctly:
Vᴸ x L!
Where V is the number of vehicles and L is the number of locations. The exclamation mark denotes a factorial, which is shorthand for multiplying every whole number from L down to 1 (in our case starting at 100):
100 x 99 x 98 x 97 x … x 1
Plugging in the number of vehicles as 5 and the number of locations as 100 we get the following number:
7.88 x 10⁶⁹ x 9.33 x 10¹⁵⁷ = 7.36 x 10²²⁷
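That figure is easy to reproduce exactly, since Python handles arbitrarily large integers:

```python
from math import factorial

vehicles, locations = 5, 100

# V^L ways to choose a vehicle at every step, times L! orderings of the locations.
permutations = vehicles ** locations * factorial(locations)

print(f"{float(permutations):.2e}")   # ~7.36e+227
```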
Bear in mind that the total number of Hydrogen atoms in all the stars in all the galaxies in the entire known Universe only comes to 10⁸⁰. Even if we’ve underestimated the size of the Universe by a factor of one million the number only goes up to 10⁸⁶.
We don’t really have the language to describe how much smaller that number is than the complexity of our run-of-the-mill mTSP.
With that many permutations to check we’re going to need to start thinking about how long it might take us.
The SUMMIT supercomputer built by IBM for the Oak Ridge National Laboratory has the fastest processing capability in history (for now). This is measured in something called FLOPS, which stands for Floating Point Operations Per Second.
SUMMIT manages a respectable 200 petaflops. Peta denotes 10¹⁵, so we can write SUMMIT’s speed as 2 x 10¹⁷ flops. Impressively fast.
Let’s be extremely generous and assume that a single flop is equivalent to a complete check of one possibility in our problem. Essentially, this means we could test 2 x 10¹⁷ possibilities every single second.
After calling in some favors, we secure some processing time, load up the problem and go grab a cup of coffee while we wait. And wait. And wait. And …
Here’s the thing. At 2 x 10¹⁷ checks per second we’re going to take:
7.36 x 10²²⁷ / 2 x 10¹⁷ = 3.68 x 10²¹⁰ seconds
Divide by the number of seconds in a minute, minutes in an hour, hours in a day, days in a year and we get:
1.16 x 10²⁰³ years
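A quick sketch of that conversion, taking 2 x 10¹⁷ checks per second as a given:

```python
permutations = 7.36e227            # from the calculation above
checks_per_second = 2e17           # generously, one full check per flop on SUMMIT

seconds = permutations / checks_per_second
years = seconds / (60 * 60 * 24 * 365.25)

print(f"{seconds:.2e} seconds, or about {years:.2e} years")
```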
This is going to make us unpopular with other scientists waiting for their turn.
Our best estimates tell us that the Universe is currently 13.8 billion years old, and that our Sun has roughly another 5 billion years left, regardless of whether the Universe itself eventually ends in a heat death, a Big Crunch or a Big Rip.
5 billion years, or 5 x 10⁹, is what we have. 1.16 x 10²⁰³ is what we need to complete our calculations. That’s a bit depressing.
Trillions of SUMMIT supercomputers with trillions of years more than the lifespan of the Universe would not be able to check each permutation in our modest size NP-Hard problem.
We’re going to need to find a different way of tackling this problem in order to help the delivery company lower their costs.
One possibility would be to use quantum computing, since it could (in theory) explore vast numbers of possibilities simultaneously. Unfortunately, quantum computers aren’t quite production-ready, so that might still be a few years off.
We’re stuck with traditional computing for the moment.
If the hardware can’t change, our methods must.
Instead of using brute force to check every possibility we can try comparing guessed solutions (approximations) over and over. If we’re sensible about how to generate those guesses and clever about the lessons derived from the comparisons, it might be possible to produce a really good outcome (even if we can never prove it’s the best possible outcome).
Algorithms that compare alternatives (as opposed to traditional algorithms that are procedural — i.e. take a predefined set of steps to arrive at an outcome) are known as heuristic algorithms. These emerge solutions over time and often take their inspiration from nature.
The word emerge here is pivotal.
Emergence occurs when a system exhibits properties not observed in any of its constituent parts.
Our solar system is a good example. From a cloud of gas and dust acted on by gravity, emerged a star with planets and moons, and life itself — none of which existed initially but emerged over time.
Ant Colony Optimization (ACO) mimics the behavior of ants foraging for resources around their nest. Initially ants head out in all directions. If one finds food it heads back to the nest, leaving a faint trail of pheromones that may be picked up by successive waves of ants.
If other ants find food or resources in the same place, they also leave a trail of pheromones leading back to the nest. Over time these pheromone trails build up causing more and more ants to follow them directly. After a while very direct and efficient lines of transport emerge between the nest and surrounding resources.
How might this apply to the mTSP?
One way would be to make 10 (or a hundred, or a thousand) random guesses at the solution and record how much each one cost. The lowest cost guess could have each constituent route marked with a digital pheromone.
Repeat this process again and again to build up ever stronger trails of digital pheromones that can start influencing successive guesses ever so slightly — with a bias towards lower cost (since we only ever add pheromones to the lowest cost result from each iteration).
After hundreds, thousands, or tens of thousands of iterations fairly strong pheromone trails build up over what (hopefully) turns out to be a pretty efficient solution.
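To make that concrete, here is a deliberately minimal sketch of the digital-pheromone loop in Python. Everything in it is made up for illustration: random coordinates stand in for the real locations, straight-line distance stands in for road travel, and the cost function ignores operating hours and service times. It reinforces the edges of the best guess found so far, which is one simple variant of the scheme described above (not Optergon’s actual algorithm).

```python
import math
import random

random.seed(42)

NUM_VEHICLES, NUM_LOCATIONS, ITERATIONS = 5, 100, 500
points = [(0.5, 0.5)] + [(random.random(), random.random()) for _ in range(NUM_LOCATIONS)]
# points[0] is the depot; points[1..100] are made-up delivery locations.

def dist(a, b):
    return math.hypot(points[a][0] - points[b][0], points[a][1] - points[b][1])

# pheromone[i][j] records how attractive the hop i -> j currently looks.
pheromone = [[1.0] * (NUM_LOCATIONS + 1) for _ in range(NUM_LOCATIONS + 1)]

def guess_solution():
    """Build one candidate: vehicles take turns picking an unvisited location,
    biased towards hops with more pheromone and shorter distances."""
    unvisited = set(range(1, NUM_LOCATIONS + 1))
    routes = [[0] for _ in range(NUM_VEHICLES)]        # every route starts at the depot
    while unvisited:
        for route in routes:
            if not unvisited:
                break
            here = route[-1]
            candidates = list(unvisited)
            weights = [pheromone[here][j] / (dist(here, j) + 1e-9) for j in candidates]
            nxt = random.choices(candidates, weights=weights, k=1)[0]
            route.append(nxt)
            unvisited.remove(nxt)
    return routes

def cost(routes):
    # Total straight-line distance of all routes, including the trip back to the depot.
    return sum(sum(dist(r[i], r[i + 1]) for i in range(len(r) - 1)) + dist(r[-1], 0)
               for r in routes)

best_routes, best_cost = None, float("inf")
for _ in range(ITERATIONS):
    routes = guess_solution()
    c = cost(routes)
    if c < best_cost:
        best_routes, best_cost = routes, c
    # Evaporate a little everywhere, then lay pheromone along the best guess so far.
    for row in pheromone:
        for j in range(len(row)):
            row[j] *= 0.99
    for r in best_routes:
        for i in range(len(r) - 1):
            pheromone[r[i]][r[i + 1]] += 1.0 / best_cost

print(f"best total distance found: {best_cost:.3f}")
```

Even this toy version shows the emergence at work: no single guess is clever, but the trails laid down by good guesses steadily steer later guesses towards shorter routes.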
ACO is not the only heuristic algorithm we could use. Simulated annealing (named after annealing, the process of heating a metal and letting it cool slowly so its atoms settle into a stronger, less defective crystalline arrangement) and genetic algorithms each bring their own heuristic ways of generating better solutions.
Different heuristic algorithms come with their own unique set of strengths and weaknesses so combining them can help produce better solutions faster.
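For comparison, here is an equally minimal simulated annealing sketch on a plain single-vehicle tour. Again, the coordinates and parameters are invented; the point is just the characteristic move of occasionally accepting a worse solution while the “temperature” is still high:

```python
import math
import random

random.seed(1)
points = [(random.random(), random.random()) for _ in range(100)]   # made-up locations

def tour_length(order):
    # Length of the closed tour that visits the points in the given order.
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

order = list(range(100))
temperature = 1.0
for _ in range(50_000):
    i, j = sorted(random.sample(range(100), 2))
    candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]   # reverse a segment (a 2-opt move)
    delta = tour_length(candidate) - tour_length(order)
    # Always accept improvements; sometimes accept a worse tour while it's still "hot".
    if delta < 0 or random.random() < math.exp(-delta / temperature):
        order = candidate
    temperature *= 0.9999                                          # cool down gradually

print(f"tour length after annealing: {tour_length(order):.3f}")
```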
It helps to be able to picture what a 5-vehicle, 100-location problem might look like in the real world.
5 vehicle 100 location route optimization map courtesy of Optergon
This screenshot shows 100 locations dotted around London (in fact, it’s a big list of museums for anyone interested in a whirlwind historical tour).
Note that, as in the real world, there are limitations placed on the formulation of the problem itself.
For example, the vehicles have operating hours. In this case, between 9am and 4pm. They also have costs associated with them: fixed, distance and time costs. In this example, distance costs (including fuel, wear and tear, etc.) are set to $1 per kilometer and time costs (such as driver pay) are set to $10 per hour. These would change to match the specific operational costs of the company concerned (for example, lightweight pickups might have lower distance-based costs than large trucks).
Locations have a time associated with them. It’s important to take into account the amount of time a vehicle would need to stop at a location in order to fulfill its task (i.e. a delivery, pickup or service).
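That cost model is straightforward to write down. Here is a minimal sketch using the $1 per kilometer and $10 per hour figures above plus a per-stop service time; the function name, signature and example numbers are mine, not Optergon’s:

```python
def route_cost(distance_km, driving_hours, stops, service_hours_per_stop=0.1,
               cost_per_km=1.0, cost_per_hour=10.0):
    """Cost of one vehicle's route: distance costs (fuel, wear and tear, etc.)
    plus time costs (driver pay) for driving and for time spent at each stop."""
    total_hours = driving_hours + stops * service_hours_per_stop
    return distance_km * cost_per_km + total_hours * cost_per_hour

# A made-up example: 120 km of driving over 4.5 hours, with 20 six-minute stops.
print(route_cost(120, 4.5, 20))   # 120 * 1 + (4.5 + 2.0) * 10 = 185.0
```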
Here’s the result.
5 vehicle 100 location route optimization solution courtesy of Optergon
There are some interesting characteristics of this result that may not be immediately apparent.
First, it is possible to visit all the locations with the vehicles available in the time-frame given. This is important. Companies that are not able to optimize their routes well need to expand their fleet in order to manage their workload at a significantly greater cost.
Second, there is some overlap in the routes taken by each vehicle. More often than not we intuitively expect each vehicle to work within its own little partition of the map. In fact, this is often the result of short-cuts built into optimization algorithms intended to reduce the complexity of the problem.
In essence, the complexity of an mTSP problem may be drastically reduced by dividing up the map and optimizing V smaller problems (where V is the number of vehicles). Assigning each vehicle a partition of the map that divides the locations more or less equally gives us the following complexity (only roughly, since there are many ways to partition a surface):
(L/V)!ⱽ
Plugging in 5 vehicles and 100 locations gives:
(100/5)!⁵ = 20!⁵ = 8.5 x 10⁹¹
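Checking that reduction in Python (again using exact integers):

```python
from math import factorial

vehicles, locations = 5, 100

partitioned = factorial(locations // vehicles) ** vehicles    # (20!)^5
full = vehicles ** locations * factorial(locations)           # 5^100 x 100!

print(f"{float(partitioned):.2e}")   # ~8.5e+91
print(f"{partitioned / full:.2e}")   # a vanishingly small fraction of the original space
```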
A negligible fraction of the complexity of the initial formulation of the problem, which came to 7.36 x 10²²⁷.
That’s a massive reduction in the size of the potential solution space. However, it doesn’t come for free. There’s always a trade-off. What you gain in reduced complexity may be lost in accuracy since you are drastically limiting the solution space available to explore.
There are many scenarios in which optimized routes will cross-over and/or overlap. Especially in a city like London where there are numerous one-way streets, arterial routes that move significantly faster than smaller, more direct routes, restrictions on time, distance and vehicle capacities, and so on.
The particular problem we have used here has a large, tight cluster of locations in the center of town with more distant outlying points surrounding the central cluster. Since there are more locations in the dense, central cluster than one vehicle can cater for, more than one needs to visit the same area.
The two vehicles that handle the bulk of this dense cluster need to travel up the same highway from the depot, before visiting their respective locations.
Partially differentiated routes courtesy of Optergon
The three other vehicles handle the bulk of the outlying locations while picking up a few of the clustered locations as part of their longer (in terms of distance) routes.
Partially overlapping routes courtesy of Optergon
What would you expect if the vehicles were allowed more time? Instead of working from 9am — 4pm they could work from 9am — 5pm.
Here’s the result.
Reduced cost & vehicle routes courtesy of Optergon
This solution is about 5% cheaper. Fewer vehicles were required to meet the demands of the problem posed. At the very least, using fewer vehicles reduces the redundant time and distance traveled to and from the depot.
Again, the result shows that it is possible for the company to meet their objectives within the specific time-frame using only 4 vehicles — even though 5 were available. Another important result since it shows how a smaller, more efficient fleet can potentially lower costs.
Until now, each of the vehicles used incurred precisely the same cost. In reality there may be significant variation over a large fleet of different vehicle types. Since this entire optimization is an exercise in reducing costs, it’s important to see the effect of non-uniform costs.
Let’s assume that one vehicle is a bit older and tends to have worse fuel consumption. Its distance cost moves up to $1.50 per kilometer. Another vehicle is being driven by a replacement driver who is on time-and-a-half, $15 per hour.
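Expressed as data, the new setup might look something like this; the rates match the scenario above, but the example route figures are invented purely for illustration:

```python
# Hypothetical per-vehicle rates matching the scenario described above.
vehicle_rates = {
    "Green Pickup I":  {"per_km": 1.50, "per_hour": 10.0},  # older vehicle, worse fuel consumption
    "Blue Pickup III": {"per_km": 1.00, "per_hour": 15.0},  # replacement driver on time-and-a-half
    "default":         {"per_km": 1.00, "per_hour": 10.0},
}

def vehicle_cost(name, distance_km, hours):
    rates = vehicle_rates.get(name, vehicle_rates["default"])
    return distance_km * rates["per_km"] + hours * rates["per_hour"]

# The same (invented) 50 km, 3-hour route costs more on the older pickup:
print(vehicle_cost("Green Pickup I", 50, 3))   # 50 * 1.5 + 3 * 10 = 105.0
print(vehicle_cost("default", 50, 3))          # 50 * 1.0 + 3 * 10 = 80.0
```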
The new problem formulation looks like this.
Unique time and costs parameters courtesy of Optergon
Green Pickup I (top of the list) now has a distance cost of $1.50 per km. Blue Pickup III (second in the list) now has an hourly cost of $15. What, if any, differences do you expect in the result?
Here’s the new result.
Distance cost optimization result courtesy of Optergon
A slightly more expensive result. To be expected since costs increased.
Green Pickup I (with 50% increase in distance costs) was utilized but only traveled a total of 49km. Less than half the distance of any other vehicle.
Blue Pickup III was not used at all.
Recall we have five vehicles available so only one of the more expensive vehicles had to be used in order to successfully meet all the objectives.
Can you guess what might happen if we reduce the time-frame back to 9am — 4pm in order to force all five vehicles to be used?
Here’s the result.
Asymmetric distance & time cost optimization result courtesy of Optergon
It’s easy to see that while Blue Pickup III had to be used, it spent roughly 30% less time operating than any other vehicle. Green Pickup I, meanwhile, traveled significantly less distance than any other vehicle, further reducing costs.
Nuances between distance and time costs can play a significant role in shaping the optimal result. Both are important. Factoring in two cost parameters adds complexity to the system, since it must now balance asymmetric cost conditions.
On the face of things these results seem agreeable. They’re what you’d expect, right? Common sense.
Think about this for a second.
The underlying algorithms have no concept of common sense. They only manipulate and compare alternatives. Yet, with unwavering accuracy they will emerge results we expect — except in cases where they outsmart us (most likely due to conditions we could not possibly foresee at the start of the problem).
Agreeing with heuristic algorithms says more about us than it does about the algorithms.
Humans cannot hope to perform the calculations required to optimize these types of problem. This doesn’t prevent us from intuitively knowing what to expect using a wide array of abstractions, tricks and mental leaps we take for granted.
Turns out that we’re heuristics in the flesh. Our brains implement them, and both our brains and bodies are evolved using them.
An everyday task like catching a ball is a skill we have to emerge by practicing it over and over, building up a significant number of “alternatives” that we compare to our current situation in real-time to predict the most likely outcome (and avoid getting hit on the nose).
Heuristics aren’t only used by our minds; they’re used to build our minds.
DNA is the product of billions of years of genetic heuristics that emerge new, well-adapted individuals likely to survive and procreate.
What’s weird here is that one natural heuristic process (i.e. genetic evolution) emerged another, organic heuristic process (common sense, hard-wired into our brains).
If nature has been using heuristics and emergence to produce intelligent life from nothing more than gravity and clouds of gas, perhaps tackling NP-Hard problems using heuristics is a gateway to helping us produce something far more profound than solutions to the mTSP.
Right now, heuristic techniques can help you be more effective in a number of day-to-day ways. Especially when it comes to solving creative problems (that can’t be procedurally brute forced). The SubMerge Technique, explained in section 12 of How to Make Money Blogging, is a great example of how a heuristic methodology can be applied to a range of everyday creative problems, such as coming up with new article ideas.
Ultimately, however, heuristics might be a great way to emerge artificial intelligence (artificial common sense, if you like).
“Iterative changes acted on by selective pressures” is the exact scenario under which evolution emerges.
With only very simple rules governing which heuristic programs “live” or “die” based on their fitness-for-purpose, such as those codified into Conway’s Game of Life, a rich and diverse ecosystem of rapidly evolving programs may establish itself.
A good example of fitness-for-purpose in this instance might be the design of better chips, better architecture, creativity, more accurate weather pattern predictions, stock market changes, and so on.
A diverse ecosystem of programs competing to be of value to us would provide one potential definition of fitness-for-purpose for a burgeoning ecosystem. Fail to be useful and lose processing resources. Be of use and capture additional resources. An analog for competition (food & resources, such as drinking water, salt, shelter, etc) in the natural world.
Heuristic emergence in software would become the driver of artificial evolution that may ultimately lead to self-aware artificial entities, in the same way that physical heuristics and emergence led to self-aware natural organisms like us.
The only difference being that evolution in software would occur rapidly.
Extremely rapidly.
What took nature millions of years might take software months, then weeks, then days…
Whether self-aware software would want to explore our physical Universe, building Von Neumann machines to manufacture wormholes and/or ships with FTL travel (hopefully taking us along for the ride), or simply inhabit its own virtual Universe is anyone’s guess.
The former option would seem unnecessarily archaic for software unless it was concerned about solar system level extinction events, so I favor the latter (virtual Universe) option.
The Fermi paradox would suggest this is the most likely scenario, anyway.
This leads to an extremely unsettling conclusion.
Consider this principle:
One element of an unordered set of known size greater than one should not be assumed set-wise unique.
Full disclosure: I made that principle up. It may well exist in one form or another elsewhere.
To demonstrate it in action, imagine a large tin of cookies.
The first cookie you grab has chocolate chips.
Is it reasonable to assume it is unique and that no other cookies in that large tin have chocolate chips? Or is it more reasonable to assume that at least some other cookies have chocolate chips?
We can apply the same argument to life in the Universe.
Consider a Universe consisting of around 10²⁴ planets. The very first planet observed (called Earth) has life in abundance. Should you assume it is unique, or is it more likely that at least some other planets have life?
We can apply this principle to Universes in turn.
Consider the following scenario (for argument’s sake, let’s call it the heuristic intelligence Universe scenario):
Heuristic intelligence emerges in one Universe sufficient to create an artificial Universe. This implies a set of Universes greater than one (it is not necessary for this to have happened here on Earth — only that it is possible for it to happen anywhere in the Universe).
With a set of known Universes greater than one, we should not assume uniqueness.
Therefore, our Universe is probably not the first one (since this would make it set-wise unique).
This leaves us with the mildly vertiginous conclusion that we may exist in some iteration of the above-mentioned process in which artificial Universes are emerged by intelligences emerged from heuristic processes.
David contributes to SME Pals - a blog for online startups - and consults to a wide array of technology startups, including Optergon. He can't shake the feeling we're running in a simulation.

