Simulated annealing (SA) is a generic probabilistic meta-algorithm for the global optimization problem, namely locating a good approximation to the global optimum of a given function in a large search space. It is often used when the search space is discrete (e.g., all tours that visit a given set of cities). For certain problems, simulated annealing may be more effective than exhaustive enumeration, provided that the goal is merely to find an acceptably good solution in a fixed amount of time, rather than the best possible solution.

The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. The heat causes the atoms to become unstuck from their initial positions (a local minimum of the internal energy) and wander randomly through states of higher energy; the slow cooling gives them more chances of finding configurations with lower internal energy than the initial one.

By analogy with this physical process, each step of the SA algorithm replaces the current solution by a random "nearby" solution, chosen with a probability that depends on the difference between the corresponding function values and on a global parameter T (called the temperature), which is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when T is large, but increasingly "downhill" as T goes to zero. The allowance for "uphill" moves saves the method from becoming stuck at local minima, which are the bane of greedier methods.

The method was independently described by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi in 1983, and by V. Černý in 1985. The method is an adaptation of the Metropolis-Hastings algorithm, a Monte Carlo method to generate sample states of a thermodynamic system, invented by N. Metropolis et al. in 1953.

## Overview

In the simulated annealing (SA) method, each point s of the search space is analogous to a state of some physical system, and the function E(s) to be minimized is analogous to the internal energy of the system in that state. The goal is to bring the system, from an arbitrary initial state, to a state with the minimum possible energy.

### The basic iteration

At each step, the SA heuristic considers some neighbour s' of the current state s, and probabilistically decides between moving the system to state s' or staying put in state s. The probabilities are chosen so that the system ultimately tends to move to states of lower energy. Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.

### The neighbours of a state

The neighbours of each state (the candidate moves) are specified by the user, usually in an application-specific way. For example, in the traveling salesman problem, each state is typically defined as a particular tour (a permutation of the cities to be visited); and one could define the neighbours of a tour as those tours that can be obtained from it by exchanging any pair of consecutive cities.
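The consecutive-swap neighbourhood described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tour is stored as a list of city labels, it is treated as cyclic (so the last and first cities also count as consecutive), and the function name `swap_consecutive` is hypothetical.

```python
import random


def swap_consecutive(tour, rng=random):
    """Return a copy of `tour` with one random pair of consecutive cities exchanged.

    Assumption: the tour is cyclic, so the last and first positions are
    treated as consecutive as well.
    """
    n = len(tour)
    new_tour = list(tour)          # never modify the caller's tour
    if n < 2:
        return new_tour            # nothing to swap
    i = rng.randrange(n)           # position of the first city of the pair
    j = (i + 1) % n                # its successor, wrapping around
    new_tour[i], new_tour[j] = new_tour[j], new_tour[i]
    return new_tour
```

Returning a fresh list keeps the current state intact, so the caller can still decide to stay put if the candidate is rejected.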

### Acceptance probabilities

The probability of making the transition from the current state s to a candidate new state s' is specified by an acceptance probability function P(e,e',T), that depends on the energies e = E(s) and e' = E(s') of the two states, and on a global time-varying parameter T called the temperature.

One essential requirement for the probability function P is that it must be nonzero when e' > e, meaning that the system may move to the new state even when it is worse (has a higher energy) than the current one. It is this feature that prevents the method from becoming stuck in a local minimum, that is, a state that is worse than the global minimum, yet better than any of its neighbours.

On the other hand, when T goes to zero, the probability P(e,e',T) must tend to zero if e' > e, and to a positive value if e' < e. That way, for sufficiently small values of T, the system will increasingly favor moves that go "downhill" (to lower energy values), and avoid those that go "uphill". In particular, when T becomes 0, the procedure will reduce to the greedy algorithm, which makes the move only if it goes downhill.

In the original description of SA, the probability P(e,e',T) was defined as 1 when e' < e; that is, the procedure always moved downhill when it found a way to do so, irrespective of the temperature. Many descriptions and implementations of SA still take this condition as part of the method's definition. However, this condition is not essential for the method to work, and one may argue that it is both counterproductive and contrary to its spirit.

The P function is usually chosen so that the probability of accepting a move decreases when the difference e' - e increases; that is, small uphill moves are more likely than large ones. However, this requirement is not strictly necessary, provided that the above requirements are met.

Given these properties, the evolution of the state s depends crucially on the temperature T. Roughly speaking, the evolution of s is sensitive to coarser energy variations when T is large, and to finer variations when T is small.

### The annealing schedule

Another essential feature of the SA method is that the temperature is gradually reduced as the simulation proceeds. Initially, T is set to a high value (or infinity), and it is decreased at each step according to some annealing schedule, which may be specified by the user but must end with T = 0 towards the end of the allotted time budget. In this way, the system is expected to wander initially towards a broad region of the search space containing good solutions, ignoring small features of the energy function; then drift towards low-energy regions that become narrower and narrower; and finally move downhill according to the steepest descent heuristic.
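As a rough illustration of what an annealing schedule might look like, the sketch below gives two common shapes as functions of the fraction r of the time budget already spent. The names `linear_temp` and `exponential_temp`, the starting temperature `t0`, and the floor `t_end` are assumptions made for the example and do not come from the original description.

```python
def linear_temp(r, t0=10.0):
    """Linear schedule: starts at t0 (r = 0) and reaches exactly 0 at r = 1."""
    return t0 * (1.0 - r)


def exponential_temp(r, t0=10.0, t_end=1e-3):
    """Geometric schedule: starts at t0 and decays smoothly to t_end at r = 1.

    Assumption: t_end is a small positive floor, so this schedule only
    approximates the requirement that T ends at 0.
    """
    return t0 * (t_end / t0) ** r
```

The linear form satisfies the "must end with T = 0" condition literally; the geometric form spends more of the budget at low temperatures, which may or may not suit a given problem.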

Example illustrating the effect of cooling schedule on the performance of simulated annealing. The problem is to rearrange the pixels of an image so as to minimize a certain potential energy function, which causes similar colours to attract at short range and repel at slightly larger distance. The elementary moves swap two adjacent pixels. The images were obtained with fast cooling schedule (left) and slow cooling schedule (right), producing results similar to amorphous and crystalline solids, respectively.

It can be shown that for any given finite problem, the probability that the simulated annealing algorithm terminates with the global optimal solution approaches 1 as the annealing schedule is extended. This theoretical result, however, is not particularly helpful, since the time required to ensure a significant probability of success will usually exceed the time required for a complete search of the solution space.

## Pseudocode

The following pseudocode implements the simulated annealing heuristic, as described above, starting from state s0 and continuing to a maximum of kmax steps or until a state with energy emax or less is found. The call neighbour(s) should generate a randomly chosen neighbour of a given state s; the call random() should return a random value in the range [0,1). The annealing schedule is defined by the call temp(r), which should yield the temperature to use, given the fraction r of the time budget that has been expended so far.

```
s := s0; e := E(s)                           // Initial state, energy.
sb := s; eb := e                             // Initial "best" solution.
k := 0                                       // Energy evaluation count.
while k < kmax and e > emax                  // While time remains & not good enough:
  sn := neighbour(s)                         //   Pick some neighbour.
  en := E(sn)                                //   Compute its energy.
  if en < eb then                            //   Is this a new best?
    sb := sn; eb := en                       //     Yes, save it.
  if P(e, en, temp(k/kmax)) > random() then  //   Should we move to it?
    s := sn; e := en                         //     Yes, change state.
  k := k + 1                                 //   One more evaluation done.
return sb                                    // Return the best solution found.
```

Actually, the "pure" SA algorithm does not keep track of the best solution found so far: it does not use the variables sb and eb, it lacks the first if inside the loop, and, at the end, it returns the current state s instead of sb. While saving the best state is a standard optimization that can be used in any metaheuristic, it breaks the analogy with physical annealing, since a physical system can "store" a single state only.

Saving the best state is not necessarily an improvement, since one may have to specify a smaller kmax in order to compensate for the higher cost per iteration. However, the step sb := sn happens only on a small fraction of the moves. Therefore, the optimization is usually worthwhile, even when state-copying is an expensive operation.
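For readers who prefer something executable, the pseudocode can be transcribed into Python roughly as follows. This is a sketch rather than a reference implementation: the variable names follow the pseudocode, but the keyword defaults, the helper `metropolis` (one possible choice of P), and the decision to return the best energy alongside the best state are assumptions made here.

```python
import math
import random


def metropolis(e, en, T):
    """One possible acceptance function P: always accept downhill moves."""
    if en < e:
        return 1.0
    if T <= 0.0:                       # assumption: T = 0 means purely greedy
        return 0.0
    return math.exp((e - en) / T)


def simulated_annealing(s0, E, neighbour, P, temp, kmax, emax=-math.inf, rng=random):
    """Transcription of the pseudocode; returns (best state, best energy)."""
    s, e = s0, E(s0)                   # current state and energy
    sb, eb = s, e                      # best state seen so far
    k = 0                              # energy evaluation count
    while k < kmax and e > emax:       # time remains and not good enough
        sn = neighbour(s)              # pick some neighbour
        en = E(sn)                     # compute its energy
        if en < eb:                    # new best? save it
            sb, eb = sn, en
        if P(e, en, temp(k / kmax)) > rng.random():
            s, e = sn, en              # move to the neighbour
        k += 1
    return sb, eb


if __name__ == "__main__":
    # Toy problem (hypothetical): minimise x*x over the integers, moving by +/-1.
    random.seed(1)
    best, energy = simulated_annealing(
        s0=50,
        E=lambda x: x * x,
        neighbour=lambda x: x + random.choice((-1, 1)),
        P=metropolis,
        temp=lambda r: 10.0 * (1.0 - r),
        kmax=5000,
    )
    print(best, energy)
```

Passing E, neighbour, P and temp as arguments mirrors the way the pseudocode treats them as user-supplied procedures.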

## Selecting the parameters

In order to apply the SA method to a specific problem, one must specify the following parameters: the state space, the energy (goal) function E(), the candidate generator procedure neighbour(), the acceptance probability function P(), and the annealing schedule temp(). These choices can have a significant impact on the method's effectiveness. Unfortunately, there are no choices of these parameters that will be good for all problems, and there is no general way to find the best choices for a given problem. The following sections give some general guidelines.

### Diameter of the search graph

Simulated annealing may be modeled as a random walk on a search graph, whose vertices are all possible states, and whose edges are the candidate moves. An essential requirement for the neighbour() function is that it must provide a sufficiently short path on this graph from the initial state to any state which may be the global optimum. (In other words, the diameter of the search graph must be small.) In the traveling salesman example above, for instance, the search space for n = 20 cities has n! = 2432902008176640000 (about 2.4 quintillion) states; yet the neighbour generator function that swaps two consecutive cities can get from any state (tour) to any other state in at most n(n - 1)/2 = 190 steps.
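The two figures quoted for n = 20 can be checked with a short script. The sketch below assumes tours are treated as plain (non-cyclic) permutations, and uses a bubble sort to count how many consecutive swaps are needed to turn the fully reversed ordering into the sorted one, which is the worst case; the helper name `adjacent_swaps_to_sort` is hypothetical.

```python
import math


def adjacent_swaps_to_sort(perm):
    """Count the consecutive swaps a bubble sort uses to sort `perm`.

    This equals the number of inversions, i.e. the minimum number of
    consecutive swaps needed to reach the sorted order.
    """
    perm = list(perm)
    swaps = 0
    for end in range(len(perm) - 1, 0, -1):
        for i in range(end):
            if perm[i] > perm[i + 1]:
                perm[i], perm[i + 1] = perm[i + 1], perm[i]
                swaps += 1
    return swaps


n = 20
states = math.factorial(n)                                   # size of the search space
worst_case = adjacent_swaps_to_sort(range(n - 1, -1, -1))    # reversed order
print(states, worst_case, n * (n - 1) // 2)
```

The contrast between the two numbers is the point of the example: an enormous state space can still have a small diameter under a well-chosen neighbour function.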

### Transition probabilities

For each edge (s,s') of the search graph, one defines a transition probability, which is the probability that the SA algorithm will move to state s' when its current state is s. This probability depends on the current temperature as specified by temp(), on the order in which the candidate moves are generated by the neighbour() function, and on the acceptance probability function P(). (Note that the transition probability is not simply P(e,e',T), because the candidates are tested serially.)

### Acceptance probabilities

The specification of neighbour(), P(), and temp() is partially redundant. In practice, it's common to use the same acceptance function P() for many problems, and adjust the other two functions according to the specific problem.

In the formulation of the method by Kirkpatrick et al., the acceptance probability P(e,e',T) was defined as 1 if e' < e, and exp((e - e') / T) otherwise. This formula corresponds to the Metropolis-Hastings algorithm, in the case where the proposal distribution of Metropolis-Hastings is symmetric. However, this acceptance probability is often used for simulated annealing even when the neighbour() function, which is analogous to the proposal distribution in Metropolis-Hastings, is not symmetric, or not probabilistic at all. Individual transitions of the simulated annealing algorithm do not correspond to the short-term evolution of a physical system; rather, the long-term distribution over states of the algorithm at a particular temperature corresponds to the probability distribution over states of a physical system at that temperature.
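The acceptance rule just described is short enough to write out directly. The following Python sketch assumes a function name (`acceptance_probability`) and a convention for T = 0 (never accept a non-improving move) that are choices made for this example, since the formula itself is undefined at T = 0.

```python
import math


def acceptance_probability(e, e_new, T):
    """P(e, e', T): 1 for downhill moves, exp((e - e') / T) otherwise."""
    if e_new < e:
        return 1.0
    if T <= 0.0:
        # Assumption: treat T = 0 as the greedy limit instead of dividing by zero.
        return 0.0
    return math.exp((e - e_new) / T)


if __name__ == "__main__":
    # The same uphill step of size 2 becomes less acceptable as T falls.
    for T in (10.0, 2.0, 0.5):
        print(T, acceptance_probability(3.0, 5.0, T))
```

Because the exponent is (e - e') / T, an uphill step whose size is comparable to T is accepted with probability around exp(-1), while steps much larger than T are almost always rejected.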

### Efficient candidate generation

When choosing the candidate generator neighbour(), one must consider that after a few iterations of the SA algorithm, the current state is expected to have much lower energy than a random state. Therefore, as a general rule, one should skew the generator towards candidate moves where the energy of the destination state s' is likely to be similar to that of the current state. This heuristic (which is the main principle of the Metropolis-Hastings algorithm) tends to exclude "very good" candidate moves as well as "very bad" ones; however, the latter are much more common than the former, so the heuristic is generally quite effective.

In the traveling salesman problem above, for example, swapping two consecutive cities in a low-energy tour is expected to have a modest effect on its energy (length); whereas swapping two arbitrary cities is far more likely to increase its length than to decrease it. Thus, the consecutive-swap neighbour generator is expected to perform better than the arbitrary-swap one, even though the latter could provide a somewhat shorter path to the optimum (with n - 1 swaps, instead of n(n - 1)/2).

A more precise statement of the heuristic is that one should try first candidate states s' for which P(E(s),E(s'),T) is large. For the "standard" acceptance function P above, it means that E(s') - E(s) is on the order of T or less. Thus, in the traveling salesman example above, one could use a neighbour() function that swaps two random cities, where the probability of choosing a city pair vanishes as their distance increases beyond T.

### Barrier avoidance

When choosing the candidate generator neighbour(), one must also try to reduce the number of "deep" local minima, that is, states (or sets of connected states) that have much lower energy than all their neighbouring states. Such "closed catchment basins" of the energy function may trap the SA algorithm with high probability (roughly proportional to the number of states in the basin) and for a very long time (roughly exponential in the energy difference between the surrounding states and the bottom of the basin).

As a rule, it is impossible to design a candidate generator that will satisfy this goal and also prioritize candidates with similar energy. On the other hand, one can often vastly improve the efficiency of SA by relatively simple changes to the generator. In the traveling salesman problem, for instance, it is not hard to exhibit two tours A, B, with nearly equal lengths, such that (0) A is optimal, (1) every sequence of city-pair swaps that converts A to B goes through tours that are much longer than both, and (2) A can be transformed into B by flipping (reversing the order of) a set of consecutive cities. In this example, A and B lie in different "deep basins" if the generator performs only random pair-swaps; but they will be in the same basin if the generator performs random segment-flips.
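A segment-flip generator of the kind mentioned above might look like the sketch below. It assumes the tour is a list of city labels and picks the two end points of the segment uniformly at random; the function name `flip_segment` is hypothetical, and no claim is made that this is the generator used in any particular implementation.

```python
import random


def flip_segment(tour, rng=random):
    """Return a copy of `tour` with one random run of consecutive cities reversed."""
    tour = list(tour)
    n = len(tour)
    if n < 2:
        return tour                               # nothing to reverse
    i, j = sorted(rng.sample(range(n), 2))        # segment end points, i < j
    # Keep the prefix and suffix, reverse the slice tour[i..j] in between.
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
```

A single flip can undo a long detour that would take many pair-swaps, each passing through much longer tours, which is why the two generators carve the energy landscape into different basins.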

### Cooling schedule

The physical analogy that is used to justify SA assumes that the cooling rate is low enough for the probability distribution of the current state to be near thermodynamic equilibrium at all times. Unfortunately, the relaxation time (the time one must wait for the equilibrium to be restored after a change in temperature) strongly depends on the "topography" of the energy function and on the current temperature. In the SA algorithm, the relaxation time also depends on the candidate generator, in a very complicated way. Note that all these parameters are usually provided as black box functions to the SA algorithm.

Therefore, in practice the ideal cooling rate cannot be determined beforehand, and should be empirically adjusted for each problem. The variant of SA known as thermodynamic simulated annealing tries to avoid this problem by dispensing with the cooling schedule, and instead automatically adjusting the temperature at each step based on the energy difference between the two states, according to the laws of thermodynamics.

## Restarts

Sometimes it is better to move back to a solution that was significantly better rather than always moving from the current state. This is called restarting. To do this we set `s` and `e` to `sb` and `eb` and perhaps restart the annealing schedule. The decision to restart could be based on a fixed number of steps, or on the current energy being too high relative to the best energy found so far.
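One way to express the energy-based restart test is sketched below. The helper name `maybe_restart` and the threshold parameter `max_gap` are hypothetical; how large the gap should be, and whether the annealing schedule is also reset, are left to the application.

```python
def maybe_restart(s, e, sb, eb, max_gap):
    """Return the (state, energy) pair the search should continue from.

    Restart from the best state when the current energy exceeds the best
    energy by more than `max_gap` (an assumed, problem-specific threshold).
    """
    if e - eb > max_gap:
        return sb, eb          # jump back to the best solution seen so far
    return s, e                # otherwise carry on from the current state
```

Inside the main loop this would be used as `s, e = maybe_restart(s, e, sb, eb, max_gap)` after each step, or only every fixed number of steps.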

## Related methods

• Quantum annealing uses "quantum fluctuations" instead of thermal fluctuations to get through high but thin barriers in the target function.
• Stochastic tunneling attempts to overcome the increasing difficulty simulated annealing runs have in escaping from local minima as the temperature decreases, by "tunneling" through barriers.
• Tabu search normally moves to neighbouring states of lower energy, but will take uphill moves when it finds itself stuck in a local minimum; and avoids cycles by keeping a "taboo list" of solutions already seen.
• Stochastic gradient descent runs many greedy searches from random initial locations.
• Genetic algorithms maintain a pool of solutions rather than just one. New candidate solutions are generated not only by "mutation" (as in SA), but also by "combination" of two solutions from the pool. Probabilistic criteria, similar to those used in SA, are used to select the candidates for mutation or combination, and for discarding excess solutions from the pool.
• Ant colony optimization (ACO) uses many ants (or agents) to traverse the solution space and find locally productive areas.
• The cross-entropy method (CE) generates candidate solutions via a parameterized probability distribution. The parameters are updated via cross-entropy minimization, so as to generate better samples in the next iteration.
• Harmony search mimics the improvisation process of musicians, in which each musician plays a note in search of the best harmony all together.
• Stochastic optimization is an umbrella set of methods that includes simulated annealing and numerous other approaches.