Figure 2: This picture illustrates a so-called fine-grained (diffusion or cellular) population structure.
Each solution has a spatial location and interacts only within some neighbourhood, termed a deme.
Clusters of solutions tend to form around different optima, which is both inherently useful and helps
to avoid premature convergence. Information slowly diffuses across the grid as evolution progresses
by mating within the overlapping demes.
using the operator before selection and crossover can take place.
2.3 Hybrid Models
Thereissufficientflexibilityinthereproductiveplanlanguagetoallowarbitraryhybridmodels
population models, for example, an array of islands each with fine-grained populations or a
fine-grainedmodel in which each site has an island (which could be viewed as a generalisation
of the stepping stone model). Such models have not, as far as the authors are aware, been
presented in the literature, but may yield interesting newavenues forexploration. An example
plan which uses just such a hybrid model is given in the appendix.
3 Representation
One of the longest-running and sometimes most heated arguments in the field of genetic
algorithms (and to a lesser extent the wider evolutionary computing community) concerns
the representation of solutions. This is a multi-threaded debate taking in issues of problem-
specific and representation-specific operators, hybridisation with other search techniques, the
handling of constraints, the interpretation of the schema theorem, the meaning of genes and
the efficacy of newer variants of the basic algorithms such as genetic programming. The
developers of RPL2 are strongly of the opinion that exotic representations should be the norm
rather than the exception for a number of reasons outlined in this section.
3.1 Real Parameter Optimisation
A particular focus of disagreement about representations concerns the coding of real numbers
in real parameter optimisation problems. The main split is between those who insist on
coding parameters as binary strings and those who prefer simply to treat real parameters
as floating point values. It is first necessary to clarify that the issue here is not one of
the physical representation of a real parameter on the machine—whether it should be, for
example, an IEEE floating point number, an integer array or a packed binary integer, which
is an implementational issue—but rather how genes and alleles are defined and manipulated.
David Goldberg is usually identified—perhaps unfairly—asthe leading advocate of binary
representations. He has developed a theory of virtual alphabets for what he calls “real-
coded” genetic algorithms (Goldberg, 1990). He considers the case in which the parameters
themselves are treated as genes and processed using a traditional crossover operator such as
-point or uniform crossover (manipulating whole-parameter genes). In this circumstance,
he argues that the genetic algorithm “chooses its own” low cardinality representation for each
gene (largely from the values that happen to be present in relatively good solutions in the
initial population) but then suffers “blocking”, whereby the algorithm has difficulty accessing
some parts of the search space through reduced ergodicity. These arguments, while valid
in their own context, ignore the fact that people who use “real codings” in genetic search
invariably use quite different sorts of recombination operators. These include averaging
crossovers (Davis, 1991), random respectful recombination (R ; Radcliffe, 1991a) and “blend
crossover” (BLX-
; Eshelman & Schaffer,1992). These are combined with appropriatecreep
(Davis, 1991) or end-point (Radcliffe, 1991a) forms of mutation. Similarly, the Evolution
Strategies community, which has always used “real” codings, uses recombination operators
that are equivalent to R
and BLX-0 (Baeck et al., 1991).
The works cited above, together with Michalewicz (1992), provide compelling evidence
that this approach outperforms binary codings, whether these are of the traditional or “Gray-
coded” variety (Caruana & Schaffer, 1988). In particular, Davis (1991) provides a potent
example of the improvementthat can be achieved by moving frombinary-coded to real-coded
genetic algorithms. Thisexample has been reproducedin the tutorialguide to RPL2 contained
in Surry & Radcliffe (1994).
3.2 Real-World Optimisation
When tackling real-world optimisation problems, a number of further factors come into play,
many of which again tend to make simple binary and related low-cardinality representations
unattractive or impractical.
In industrial optimisation problems it is typically the case that the evaluation function has
already been written and other search or optimisation techniques have been used to tackle the
problem. This may impose constraints on the representation. While in some cases conversion
between representations is feasible, in others this will be unacceptably time consuming.
Moreover, the representation used by the existing evaluation function will normally have
been carefully chosen to facilitate manipulation. If there is not a benefit to be gained from
changing to a special “genetic” representation, it would seem perverse to do so. The same
considerations apply even more strongly if hybridisation with a pre-existing heuristic or other
search technique is to be attempted. This is important because hybrid approaches, in which
a domain-specific technique is embedded, whole or in part, in a framework of evolutionary
search, can almost always be constructed that outperform both pure genetic search and the
domain-specific technique. This is the approach routinely taken when tackling “real world”
applications, such as those described in section 6.
Further problems arise in constrained optimisation, where some constraints (including
simple bounds on parameter ranges) can manifestthemselves in unnecessarily complex forms
with restricted coding schemes. For example, a parameter that can take on exactly 37
different values is difficult to handle with a binary representation, and will tend either to
lead to a redundant coding (whereby several strings may represent the same solution) or to
having to search (or avoid) “illegal” portions of the representation space. Similar issues can
arise when trying to represent permutations with, for example, binary matrices (e.g. Fox &
McMahon, 1991), rather than in the more natural manner. It should be noted that even many
of those traditionallyassociated withthe “binary is best” schoolaccept that for some classes of
problems low cardinality representations are not viable. For example, it was Goldberg who,
with Lingle, developed the first generalisation of schema analysis in the form of o-schemata
for the travelling sales-rep problem (TSP; Goldberg & Lingle, 1985).
3.3 Multiple Representations
Some evolutionary algorithms have been developed that employ morethan one representation
at a time. A notableexampleofthis is the work of Hillis(1991),who evolved sortingnetworks
using a parasite model. Hillis’s evaluation function evolved by changing the test set as the
sortingnetworks improved. In a similar vein, Husbands & Mill (1991)haveused co-evolution
models in which different populations optimise different parts of a process plan which are
then brought together for arbitration. This necessitates the use of multiple representations.
There are also cases in which controlalgorithmsare employed to varythe (often largenum-
ber of) parameters of an evolutionary algorithm as it progresses. For example, Davis (1989)
adapts operator probabilities on the basis of their observed performance using a credit ap-
portionment scheme. RPL2 caters for the simultaneous use of multiple representations in a
single reproductive plan, which greatly simplifies the implementation of such schemes.
3.4 Schemata, Formae and Implicit Parallelism
In addition to the practical motivations for supporting complex representations, certain theor-
etical insights support this approach. These are obtained by consideringthe Schema Theorem
(Holland, 1975) and the r
ˆ
ole of “implicit parallelism”. Holland introduced the notion of a
schema (pl. schemata) as a collection of genomes that share certain gene values (alleles). For
example, the schema is the set of chromosomes with a at the first locus and a at the
third locus.
The Schema Theorem may be stated in a fairly general form (though assuming fitness-
proportionate selection for convenience) thus:
(1)
where
is the number of members of the population at time that are members of a given
schema ;
is the observed fitness of the schema at time , i.e. the average fitness of all the
members of the population at time that are instances (members) of the schema ;
is the average fitness of the whole population at time ;
is the set of genetic operators in use;
the term quantifies the potential disruptive effect on schema membership of the
application of operator ;
denotes an expectation value.
This theorem is fairly easily proved. It has been extended by Bridges & Goldberg (1987), for
the case of binary schemata, to replace the inequality with an equality by including terms for
string gains as well as the disruption terms.
Holland used the concept of “implicit parallelism” (n´ee intrinsic parallelism) to argue for
the superiority of low cardinality representations, a theme picked up and amplified by Gold-
berg (1989, 1990), and more recently championed by Reeves (1993). Implicit parallelism
refers to the fact that the Schema Theorem applies to all schemata represented in the popu-
lation, leading to the suggestion that genetic algorithms process schemata rather (or as well
as) individual solutions. The advocates of binary representations then argue that the degree
of intrinsic parallelism can be maximised by maximising the number of schemata that each
solution belongs to. This is clearly achieved bymaximising the string length, which in turn re-
quires minimising the cardinality of the genes used. This simple counting argument has been
shown to be seriously misleading by a number of researchers, including Antonisse (1989),
Radcliffe (1990, 1994a) and Vose (1991), and as Mason (1993) has noted, ‘[t]here is now no
justification for the continuance of [the] bias towards binary encodings’.
It is both a strength and a weakness of the Schema Theorem that it applies equally,
given a representation space
(of “chromosomes” or “genotypes”) for a search space
(of “phenotypes”), no matter which mapping is chosen to relate genotypes to phenotypes.
Assuming that
and have the same size, there are such mappings (representations)
available—clearlyvastlymorethanthesizeofthesearchspaceitself—yettheschematheorem
applies equally to each of them. The only link between the representation and the theorem is
the term . The theorem states that the expected number of instances of any schema at the
nexttime-stepisdirectlyproportionaltoitsobservedfitness(inthecurrentpopulation)relative
to everything else in the population (subject to the effects of disruption; Radcliffe, 1994a).
Thus, the ability of the schema theorem, which governs the behaviour of a simple genetic
algorithm, to lead the search to interesting areas of the space is limited by the quality of
the information it collects about the space through observed schema fitness averages in the
population.
It can be seen that if schemata tend to collect together solutions with related performance,
then the fitness-variance of schemata will be relatively low, and the information that the
schema theorem utilises will have predictive power for previously untested instances of
schemata that the algorithm may generate. Conversely, if schemata do not tend to collect
together solutions with related performance, while the predictions the theorem makes about
schema membership of the next population will continue to be accurate, the performance of
the solutions that it generates cannot be assumed to bear any relation to the fitnesses of the
parents. This clearly shows that it is essential that domain-specific knowledge be used in
constructing a genetic algorithm, through the choice of representation and operators, whether
this be implicit or—as is advocated in the present paper—explicit. If no domain-specific
knowledge is used in selecting an appropriate representation, the algorithm will have no
opportunity to exceed the performance of a random search.
Inadditiontothese observations about the Schema Theorem’s representation independence
and the sensitivity of its predictions to the fitness variance of schemata, Vose (1991) and
Radcliffe (1990) have independently proved that the “schema” theorem actually applies to
any subset of the search space, not only schemata, provided that the disruption coefficients
are computed appropriately for whichever set is actually considered. Vose’s response
to this was to term a generalised schema a predicate and to investigate transformations
of operators and representations that change problems that are hard for genetic algorithms
into problems that are easy for them (Vose & Liepins, 1991). This was achieved through
exploiting a limited duality between operators and representations, which is discussed briefly
in Radcliffe (1994a). Radcliffe instead termed the generalised schemata formae (sing. forma)
and set out todevelop a formalism to allowoperators and representations to be developed with
regard to stated assumptions about performance correlations in the search space. The aim
was to maximise the predictive power of the Schema Theorem (and thus its ability to guide
the search effectively) by allowing the developer of a genetic algorithm for some particular
problem to codify knowledge about the search space by specifying families of formae that
might reasonably be assumed to group together solutions with related performance.
3.5 Forma Analysis
Given a collection of formae (generalised schemata, or arbitrary subsets of the search space)
thought relevant to performance, forma analysis suggests two key properties for a recombina-
tion operator, both motivated by the way conventional genetic crossover operators manipulate
genes. Respect requires that if both parents are members of some forma then so should be
all their children produced by recombination alone. For example, if eye colour has been
chosen as an important characteristic, and both parents have blue eyes, then respect restricts
recombination to produce only children with blue eyes. A stronger form of this condition,
called gene transmission, requires that children inherit each of their genes from one or other
parent, so that if one parent had green eyes and the other had blue eyes a child produced by
recombination would be bound to have either green or blue eyes. It is not, however, always
possible to identify suitable genes, so this condition is not always imposed. For a detailed
exposition of “genetic search” without “genes” the reader is referred to the discussion of
allelic representations in Radcliffe & Surry (1994).
The other desirable property for recombination operators is assortment, which requires
that recombination should be capable of bringing together any mixture of compatible genetic
material present in the parents. Thus, for example, if one parent has blue eyes, and the other
has curly hair, then if these are compatible characteristics it should be possible foran assorting
recombination operator to combine these characteristics.
Although these two principles seem rather innocuous, there are many problems for which
the natural formae cannot simultaneously be respected and assorted. Such sets of formae are
said to be non-separable. A varied suite of domain-independent recombination, mutation
and hill-climbing operators has been developed using the principles of respect and assortment
together with related ideas. These include random respectful recombination and random
transmitting recombination (R
and RTR respectively; Radcliffe, 1991b), random assorting
recombination (RAR; Radcliffe, 1991b), generalised -point crossover (GNX; Radcliffe &
Surry, 1994), binomial minimal mutation (BMM; Radcliffe, 1994b) and minimal-mutation-
based hill-climbing (Radcliffe & Surry, 1994). Of these, R is the simplest. It operates by
taking all the genes common to the two parents and inserting them in the child while making
random (legal) choices for remaining genes. In some situations this is surprisingly effective,
while in others a more sophisticated approach is required. The set of all solutions sharing all
the genes of two parents and is called their similarity set, denoted ,soR can
be seen to pick an arbitrary member of the parents’ similarity set.
Locality Formae for Real Parameter Optimisation
In considering continuous real parameter optimisation problems it seems reasonable to sup-
pose that solutions that are close to one another might have similar performance. Locality
formae (Radcliffe, 1991a) group chromosomes on the basis of their proximity to each other,
and can be used to express this supposition. Suppose that a single parameter function is
defined over a real interval . Then formae are defined that divide the interval up into
Figure 3: Given and , with , the formae are compatible only if .
The arrow shows the similarity set
.
Figure 4: The left-hand graph shows (schematically) the probability of selecting each point along the
axis under R
. The right-hand graph shows the corresponding diagram for standard crossover with
real genes.
Figure 5: The -dimensional R operator for real genes picks any point in the hypercuboid with
corners at the chromosomes being recombined,
and .
strips of arbitrary width. Thus, a forma might be a half-open interval with and
both lying in the range . These formae are separable. Respect requires that all children
are instances of any formae which contain both parents and . Clearly the similarity set of
and (the smallest interval which contains them both) is , where it has been assumed,
without loss of generality, that . Thus respect requires that all their children lie in .
Similarly, if is in some interval and lies in some other interval ,
then for these formae to be compatible the intersection of the intervals that define them must
be non-empty ( ; figure 3) and so picking a random element from the similarity set
allows any element that lies in the intersection to be picked, showing that R fulfils the
requirements of assortment (figure 4). The -dimensional R operator picks a random point
in the -dimensional hypercuboid with corners at the two chromosomes and (figure 5).
Both this operator and its natural analogue for -ary string representations, which for
each locus picks a random value in the range defined by the alleles from the two parents,
suffer from a bias away from the ends of the interval. It is therefore necessary to introduce a
mutation operator that offsets this bias in order to ensure that the whole search space remains
accessible. An appropriate mutation operator acts with very low probability to introduce the
extremal values at an arbitrary locus along the chromosome. In the one dimensional case
this amounts to occasionally replacing the value of one of the chromosomes with an or a
. The combination of R and such end-point mutation provides a surprisingly powerful set
of genetic operators for some problems, outperforming more common (binary) approaches
(Radcliffe, 1991a). The blend crossover operator BLX-0.5, which is a generalisation of R
developed by Eshelman & Schaffer (1992), performs even better.
Locality formae are not, of course, the only alternatives to schemata which can be applied
to real-valued problems, and there is no suggestion here that locality formae should be seen
as a generic or definitive alternative to schemata. It would be interesting, for example, to
attempt to construct formae and representations on the basis of fourier analysis, or some other
complete orthonormal set of functions over the space being searched.
Edge Formae for the Travelling Sales-rep Problem
The travelling sales-rep problem(TSP)isperhaps the most studiedcombinatorialoptimisation
problem, and has certainly attracted much effort with evolutionary algorithms. Given a set of
cities, the problem is to find the shortest route for a notional sales-rep to follow, visiting each
city exactly once. This problem has a number of important industrial applications, including
finding drilling paths for printed circuit boards to maximise production speed. It seems clear,
as Whitley et al. (1989b) have argued, that the edges rather than the vertices of the graph are
central to the TSP. While there might be some argument as to whether or not the edges should
be taken to be directed, the symmetry of the euclidean metric used in the evaluation function
suggests that undirected edges suffice.
If the towns (vertices) in an
-city TSP are numbered 1 to , and the edges are described
as non-ordered pairs of vertices
, then apparently suitable edge formae are simply sets
of edges, subject to the condition that every vertex appears in the description of exactly two
edges. These formae are not separable. To see this, consider two tours
and , with
containing the fragment 2–1–3 and containing 4–1–3. Plainly these have the common edge
(which is, of course, the same as ). Edge formae are described by the list of edges
they require to be present in angle brackets, so that
is an instance of the forma and
is an instance of the forma . These formae are clearly compatible, because any tour
containing the fragment 2–1–4 is in their intersection
Any recombination operator that respects these formae is bound to include the common edge
in all offspring from these parents. This precludes generating a child in .
Since assortment requires that this child be capable of being generated this shows that these
formae are not separable.
R can be defined for edge formae even though they are not separable: it works simply
by copying common edges into the child and then putting in random edges in such a way as
to complete a legal tour. The lack of separability simply ensures that R does not assort the
formae. Extensive experiments with R -related operators for the TSP are related in Radcliffe
& Surry (1994).
Curiously, the intersection operation for these edge formae looks like the set union operation. This is
because is really an abbreviation for the set of chromosomes containing the 1–3 edge.
Formae for Set Problems
A large number of optimisation problems are naturally formulated as subset extraction prob-
lems, i.e. given some “universal” set, find the best subset of it according to some criterion.
Examples include stock-marketportfoliooptimisation(Shapcott,1992), choosing
sites from
possible sites for retail dealers (George, 1994) and optimising the connectivity of a three
layer neural network (Radcliffe, 1993). If the size of the subset is not fixed, the natural way
to tackle this problem is by using a binary string the length of the universal set, using a 1
to indicate that the given element is in the subset. If, however, the size is fixed this is more
problematical, because this constrains the number of ones in the string. A more natural ap-
proach is to store the elements in the subset and apply appropriate genetic operators directly.
In this case the elements themselves can form alleles, and approaches as simple as choosing
the desired number of elements from those available between the parents can be effective.
This method happens to equate to use of RAR (Radcliffe, 1992b). Forma analysis for set
problems is covered extensively in Radcliffe (1992a).
General Representations
It has been argued in the preceding sections that there are theoretical, practical and empirical
motivations for moving away from the very simple binary string representations that have
dominated genetic algorithms for so long. Combined with the successes shown by genetic
programming, evolution strategies and evolutionary programming these form a compelling
case for supporting the use of arbitrary data structures as genetic representations. The way in
which RPL2 achieves this is discussed in section 5.
4 Parallelism
Evolutionary algorithms that use populations are inherently parallel in the sense that—
depending on the exact reproductive plan used—each chromosome update is to some extent
independent of the others. There are a number of options for implementation on parallel
computers, several of which have been proposed in the literature and implemented. As has
been emphasised, population structure has tended to be tied closely to the architecture of a
particular target machine to date, but there is no reason, in general, why this need be so.
Parallelism is supported in RPL2 at a variety of levels. Data decomposition of structured
populations can be achieved transparently, with different regions of the population evolving
on different processors, possibly partially synchronised by inter-process communication.
Distribution of fine-grained models tends to require more interprocess communication and
synchronisation so their efficiency is more sensitive to the computation-to-communications
ratio for the target platform.
Task farming of compute intensive tasks, such as genome evaluation (e.g. Verhoeven et
al., 1992; Starkweather et al., 1990), is also provided via the loop construct, which
indicates a set of operations to be performed on all members of a population stack in no fixed
order. This is particularly relevant to real-world optimisation tasks for which it is almost
invariably the case that the bulk of the time is spent on fitness evaluation. (For example see
section 6.) User operators may themselves include parallel code or run on parallel hardware
independently of the framework, giving yet more scope for parallelism.
RPL2willrunthesamereproductiveplanonserial, distributed or parallel hardware without
modification using the minimum degree of synchronisation consistent with the reproductive
plan specified.
5 System Architecture
RPL2 defines a C-like data-parallel language for describing reproductive plans. It is de-
signed to simplify drastically the task of implementing and experimenting with evolutionary
algorithms. Both parallel and serial implementations of the run-time system exist and will
execute the same plans without modification.
The language provides a small number of built-in constructs, along with facilities for
calling user-defined operators. Several libraries of such operators are provided with the
basic framework, as are definitions of several commonly used genetic representations. Two
example plans are presented at the end of this paper.
5.1 Language Features
RPL2 is a simple procedural language, similar to C, with six basic data types—
, ,
, , and . The genome and gstack types are explained further
below.
Simple control flow using
, , and are provided, as are normal algebraic
mathematical and logical expressions. User defined operators (C-callable functions) are
made visible as procedures in the language with the declaration, which is analogous to
C’s .
The data-parallel aspect of the language is supported using the concept of a population
structure, which is declared as a multi-dimensional hypercuboid. Arrays corresponding to
any combination of axes of the population structure may then be declared and manipulated
in a SIMD-like way (i.e. an operation on such an array affects every element within it).
Several special operators to project and reduce the dimensionality of such arrays are also
provided. Built-in constructs to support parallelism include two types of parallel loops—the
data-parallel , and a construct to indicate that data-independent farming
out of work is possible.
5.2 The RPL2 Framework
The RPL2 framework provides an implementation of the reproductive plan language based
on a interpreter and a run-time system, supported by various other modules. The diagram in
figure 6 shows how these different elements interact.
The interpreter acts in two main modes: interactive commands are processed immediately,
while non-interactive commands are compiled to an internal form as a reproductive plan
is being defined. Facilities also exist for batch processing, I/O redirection, and some on-
line help. The interpreted nature of the system is especially useful for fast turn-around
experimentation. The trade-off in speed over a compiled version is insignificant for real
applications in which almost all of the execution time is spent in the evaluation function. The
system uses the Marsaglia pseudo-random number generator (Marsaglia et al., 1990), which
as well as producing numbers with good statistical distributions allows identical results to be
produced on different processor architectures provided that they use the same floating point
representation.
Two versions of the run-time system exist, a serial (single-processor) implementation, and
a parallel (multiple distributed processors) implementation. In the serial case, both the parser
and the run-time system run on a single processor, and no communication is required. In
the parallel case, the parser runs on a single processor, but the work of actually executing
a reproductive plan is shared across other processors. Two methods for this work-sharing
are provided, one in which the data space of a structured population is decomposed across a
regular grid of processors and one in which extra processors are used simply as evaluation
servers for compute-intensive sections of code, typically evaluation of genome fitness. A
hybrid model in which the data space is distributed across some processors and others are
used as a pool of evaluation servers is planned as a future extension.
Parallelism by data-decomposition is made possible by the SIMD-like nature of the lan-
guage. In such a case, a structured population is typically declared and operations take
PNP2P0
Init
Parser
RTS
Serial Version
P0
Init
Parser
RTS
RD/TF
Comms
Init
RTS
RD/TF
Comms
Idle
Data, TF/RD
PLAN
endplan
endplan
endplan
P1
Parallel Version
. . .
"run" "run"
Figure 6: Simplified Execution Flow. The simplest mode of execution is the serial framework with
a single process, P0, in which actions are processed by the parser, and “compiled” code is executed by
the run time system. In parallel operation, the parser runs on a single process, and information about
the reproductive plan is shared by communication. A decision about how to execute the plan is made,
resulting either in the data space being split across the processes or in compute-intensive parts of the
code being task farmed.
place uniformly over various projections of the multi-dimensional space. Each processor can
then simply execute the common instructions over its local data, sharing information with
neighbouring processes when necessary.
Task farming is supported by the the
construct which allows the user to specify
a set of operations which apply to all genomes in population. This is illustrated in the first
example plan presented in the appendix.
Itwas stressed in sections 1and3earlierthatamajordesign aim forRPL2 was thatit should
impose no constraints on the data structures used to represent solutions. This is achieved by
providing a completely generic data structure which contains only information that
any type of genome would have. From the user’s point of view, this consists only of the
raw fitness value. A generic pointer is then included that references a user-definable data
structure, allowing a genome to be completely general. Collections of s are called
s, and admit the notionof scaled fitness, relative to other genomes in the group. These
data structures are illustrated in figure 7.
5.3 Extending the Framework
Ithasbecomeclearthatreal-worldapplicationsdemandgoodqualityproblem-specificgenome
representations, as discussed in section 3. The RPL2 system leaves the user completely free in
Không có nhận xét nào:
Đăng nhận xét