>home     >contact     >sitemap     >login
BIOCHEMICS CONSULTING SAS BCX logo
  About us Our Team Expertise Services
Expertise


Data mining

    Fuzzy logic

    Genetic algorithms

    Neural networks

Molecular modeling

Examples of achievements

Selected publications

Adaptive Fuzzy Partition


In the context of supervised database classification, the Fuzzy Partition algorithm [1] allows to generate fuzzy rules from numerical data by developing two steps:

  1. partitioning a working space into fuzzy subspaces;
  2. defining a fuzzy rule for each fuzzy subspace.

Let us assume that the working space is a N-dimension hyperspace defined by N molecular descriptors, each dimension i can be partitioned into L intervals Iij, where j represents an interval in the partition selected. Indicating with P(x1, x2, ... xn) a molecular vector in the hyperspace, a rule for a subspace Sk, derived by combining N intervals Iij, is defined by [2]:

if x1 is associated with µ1k(x1) and x2 is associated with µ2k(x2)... and xN is associated with µNk(xN)   =>   the score of the activity O for P is OkP, (1)

where xi represents the value of the ith descriptor for the molecule P, µik is the membership function related to the descriptor i for the subspace k, and OkP is the biochemical activity value related to the subspace Sk. The "and" of the fuzzy rule is generally represented by the Min operator [3] and the membership functions can be defined by triangular, trapezoidal or gaussian shapes [2,4,5].

These techniques of rule generation are very simple as all the fuzzy rules can be formulated by linguistic labels. But their performances in the database classification depend on the choice of the partition selected. Generally, a coarse partition leads to a generalist system but also to a model where prediction results are too approximate; a fine partition leads to a precise model of classification but also to a non-generalist system. To overcome this drawback, fuzzy classification methods were proposed [6], which simultaneously use several fuzzy partitions of different sizes in a single fuzzy rule-based classification system. Then, the relation (1) becomes:

if x1 is associated with µ1km(x1) and x2 with associated to µ2km(x2)... and xN is associated with µNkm(xN)  =>   the score of the activity O for P is OkmP,    (2)

where m > 2 and represents the amount of fuzzy subspaces according to the partition on the axes.

This approach allows to obtain a good compromise between generalist and specialist systems, improving the classification performances, as showed by the satisfactory results obtained in a previous paper studying olfactory series [7].

But this algorithm does not allow to solve the second big problem related to fuzzy partition that is due to the very high number of fuzzy subspaces generated when a large set of molecular descriptors is considered. Then, a new algorithm, named Adaptive Fuzzy Partition (AFP) and derived from several studies concerning fuzzy and non-fuzzy fields [8-11], was implemented, in which the hyperspace partition was adapted to the training data by dividing dynamically and separately the axes associated with the N molecular descriptors.

In a first phase, the global descriptor hyperspace is considered and cut into two subspaces where the fuzzy rules are derived. These two subspaces are divided step by step into smaller subspaces until certain conditions are satisfied, namely:

  1. the number of molecular vectors, within a subspace, attains a minimum threshold number;
  2. the difference between two generated subspaces is negligible in terms of chemical activities represented;
  3. the number of subspaces exceeds a maximum threshold number.

The aim of the algorithm is to select the descriptor and the cut position which allow to get the maximal difference between the two fuzzy rule scores generated by the new subspaces. The score is determined by the weighted average of the chemical activity values in an active subspace A and in its neighboring subspaces. If the number of trial cuts per descriptor is defined by N_cut, the number of trial partitions equals (N_cut + 1)N. Only the best cut is selected to subdivide the original subspace. In Figure 1, for example, three cuts per axis are tested from the original descriptor space with 2 dimensions. As cut x1 is the best, two subspaces are generated and considered to be further divided. Then, cut y3 is selected, but the procedure evaluates useful partitioning only in subspace S2; finally, three subspaces are built.

Figure 1. Example of adaptive partition on a two-dimensional space. Three cuts for each axis are evaluated and three subspaces are generated.

The fuzzy rules are generated by the relation (1), but the membership functions, defined by trapezoidal shapes, are based on the boundaries of the subspaces. If the width of a subspace Sk on the ith dimension, after each cut, is represented by wi, the p and q parameters defining the shape of the trapezoid (Figure 2) are calculated by

p = λiwi and q = ViwiΙ

where the two parameters λi and Vi vary so that p > 1 and q < 1. If p = 1 and q = 1, the membership function becomes a rectangle. All the rules created during the fuzzy procedure are considered to establish the model between descriptor hyperspace and biochemical activities.

Figure 2. Representation of a trapezoidal membership function µ(x), defined on the descriptor i.

The degree of membership of the subspace Sk can be represented by

 

(3)


M is the number of molecular vectors in a given subspace, N is the total number of descriptors, µik(Xi)Pi is the fuzzy membership function related to the descriptor i for the molecular vector Pj, and APj is the experimental activity of the compound Pj. A classic procedure of centroid defuzzification [12] is implemented to determine the chemical activity of a new test molecule. All the subspaces k are considered and the general formula to compute the degree of membership of the activity O for a generic molecule Pj is

 

(4)

where N_subsp represents the total number of subspaces.


References

  1. Y. Lin, G.J. Cunningham, Building a Fuzzy System from Input-Output Data, J. Intell. Fuzzy Syst. (1994) 2, 243-250.
  2. M. Sugeno, T. Yasakawa, A fuzzy-logic-based approach to qualitative modeling, IEEE T. Fuzzy Syst. (1993) 1, 7-31.
  3. D. Dubois, H. Prade, An introduction to possibilistic and fuzzy logics, in: G. Shafer, J. Pearl (Eds.), Readings in Uncertain Reasoning, Morgan Kaufmann, San Francisco, 1990, pp. 742-761.
  4. M. Ichino, A Nonparametric Multiclass Pattern Classificatier, IEEE T. Syst. Man Cy. (1979) 6, 345-352.
  5. B. Fritzke, Fast learning with incremental radial basis function networks, Neural Process. Lett. (1994) 1, 2-5.
  6. H. Ishibuchi, K. Nozaki, H. Tanaka, Distributed representation of fuzzy rules and its application to pattern classification, Fuzzy Sets Syst. (1992) 52, 21-32.
  7. K. Audouze, F. Ros, M. Pintore, J.R. Chretien, Prediction of odours of aliphatic alcohols and carbonylated compounds using fuzzy partition and self organising maps (SOM), Analusis (2000) 28, 625-632.
  8. Y. Lin, G.A. Cunningham III, S.V. Coggeshall, Using Fuzzy Partitions to Create Fuzzy Systems for Input Output data and Set the Initial Weights in a Fuzzy Neural Networks, IEEE T. Fuzzy Syst. (1997) 5, 614-621.
  9. W. Pedrycz, Fuzzy Sets in Pattern Recognition: Methodology and Methods, Pattern Recogn. (1990) 23, 121-146.
  10. B.D. Ripley, Statistical aspects of neural networks, in: O.E. Barndorff-Nielsen, J.L. Jensen, W.S. Kendall (Eds.), Networks and Chaos: Statistical and Probabilistic Aspects, Chapman and Hall, London, 1993, pp. 40-123.
  11. P.A. Chou, Optimal Partitioning for Classification and Regression Trees, IEEE T. Pattern Anal. Mach. Intel. (1991) 13, 340-354.
  12. M.M. Gupta, J. Qi, Theory of T-norms and fuzzy inference methods, Fuzzy Sets Syst. (1991) 40, 431-450

TOP
Biochemics Consulting SAS
Pépinière d’Entreprises - 111 bd Duhamel de Monceau
45 166 Olivet Cedex - France

® BCX 2004, All Rights Reserved.
- Biochemics Consulting SAS - 16 rue Leonard de Vinci - F 45074 ORLEANS Cedex 2 - Capital: 100.000€ - RCS SIRET 441 128 980 000 1 4 -