Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR

Document Type: Research article

Authors

1 Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

2 Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

3 Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

4 Laboratory of Systems Biology and Bioinformatics, University of Tehran

Abstract

Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. The problem has previously been addressed with meta-heuristic algorithms such as the genetic algorithm (GA), particle swarm optimization (PSO), ant colony optimization (ACO), and simulated annealing (SA). In this work, we propose two novel hybrid meta-heuristic algorithms for QSAR feature selection, Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), both based on the genetic algorithm and learning automata (LA). SGALA applies the strengths of GA and LA sequentially, whereas MGALA applies them simultaneously. We applied the proposed algorithms to select the minimum possible number of features from three different datasets and observed that MGALA and SGALA gave the best results, both on the individual datasets and on average, compared with the other feature selection algorithms. Comparing the algorithms, we also found that MGALA and SGALA converge to the optimal result faster than GA, ACO, PSO, and LA. Finally, the features selected by GA, ACO, PSO, LA, SGALA, and MGALA were used as inputs to LS-SVR models; the models built on features selected by SGALA and MGALA showed greater predictive ability than those built on features from the other algorithms. These results corroborate that the proposed algorithms are superior to the other algorithms considered in both predictive performance and rate of convergence.
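The sequential idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the GA operators, the learning automata scheme, or how the two stages are coupled, so every choice below is an assumption. The sketch uses truncation selection with one-point crossover for the GA stage, one two-action automaton per feature with a linear reward-inaction update for the LA stage, and a toy fitness function that stands in for the error of a QSAR model built on the selected descriptors. The names `fitness`, `ga_phase`, `la_phase`, and `sgala_sketch` are hypothetical.

```python
import random

def fitness(mask, relevant, penalty=0.1):
    """Toy score (assumption): reward selected relevant features, penalise subset size."""
    hits = sum(1 for i in relevant if mask[i])
    return hits - penalty * sum(mask)

def ga_phase(n_features, relevant, rng, pop_size=20, generations=30, p_mut=0.05):
    """Plain GA over binary feature masks; returns the best mask found."""
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, relevant), reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)    # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability p_mut
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, relevant))

def la_phase(seed_mask, relevant, rng, iterations=200, rate=0.1):
    """One two-action automaton per feature, seeded from the GA result."""
    probs = [0.9 if bit else 0.1 for bit in seed_mask]   # P(select feature i)
    best, best_fit = list(seed_mask), fitness(seed_mask, relevant)
    for _ in range(iterations):
        mask = [int(rng.random() < p) for p in probs]
        f = fitness(mask, relevant)
        if f >= best_fit:
            # favourable response: reinforce the chosen actions (reward-inaction)
            best, best_fit = mask, f
            probs = [p + rate * (1 - p) if bit else p * (1 - rate)
                     for p, bit in zip(probs, mask)]
        # unfavourable response: probabilities are left unchanged
    return best

def sgala_sketch(n_features, relevant, seed=0):
    """Sequential hybrid (assumed coupling): GA first, then LA refines its best mask."""
    rng = random.Random(seed)
    ga_best = ga_phase(n_features, relevant, rng)
    return la_phase(ga_best, relevant, rng)

if __name__ == "__main__":
    selected = sgala_sketch(n_features=30, relevant={2, 5, 11, 17})
    print([i for i, bit in enumerate(selected) if bit])
```

In this sketch the LA stage can never return a mask that scores worse than the GA result it starts from, which is one simple way to read "sequential" use of the two methods. A mixed variant would interleave the two updates within each iteration rather than running them one after the other; the abstract gives no further detail, so that variant is not sketched here.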

Keywords

Main Subjects