

ZHURNAL ANALITICHESKOI KHIMII (JOURNAL OF ANALYTICAL CHEMISTRY), 2013, vol. 68, no. 7, pp. 694-702

ORIGINAL ARTICLES

UDC 543

AN ADAPTIVE WAVELENGTH INTERVAL SELECTION BY MODIFIED PARTICLE SWARM OPTIMIZATION ALGORITHM: SIMULTANEOUS SPECTRAL OR DIFFERENTIAL PULSE VOLTAMMETRIC DETERMINATION OF MULTIPLE COMPONENTS WITH OVERLAPPING PEAKS

© 2013 Wei-min Shi, Wei Kong, Qiu-bo Tao, Jiu-ji Guo, Mei-jun Xia, Qi Shen, Bao-xian Ye

Department of Chemistry, Zhengzhou University, Zhengzhou, 450001 China

Received 18.08.2011; in final form 05.07.2012

The elimination of wavelengths that contain useless or irrelevant information for the calibration model is becoming one of the key steps in multicomponent spectral analysis, even in situations where partial least-squares (PLS) regression is applied. Because most kinds of spectral responses are continuous, adaptive wavelength interval selection by modified particle swarm optimization (PSO) is proposed in the present study. The proposed method was used for the simultaneous spectrophotometric determination of caffeine, aspirin, and phenacetin and for the simultaneous differential pulse voltammetric determination of nitrophenol isomers. The method was finally applied to commercial drugs. For comparison, a conventional full-spectrum PLS analysis was also performed. Experimental results demonstrate that the adaptive wavelength interval selection by modified PSO is capable of improving the future predictive ability of the model.

Keywords: particle swarm optimization, multicomponent analysis, wavelength interval selection, simultaneous determination, partial least-squares regression.

DOI: 10.7868/S0044450213070128

Multicomponent spectral and electrochemical analysis, an important area of analytical chemistry, constructs a calibration model relating the outputs of multivariate techniques to the compositions or properties of analytical samples [1-4]. Modern spectroscopic and electrochemical instruments can record responses at hundreds or even thousands of wavelengths or potentials almost instantaneously. As a result, a chemical sample is characterized by very high-dimensional and collinear wavelength or potential variables, while the number of observations is usually small compared with the number of wavelength or potential descriptors. This critically affects the future predictive ability of the whole model and may lead to overfitting. The elimination of wavelengths or potentials that contain useless or irrelevant information for the calibration model, such as noise and background, is becoming one of the key steps in multicomponent analysis [5-7]. Variable selection improves not only the stability of the model but also the interpretability of the relationship between the model and the sample compositions.

For multicomponent systems, particularly those with similar components, intense overlapping between the spectra and the increased complexity of practical samples create a continuing need for useful approaches to building robust and stable linear calibration models. Several approaches have been reported for multicomponent systems, such as partial least-squares regression (PLS), multivariate curve resolution-alternating least squares, and parallel factor analysis. PLS [8, 9] has been found useful for multivariate calibration when many wavelengths or potentials are involved and has been recommended as an approach that efficiently utilizes the information carried by these descriptors. However, there is increasing evidence that variable selection is also essential for successful PLS analysis of multicomponent data and that the lack of variable selection can spoil the PLS regression [10, 11]. It has been recognized that eliminating uninformative spectral or potential channels, which do not contribute to model formulation, is important even in situations where PLS is applied.

A number of procedures have been developed to improve the performance of PLS models in multivariate calibration. The search algorithms for locating the optimal wavelength subsets comprise classical stepwise regression [12] and branch and bound combinatorial search [13], as well as more sophisticated methodologies such as simulated annealing [14] and genetic algorithms (GAs) [15, 16]. Most GA-based search procedures [16] were designed to select individual spectral or potential points: variables with value 1 in a binary chromosome bit string are selected, and variables with value 0 are deleted. Because most kinds of spectral responses are continuous, selecting a few individual wavelengths instead of wavelength or potential intervals would cause a loss of useful information. Some GA-based wavelength interval selection methods [17] encoded each gene as a fixed-size run of consecutive spectral points. The interval size in this strategy still had to be chosen from experience. Furthermore, as small perturbations in the experimental conditions and in the physical properties of samples may affect the responses in some local spectral intervals, the appropriate sizes of the spectral intervals may differ from each other. Fixed-size spectral intervals may therefore increase the risk of missing the optimal interval. For these reasons, adaptively selecting wavelength intervals of different sizes seems very attractive.

A modified discrete particle swarm optimization (PSO) algorithm was proposed in our previous study [18] to select individual variables in PLS modeling for a quantitative structure-activity study, with satisfactory performance. Similar to GAs, PSO is an optimization technique simulating biological systems. As effective variable selection in multicomponent spectral and electrochemical analysis focuses on the selection of wavelength or potential intervals, adaptive wavelength interval selection by modified PSO is proposed in the present study. The proposed method stochastically optimizes wavelength or potential intervals and is used for the simultaneous spectrophotometric determination of caffeine, aspirin, and phenacetin and for the simultaneous differential pulse voltammetric determination of nitrophenol isomers. The method is finally applied to commercially available pharmaceutical samples. The results are compared with those obtained by full-spectrum PLS modeling and demonstrate that the adaptive wavelength interval selection by modified PSO is capable of improving the future predictive ability of the model.

Theory. Adaptive wavelength interval selection by modified PSO (AWISPSO). Selecting suitable wavelengths and extracting characteristic information can improve the model accuracy. An efficient scheme is to select wavelength intervals of different sizes adaptively. PSO [19, 20] is a stochastic global optimization technique and can be used for wavelength selection. Based on the information-sharing mechanism of PSO, a modified discrete PSO [18] was proposed in our previous work. The PSO algorithm models the exploration of a problem space by a population of individuals, or particles. Every particle has a fitness value, evaluated by the fitness function to be optimized. The population is updated by applying operators that depend on the fitness information, so that the individuals can be expected to move towards better solution areas. In the modified discrete PSO, each particle is encoded by a string of binary bits whose length equals the number of wavelengths. For wavelength interval selection, each particle is decoded, or translated, into a set of wavelength intervals of different sizes. Unlike individual spectral point selection, the key point of the decoding process is that the wavelength intervals between adjacent bits with value 1 are selected alternately. Consider a particle (of size 15) shown as:

010001001000001,

where the corresponding number of wavelengths is 15 and the second, sixth, ninth, and fifteenth bits have the value "1". The wavelength ranges 2-6 and 9-15 are selected, while the range between the sixth and ninth bits is skipped. In the algorithm, the wavelength intervals encoded by each particle differ and are adaptively adjusted during the computation.
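The decoding rule above can be illustrated with a minimal Python sketch. The function name `decode_intervals` is hypothetical, and the handling of an unpaired trailing "1" (it selects nothing) is an assumption, since the text does not cover that case.

```python
def decode_intervals(bits):
    """Translate a binary particle into selected wavelength intervals.

    The positions of the '1' bits are paired in order (1st with 2nd,
    3rd with 4th, ...); each pair marks one inclusive interval.
    Positions are 1-based to match the text.
    """
    ones = [i + 1 for i, b in enumerate(bits) if b == "1"]
    # Assumption: an unpaired trailing '1' does not open an interval.
    return [(ones[j], ones[j + 1]) for j in range(0, len(ones) - 1, 2)]


print(decode_intervals("010001001000001"))  # [(2, 6), (9, 15)]
```

For the particle in the text, the sketch returns the two ranges 2-6 and 9-15; the gap between bits 6 and 9 is not selected because intervals are taken alternately.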

The adaptive wavelength interval selection by modified PSO (AWISPSO) is described as follows.

Step 1. Randomly initialize the strings WI of the modified discrete PSO with an appropriate population size. WI is composed of strings of binary bits whose length equals the number of wavelengths.

Step 2. Translate WI into sets of wavelength intervals. The wavelength intervals between adjacent "1" bits in each string are selected alternately.

Step 3. Calculate the fitness function of each individual from the corresponding model built on the training set. If the best objective function value of the generation fulfills the end condition, stop the training and output the results; otherwise, go to the next step.

Step 4. Update the WI population according to the modified discrete PSO.

Step 5. Go back to Step 2 and calculate the fitness of the renewed population.
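The five steps can be sketched in Python as follows. This is only an illustrative skeleton under several assumptions: the bit update rule in `update` (keep the bit, copy it from the particle's own best, or copy it from the swarm's best, plus a small mutation) is a stand-in, because the text defers the actual rule to ref. [18]; the end condition is a fixed iteration count; and `toy_fitness` with its `target` band is a hypothetical replacement for the fitness of a PLS model. All function and parameter names are invented for the example.

```python
import random


def decode_mask(bits):
    """Step 2: mark wavelengths inside the alternately selected intervals."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    mask = [0] * len(bits)
    for j in range(0, len(ones) - 1, 2):
        for i in range(ones[j], ones[j + 1] + 1):
            mask[i] = 1
    return mask


def update(x, pbest, gbest, rng, a=0.5, p_mut=0.02):
    """Step 4 (assumed rule): keep each bit, or copy it from pbest or gbest."""
    new = []
    for xi, pi, gi in zip(x, pbest, gbest):
        v = rng.random()
        bit = xi if v <= a else (pi if v <= (1 + a) / 2 else gi)
        if rng.random() < p_mut:  # assumed mutation to keep diversity
            bit = 1 - bit
        new.append(bit)
    return new


def awispso(fitness, n_bits, n_particles=20, n_iter=50, seed=0):
    rng = random.Random(seed)
    # Step 1: random binary strings WI.
    swarm = [[rng.randint(0, 1) for _ in range(n_bits)]
             for _ in range(n_particles)]
    # Steps 2-3: decode and evaluate (lower fitness is better here).
    pbest = [p[:] for p in swarm]
    pbest_fit = [fitness(decode_mask(p)) for p in swarm]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):  # assumed end condition: fixed iteration count
        for i in range(n_particles):
            swarm[i] = update(swarm[i], pbest[i], gbest, rng)  # Step 4
            f = fitness(decode_mask(swarm[i]))  # Step 5: back to Steps 2-3
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = swarm[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = swarm[i][:], f
    return decode_mask(gbest), gbest_fit


# Hypothetical stand-in for the PLS-based fitness: distance to a known band.
target = [0] * 5 + [1] * 6 + [0] * 9


def toy_fitness(mask):
    return sum(m != t for m, t in zip(mask, target))


mask, fit = awispso(toy_fitness, n_bits=20)
print(mask, fit)
```

In a real application the toy fitness would be replaced by a function that builds a PLS model on the selected wavelengths of the training set and returns its objective function value.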

Scheme. The chart of the AWISPSO scheme.

The AWISPSO procedure is presented in the Scheme. It should be noted that, even if the same training set is used, the wavelength intervals selected in each PSO run are not necessarily the same because the random seeds differ. To select wavelength intervals reliably, the above PSO calculation is repeated 100 times, and the most frequently occurring wavelength intervals are retained. These wavelength intervals are then used to build the PLS model and to predict the test data set.
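One simple way to combine repeated runs is to count how often each wavelength is selected and keep those above a frequency threshold. The sketch below assumes this per-wavelength counting and a threshold of half the runs; the text only states that the most frequently occurring intervals are retained, so both the criterion and the name `consensus_mask` are illustrative.

```python
from collections import Counter


def consensus_mask(masks, min_fraction=0.5):
    """Keep wavelengths selected in at least min_fraction of the runs."""
    counts = Counter()
    for mask in masks:
        counts.update(i for i, m in enumerate(mask) if m)
    n_runs = len(masks)
    n_bits = len(masks[0])
    return [1 if counts[i] >= min_fraction * n_runs else 0
            for i in range(n_bits)]


# Three hypothetical runs over six wavelengths (the paper uses 100 runs).
runs = [
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 1],
]
print(consensus_mask(runs))  # [0, 1, 1, 1, 0, 0]
```

Wavelengths 2 to 4 (1-based) appear in at least two of the three runs and are kept, while the sixth wavelength, selected only once, is dropped.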

Fitness function. In the modified PSO, the performance of each particle is measured by a predefined fitness function. The modified Cp statistic is used as the objective function for variable selection in the modified PSO. The modified Cp in PLS is expressed as follows:

$$C_p(p) = \frac{\mathrm{RSS}_p}{\hat{\sigma}^2_{\mathrm{PLS}}} - (n - 2k). \qquad (1)$$

Here n is the number of dependent variables, k is the number of latent variab

