VYSOKOMOLEKULYARNYE SOEDINENIYA, Series B, 2011, Vol. 53, No. 9, pp. 1665-1671
THEORY
UDC 541.64:542.952
A SUPPORT VECTOR MACHINE MODEL FOR THE PREDICTION OF MONOMER REACTIVITY RATIOS¹
© 2011 Xinliang Yu^a,b and Xueye Wang^b
a College of Chemistry and Chemical Engineering, Hunan Institute of Engineering, Xiangtan, Hunan 411104, China
b Key Laboratory of Environmentally Friendly Chemistry and Applications of Ministry of Education, College of Chemistry, Xiangtan University, Xiangtan, Hunan 411105, China
e-mail: yxliang5602@sina.com.cn
Received January 28, 2011
Revised Manuscript Received February 27, 2011
Abstract: A support vector machine (SVM) model was developed to predict monomer reactivity ratios in the radical copolymerization of monomers M1 (C1H2=C2XY) with M2 (styrene). Sixteen quantum chemical descriptors were calculated by density functional theory at the B3LYP level of theory with the 6-31G(d) basis set, and the genetic algorithm method, together with multiple linear regression analysis, was used to select the best combination of variables. The optimal SVM model, with four descriptors (qAC1, QAC2, μ and ELUMO), was obtained with the Gaussian radial basis function kernel (C = 8000, ε = 0.001 and γ = 0.01). The root-mean-square errors for the training, validation and test sets are 0.125, 0.123 and 0.188, respectively, indicating better accuracy than the existing artificial neural network model. Therefore, it is reasonable to predict monomer reactivity ratios with the support vector machine method.
INTRODUCTION
The monomer reactivity ratios not only describe the relative reactivities of monomers but also provide valuable and precise information for determining microstructural parameters such as the distribution of units and the sequence lengths along the macromolecular chains [1]. The copolymer composition equation, which relates the composition of the initially formed copolymer to that of the initial monomer mixture, is given by [2]
$$R_p = R_m \frac{r_{12} R_m + 1}{r_{21} + R_m}, \qquad (1)$$
where Rm is the ratio [M1]/[M2] in the monomer mixture, Rp is the ratio [M1]/[M2] in the polymer formed, and r12 and r21 are the monomer reactivity ratios. It would be extremely useful to obtain the values of r12 and r21, and hence the composition of any copolymer produced from any pair of monomers at any concentration ratio [2].
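As a numerical illustration of Eq. (1), the following minimal Python sketch evaluates Rp for a given feed ratio. The function names and the sample reactivity ratios (r12 = 0.5, r21 = 2.0) are hypothetical and chosen only for illustration; they are not values from this study.

```python
def copolymer_ratio(rm, r12, r21):
    """Return Rp = [M1]/[M2] in the initially formed copolymer, following Eq. (1)."""
    if rm < 0 or r12 < 0 or r21 < 0:
        raise ValueError("feed ratio and reactivity ratios must be non-negative")
    if r21 + rm == 0:
        raise ValueError("r21 + Rm must be non-zero")
    return rm * (r12 * rm + 1.0) / (r21 + rm)


def mole_fraction_m1(rp):
    """Convert the ratio Rp = [M1]/[M2] into the mole fraction of M1 units."""
    return rp / (1.0 + rp)


# Hypothetical example: equimolar feed with illustrative reactivity ratios.
rp = copolymer_ratio(rm=1.0, r12=0.5, r21=2.0)
print(rp, mole_fraction_m1(rp))
```

When both reactivity ratios equal 1, the expression reduces to Rp = Rm, so the copolymer has the same composition as the feed.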
Generally, the reactivity ratios are obtained experimentally. They can also be determined using semi-empirical methods such as the Q-e scheme [3, 4] and the revised patterns scheme [5, 6]. However, the semi-empirical methods cannot be applied when the parameter values (Q, e, u and v) are not known. Yu et al. [7] developed artificial neural network models to predict monomer reactivity ratios (log r12) in the radical copolymerization of monomers M1 (styrene, methyl methacrylate and acrylonitrile) with M2 (vinyl monomers), but these models have not been tested on prediction sets.
¹ The article is published in the form submitted by the authors.
In recent years, the support vector machine (SVM) has become one of the most promising learning algorithms for classification and regression owing to its many attractive features and successful applications. The goal of this paper is to produce a robust SVM model that can predict the monomer reactivity ratios log r1s in the radical copolymerization of monomers M1 (C1H2=C2XY) with M2 (styrene).
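For orientation, the two ingredients that distinguish SVM regression with a Gaussian kernel can be written in a few lines of Python. This is only a sketch under stated assumptions: the kernel is taken in the common form exp(-γ‖x - z‖²), the values γ = 0.01 and ε = 0.001 are our reading of the kernel settings quoted in the abstract, and the function names are hypothetical. The two descriptor vectors are rows 1 and 3 of the table (vinyl acetate and vinyl chloride).

```python
import math

GAMMA = 0.01     # assumed kernel parameter (reading of the abstract)
EPSILON = 0.001  # assumed half-width of the insensitive tube


def rbf_kernel(x, z, gamma=GAMMA):
    """Gaussian radial basis function kernel, exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def eps_insensitive_loss(y_true, y_pred, epsilon=EPSILON):
    """Loss used in SVM regression: errors smaller than epsilon cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Descriptor vectors (qAC1, QAC2, mu, ELUMO) copied from the table.
vinyl_acetate = [-0.209225, 0.537494, 1.6807, -0.01035]
vinyl_chloride = [-0.178405, 0.388537, 1.6128, -0.00130]

similarity = rbf_kernel(vinyl_acetate, vinyl_chloride)
print(similarity)
```

The kernel equals 1 for identical descriptor vectors and decays toward 0 as the vectors move apart, while the loss ignores any residual smaller than ε.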
MATERIAL AND METHODS
The table lists 60 monomer reactivity ratios for the radical copolymerization of vinyl monomers M1 (C1H2=C2XY) with M2 (styrene) [8]. The monomers M1 show a high degree of structural variety. For example, the functional groups present in the side chains include acids, aldehydes, amides, nitriles, ketones, halides, esters, sulfides, aromatic rings, non-aromatic rings, and so on. The logarithms of the monomer reactivity ratios r1s are used because the data are spread more evenly when log r1s is used instead of r1s. Moreover, the logarithmic form of the monomer reactivity ratios provides a more convenient linear solution for the Q-e scheme [3, 4] and the revised patterns scheme [5, 6]. The data set of monomer reactivity ratios (see table) was randomly split into training, validation and prediction sets of 30, 15 and 15 monomers, respectively.
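A random 30/15/15 partition of this kind can be sketched in Python as follows. The function name and the seed are hypothetical, so the sketch does not reproduce the particular split shown in the table; it only illustrates the procedure. The last lines show the logarithmic transform for a ratio of 0.02, which corresponds to the tabulated log r1s of -1.6990 for vinyl acetate.

```python
import math
import random


def split_monomers(n_total=60, sizes=(30, 15, 15), seed=0):
    """Randomly partition monomer numbers 1..n_total into three disjoint sets."""
    if sum(sizes) != n_total:
        raise ValueError("set sizes must add up to the number of monomers")
    indices = list(range(1, n_total + 1))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumption)
    n_train, n_val, _ = sizes
    training = sorted(indices[:n_train])
    validation = sorted(indices[n_train:n_train + n_val])
    prediction = sorted(indices[n_train + n_val:])
    return training, validation, prediction


training_ids, validation_ids, prediction_ids = split_monomers()
print(len(training_ids), len(validation_ids), len(prediction_ids))

# Logarithmic transform of a reactivity ratio.
log_r1s = math.log10(0.02)
print(round(log_r1s, 4))
```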
To fit the reactivity ratios log r1s, 16 descriptors were calculated using density functional theory (DFT) in the Gaussian 03 program [9] at the B3LYP level of theory
Table. Descriptors used and monomer reactivity ratios for 60 monomers
No. Monomer qAC1 QAC2 μ ELUMO log r1s (Exp.) log r1s (Calc.)
Training set
1 Vinyl acetate -0.209225 0.537494 1.6807 -0.01035 -1.6990 -1.3023
2 Vinyl bromide -0.184515 0.322882 1.4905 -0.00276 -1.2680 -1.0184
3 Vinyl chloride -0.178405 0.388537 1.6128 -0.00130 -1.2600 -1.2696
4 Vinyl chloromethyl ketone -0.009070 -0.158362 4.4722 -0.07280 -0.2950 -0.3047
5 Vinyl dichloroacetate -0.182978 0.533141 1.2219 -0.05583 -0.5530 -0.5428
6 Vinyl methyl ketone -0.012812 -0.158676 3.1137 -0.05691 -0.4950 -0.4558
7 Vinyl phenyl sulfide -0.167266 0.179144 1.3838 -0.01269 -0.8540 -0.5730
8 Vinyl stearate -0.224553 0.578319 1.5330 -0.00786 -1.3010 -1.3114
9 Vinyl tert-butyl sulfide -0.212891 0.224253 1.8887 0.02011 -0.8013 -0.8117
10 Vinyl 2-chloroethyl ether -0.291438 0.618845 1.0868 0.01693 -1.1550 -1.1447
11 Styrene, p-chloromethyl- -0.139193 0.094446 2.4578 -0.04866 0.0492 0.0388
12 Styrene, p-methyl- -0.163257 0.144259 0.6170 -0.02707 -0.0031 -0.0135
13 Styrene, p-1-(2-hydroxypropyl)- -0.168992 0.151401 1.9882 -0.02459 -0.0410 -0.1935
14 Acrylate, α-chloro-, methyl -0.143005 0.224688 3.5147 -0.05076 -0.5229 -0.5126
15 Acrylate, α-cyano-, methyl -0.034339 -0.016851 5.5808 -0.08472 -0.2147 -0.2051
16 Acrylate, ethyl -0.001391 -0.164429 1.6646 -0.04240 -0.7696 -0.7592
17 Acrylate, methyl -0.001352 -0.154921 1.4969 -0.04405 -0.7447 -0.7536
18 Acrylate, octadecyl -0.015417 -0.139835 2.3856 -0.04263 -0.5850 -0.5953
19 Methacrylate, 2,2,6,6-tetramethyl-4-piperidinyl -0.059297 -0.104476 1.2008 -0.03683 -0.5229 -0.4406
20 Methacrylate, 2-bromoethyl -0.075200 -0.067634 2.8452 -0.04647 -0.3872 -0.2525
21 Methacrylate, benzyl -0.079629 -0.065409 2.0021 -0.03776 -0.3279 -0.2990
22 Methacrylate, butyl -0.080664 -0.060589 1.8956 -0.03604 -0.2757 -0.3300
23 Methacrylate, glycidyl -0.061043 -0.100338 2.1421 -0.04089 -0.3010 -0.3627
24 Methacrylate, isobutyl -0.078891 -0.058047 1.9092 -0.03681 -0.3768 -0.3355
25 Methacrylate, methyl -0.075020 -0.057835 1.6753 -0.03818 -0.3372 -0.3468
26 Naphthalene, 1-vinyl- -0.114970 0.084520 0.1210 -0.04457 0.3054 0.3153
27 Pyridine, 2-methyl-5-vinyl- -0.145218 0.119726 1.7369 -0.03716 -0.0706 -0.0603
28 Pyridine, 2-vinyl- -0.086862 -0.028320 1.8107 -0.04106 0.1004 -0.2346
29 Pyridine, 4-vinyl- -0.099799 0.037907 2.4607 -0.05125 -0.1612 -0.1716
30 Acrolein 0.019444 -0.177162 3.1609 -0.06506 -0.5686 -0.5785
Validation set
31 Methacrylate, 3,5-dimethyladamantyl -0.067817 -0.106720 1.5753 -0.03652 -0.2007 -0.3665
32 Itaconic anhydride -0.003363 -0.152358 4.8068 -0.07674 -0.2596 -0.2562
33 Hexatriene, tetrachloro- -0.055149 0.004375 2.4091 -0.07042 -0.0706 -0.2397
34 Styrene, p-acetoxy- -0.160990 0.142581 1.6986 -0.03490 0.1004 0.0144
35 Tetrazole, 1-vinyl- -0.155896 0.374473 5.4083 -0.05513 -0.7352 -0.8857
36 Acrylamide, N-methylol -0.041787 -0.091072 3.5004 -0.04772 -0.1549 -0.3555
37 Pyridine, 2-vinyl-5-ethyl- -0.127255 0.069087 2.2222 -0.03792 0.0374 -0.1629
38 Vinyl ethyl sulfide -0.178523 0.177255 1.6443 0.00808 -0.7400 -0.8920
39 Vinyl hendecanoate -0.223756 0.575421 1.5823 -0.00790 -1.3010 -1.3153
40 p-Vinylbenzylmethylcarbinol -0.158462 0.134567 1.4136 -0.03498 -0.0270 0.0706
41 Styrene -0.146739 0.116960 0.1907 -0.03054 0.0000 0.0609
42 Acrylate, α-phenyl-, methyl -0.155336 0.028886 4.0697 -0.04292 0.1072 0.0884
43 Acrylate, benzyl -0.013452 -0.142093 2.4432 -0.04471 -0.6990 -0.5839
44 Acrylate, butyl -0.001012 -0.170791 1.7185 -0.04178 -0.7447 -0.7518
45 Methacrylic acid -0.059861 -0.060163 1.5819 -0.04337 -0.2807 -0.3751
Prediction set
46 Methacrylonitrile -0.091307 0.144638 3.8840 -0.04510 -0.4815 -0.5573
47 Methacrylamide, N-phenyl- -0.085817 -0.075495 3.4554 -0.03884 -0.0555 -0.1812
48 Methacrylate, phenyl -0.042366 -0.113254 3.8061 -0.04936 -0.2924 -0.2816
49 Methacrylate, 2-hydroxyethyl -0.078067 -0.057835 3.2751 -0.03920 -0.1938 -0.2696
50 Acrylamide -0.016236 -0.165551 3.5335 -0.03079 -0.1549 -0.2906
51 Isopropenyl isocyanate -0.260985 0.668904 2.3130 -0.01190 -1.0177 -1.1875
52 Isopropenyl methyl ketone -0.054776 -0.162201 2.8722 -0.04909 -0.3188 -0.2434
53 Oxazoline, 2-isopropenyl- -0.116250 0.054263 1.1946 -0.02564 -0.1938 -0.4437
54 Oxazoline, 2-isopropenyl-4,4-dimethyl- -0.121448 0.057456 1.4826 -0.02327 -0.1675 -0.4600
55 Silane, 3-methacryloxypropyl, trimethoxy- -0.107692 -0.037012 3.6349 -0.02992 -0.0615 -0.1170
56 p-Vinylbenzoic acid -0.113873 0.048918 4.9875 -0.06677 0.0124 -0.2028
57 Styrene, α-methyl -0.184001 0.170400 0.3137 -0.02195 -0.2219 0.0446
58 Vinyl chloroacetate -0.185126 0.505409 3.4374 -0.04997 -1.5230 -1.2450
59 Methacrylate, 2-chloroethyl -0.073993 -0.066793 2.9844 -0.04715 -0.5229 -0.2539
60 Vinylidene chloride -0.282129 0.761752 1.5146 -0.01686 -0.9686 -1.1520
with the 6-31G(d) basis set. These descriptors included the Mulliken charges of C1, C2 and R3 (qMC1, qMC2, and qMR3), the Mulliken charges of C1, C2 and R3 with hydrogens summed into heavy atoms (QMC1, QMC2, and QMR3), the atomic polar tensor charges of C1, C2 and R3 (qAC1, qAC2, and qAR3), the atomic polar tensor charges of C1, C2 and R3 with hydrogens summed into heavy atoms (QAC1, QAC2, and QAR3), the energies of the highest occupied molecular orbital (EHOMO) and the lowest unoccupied molecular orbital (ELUMO), the LUMO and HOMO orbital energy difference (ΔEg = ELUMO - EHOMO), and the total dipole moment (μ).
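The agreement between the Exp. and Calc. columns of the table can be checked directly. The minimal Python sketch below computes the root-mean-square error for the prediction set (monomers 46-60), with the value pairs copied from the table; the variable and function names are our own.

```python
import math

# (Exp., Calc.) values of log r1s for monomers 46-60, copied from the table.
PREDICTION_SET = [
    (-0.4815, -0.5573), (-0.0555, -0.1812), (-0.2924, -0.2816),
    (-0.1938, -0.2696), (-0.1549, -0.2906), (-1.0177, -1.1875),
    (-0.3188, -0.2434), (-0.1938, -0.4437), (-0.1675, -0.4600),
    (-0.0615, -0.1170), (0.0124, -0.2028), (-0.2219, 0.0446),
    (-1.5230, -1.2450), (-0.5229, -0.2539), (-0.9686, -1.1520),
]


def rmse(pairs):
    """Root-mean-square error between experimental and calculated values."""
    squared_errors = [(exp - calc) ** 2 for exp, calc in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


prediction_rmse = rmse(PREDICTION_SET)
print(round(prediction_rmse, 3))
```

With the tabulated values this evaluates to about 0.188, which agrees with the error quoted in the abstract for the test set.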
The genetic algorithm (GA) method was used to select an optimum subset of descriptors for SVM models. For the last few years, the GA