
Lecture Notes in Artificial Intelligence 5855

Edited by R. Goebel, J. Siekmann, and W. Wahlster

Subseries of Lecture Notes in Computer Science


Hepu Deng, Lanzhou Wang, Fu Lee Wang, and Jingsheng Lei (Eds.)

Artificial Intelligence
and Computational
Intelligence
International Conference, AICI 2009
Shanghai, China, November 7-8, 2009
Proceedings

Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors
Hepu Deng
RMIT University, School of Business Information Technology
City Campus, 124 La Trobe Street, Melbourne, Victoria 3000, Australia
E-mail: [email protected]

Lanzhou Wang
China Jiliang University, College of Metrological Technology and Engineering
Hangzhou 310018, Zhejiang, China
E-mail: [email protected]

Fu Lee Wang
City University of Hong Kong, Department of Computer Science
83 Tat Chee Avenue, Kowloon Tong, Hong Kong, China
E-mail: [email protected]

Jingsheng Lei
Hainan University, College of Information Science and Technology
Haikou 570228, China
E-mail: [email protected]

Library of Congress Control Number: 2009936961

CR Subject Classification (1998): I.2, F.2, I.5, I.4, J.3, J.2

LNCS Sublibrary: SL 7 – Artificial Intelligence

ISSN 0302-9743
ISBN-10 3-642-05252-5 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-05252-1 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12781256 06/3180 543210
Preface

The 2009 International Conference on Artificial Intelligence and Computational Intelligence (AICI 2009) was held during November 7–8, 2009 in Shanghai, China. The
technical program of the conference reflects the tremendous growth in the fields of
artificial intelligence and computational intelligence with contributions from a large
number of participants around the world.
AICI 2009 received 1,203 submissions from 20 countries and regions. After rigorous review, 79 high-quality papers were selected for this volume, representing an acceptance rate of 6.6%. The selected papers cover many new developments and their applications in the fields of artificial intelligence and computational intelligence, and reflect the sustained interest of the worldwide academic community in pursuing new solutions that apply artificial intelligence and computational intelligence to real-world problems.
We would especially like to thank all the committee members and reviewers, without whose timely help it would have been impossible to review all the submitted papers and assemble this program. We would also like to take this opportunity to express our heartfelt appreciation to all those who worked together in organizing this conference, establishing the technical program and running the conference meetings. We greatly appreciate the authors, speakers, invited session organizers, session chairs, and others who made this conference possible. Lastly, we would like to express our gratitude to the Shanghai University of Electric Power for its sponsorship and support of the conference.
We trust that the participants of this conference had opportunities to meet old and new friends from all over the world, to exchange ideas, to initiate research collaborations, to develop lasting friendships, and to visit places in Shanghai and elsewhere in China.

November 2009

Hepu Deng
Lanzhou Wang
Fu Lee Wang
Jingsheng Lei
Organization

Organizing Committee

General Co-chairs
Jialin Cao Shanghai University of Electric Power, China
Jingsheng Lei Hainan University, China

Program Committee Co-chairs
Hepu Deng RMIT University, Australia
Lanzhou Wang China Jiliang University, China

Local Arrangements Chair
Hao Zhang Shanghai University of Electric Power, China

Proceedings Co-chairs
Fu Lee Wang City University of Hong Kong, Hong Kong
Jun Yang Shanghai University of Electric Power, China

Publicity Chair
Tim Kovacs University of Bristol, UK

Sponsorship Chair
Zhiyu Zhou Zhejiang Sci-Tech University, China



Program Committee
Ahmad Abareshi RMIT University, Australia
Stephen Burgess Victoria University, Australia
Jennie Carroll RMIT University, Australia
Eng Chew University of Technology Sydney, Australia
Vanessa Cooper RMIT University, Australia
Minxia Luo China Jiliang University, China
Tayyab Maqsood RMIT University, Australia
Ravi Mayasandra RMIT University, Australia
Elspeth McKay RMIT University, Australia
Alemayehu Molla RMIT University, Australia
Konrad Peszynski RMIT University, Australia
Siddhi Pittayachawan RMIT University, Australia
Ian Sadler Victoria University, Australia
Pradip Sarkar RMIT University, Australia
Carmine Sellitto Victoria University, Australia
Peter Shackleton Victoria University, Australia
Sitalakshmi Venkatraman University of Ballarat, Australia
Leslie Young RMIT University, Australia
Adil Bagirov University of Ballarat, Australia
Philip Branch Swinburne University of Technology, Australia
Feilong Cao China Jiliang University, China
Maple Carsten University of Bedfordshire, UK
Caroline Chan Deakin University, Australia
Jinjun Chen Swinburne University of Technology, Australia
Richard Dazeley University of Ballarat, Australia
Yi-Hua Fan Chung Yuan Christian University Taiwan, Taiwan
Richter Hendrik HTWK Leipzig, Germany
Furutani Hiroshi University of Miyazaki, Japan
Bae Hyeon Pusan National University, Korea
Tae-Ryong Jeon Pusan National University, Korea
Sungshin Kim Pusan National University, Korea
Wei Lai Swinburne University of Technology, Australia
Edmonds Lau Swinburne University of Technology, Australia
Qiang Li University of Calgary, Canada
Xiaodong Li RMIT University, Australia
Kuoming Lin Kainan University, Taiwan
YangCheng Lin National Dong Hwa University, Taiwan
An-Feng Liu Central South University, China
Liping Ma University of Ballarat, Australia
Costa Marly Federal University of the Amazon, Brazil
Jamie Mustard Deakin University, Australia
Syed Nasirin Brunel University, UK
Lemai Nguyen Deakin University, Australia
Heping Pan University of Ballarat, Australia

Craig Parker Deakin University, Australia


Wei Peng RMIT University, Australia
Adi Prananto Swinburne University of Technology, Australia
Takahashi Ryouei Hachinohe Institute of Technology, Japan
Cheol Park Soon Chonbuk National University, Korea
Andrew Stranieri University of Ballarat, Australia
Zhaohao Sun University of Ballarat, Australia
Sleesongsom Suwin Chiangrai College, Thailand
Arthur Tatnall Victoria University, Australia
Luba Torline Deakin University, Australia
Schetinin Vitaly University of Bedfordshire, UK
Zhichun Wang Tianjin University, China
Shengxiang Yang University of Leicester, UK
Chung-Hsing Yeh Monash University, Australia
Yubo Yuan China Jiliang University, China
Hossein Zadeh RMIT University, Australia
Dengsheng Zhang Monash University, Australia
Irene Zhang Victoria University, Australia
Jianying Zhang Cancer Victoria, Australia
Tony Zhang Qingdao University, China
Weijian Zhao China Jiliang University, China
Yan-Gang Zhao Nagoya Institute of Technology, Japan
Xiaohui Zhao Swinburne University of Technology, Australia

Reviewers
Adil Bagirov Chen Lifei Du Junping
Ahmad Abareshi Chen Xiang Du Xufeng
Bai ShiZhong Chen Ling Duan Yong
Bi Shuoben Chen Dongming Duan Rongxing
Bo Rui-Feng Chen Ming Fan Jiancong
Cai Weihua Chen Ting Fan Xikun
Cai Zhihua Chen Yuquan Fan Shurui
Cai Kun Chen Ailing Fang Zhimin
Cao Yang Chen Haizhen Fang Gu
Cao Qingnian Chu Fenghong Fuzhen Huang
Cao Guang-Yi Chung-Hsing Yeh Gan Zhaohui
Cao Jianrong Congping Chen Gan Rongwei
Carmine Sellitto Cui Shigang Gao Chunrong
Chai Zhonglin Cui Mingyi Gao Xiaoqiang
Chang Chunguang DeJun Mu Gao Jiaquan
Chen Yong Liang Deng Liguo Gao Jun
Chen Haiming Deng-ao Li Gao Boyong
Chen Yong Ding Yi-jie Gu Xuejing
Chen Pengnian Dingjun Chen Gu Tao

Guang Yang Li Xiang Peng Qinke


Guo Li Jun Li Ming Perng Chyuan
Guo Yecai Li Guangwen Phaisangittisagul Ekachai
Han Fei Li Guohou Qiang Niu
Hao Xiuqing Li Hongjiao Qiang Li
He Hongjiang Liang Benliang Qiao Mei
He Shengping Lin Yuesong Qiu Jiqing
He Bingwei Lin Dong Mei Rao Congjun
He Xingshi Lin Chen Ren Li
He Zhiwei Lin Jipeng Ren Kaijun
He Yue Liu Jun Ren Chunyu
Hendrik Richter Liu Zhijie Richard Dazeley
Ho Zih-Ping Liu Xiaoji Ruiming Jia
Hu Fu Yuan Liu Hui Sang Yan-fang
Huang Pei-Hwa Liu Anping Sanyou Zeng
Huang Zhiwei Liu Zhide Shen Xiaojing
Huang Rongbo Liu Jihong Shen Xiaohua
Huang Nantian Liu Heng Sheng Zhongqi
Ji Genlin Liu Jin Sheng Xueli
Ji Tingwei Liu Wenyu Shi Qian
Ji Xia Liu Dianting Shi Guoliang
Ji Chang-Ming Liu Lei Shi Xuerong
Jia Qiuling Liu Jin Shuai Ren
Jia Fang Liu Jinglei Si Chun-di
Jiang Jiafu Liu Hui Sitalakshmi Venkatraman
Jianying Zhang Liu Guangli Song Xiaoyu
Jiao Feng Liu Yang Su H.T.
Jiechun Chen Liu Ying Sun Zhanquan
Jingbo Xia Liu Hao Sun Yanxia
Jingfang Liu Liu Jian Sun Quanping
Jin-min Wang Liu Ying-chun Sun Feixian
Ju-min Zhao Lu Yaosheng Sun Ziwen
Koga Takanori Lu Bin Suwin Sleesongsom
Kwok Ngai Ming Lu Yuzheng Tan Junshan
Lai Jiajun Lu Huijuan Tan Chunqiao
Lai Lien-Fu Ma Xianmin Tang Xianghong
Lai-Cheng Cao Matsuhisa Takashi Tao Luo
Lei Zhang Meng Jianliang Tao Zhengru
Lei Xiujuan Miao Xuna Tao Zhang
Leng Cuiping Ni Jian-jun Tian Fu-qing
Li Yu Nie Qing Tong Zhi-Wei
Li Xinyun Ning Ning Uchino Eiji
Li Xiao Niu Junyu Ullrich Carsten
Li Shun-Ming Ouyang Zhonghui Wan Li
Li Yun Pan Zhenghua Wang Cheng
Li Haiming Pan Jingchang Wang Hao
Li Chengjia Pan Defeng Wang Li-dong

Wang Hao Wu Xiaoqin Ye Feiyue


Wang Li-dong Wu Zhangjun Yi-Hua Fan
Wang Jie-sheng Wu Da-sheng Yin Chunxia
Wang Guoqiang Xi Liang Yong Yang
Wang Caihong Xia Qingjun Yong Qidong
Wang Jingmin Xiao Xiaohong You Xiao-ming
Wang Shuangcheng Xiao Hongying You-hua Jiang
Wang Chujiao Xie Yanmin Yu Xinqiao
Wang Junwei Xie Zhenping Yuhui Zhao
Wang XiaoMing Xie Jianhong Yuyi Zhai
Wang Xin Xiuli Yu Zhan Yong-zhao
Wang Feng Xiumin Yang Zhang Huaixiang
Wang Yuanqing Xiuxia Yang Zhang Meng
Wang Yong Xizhong Shen Zhang Sumin
Wang Xiang Xu Jie Zhang Yizhuo
Wang Yu-Jie Xu Hailong Zhang Erhu
Wang Yufei Xu Hua Zhang Jianxin
Wang Hongsheng Xu Xinsheng Zhang Caiqing
Wang Xiuqing Xu Yan Zhang Jian
Wang Wenchuan Xu Zhangyan Zhang Yanjie
Wang Dong Xue Heru Zhang Dabin
Wang Haibo Xue Yang Zhang Xiang
Wang Dawei Xue Heng Zhang Jingjun
Wang Hui Xueqin Lu Zhang Zhenya
Wang Junnian Yan Ping Zhang Ke
Wang Xuhong Yan Wang Zhang Xuejun
Wang Xiaolu Yan Tie Zhang Xuehua
Wan-gan Song Yang Genghuang Zhang Junran
Wei Peng Yang Ming Zhang Xueming
Wen Mi Yang Jiadong Zhang Changjiang
Wenjun Liu Yang Xiaobing Zhao Ping
Wu Xia Yang Jun Zhao Fu
Wu Hongli Yang Liping Zhao Liaoying
Wu Xiaojun Yang-Cheng Lin Zhao Hui
Wu Xiaojin Yao Zhijun
Table of Contents

Support Vector Machine

An Experimental Research on Vector Control of Induction Motor Based on Simple Model . . . . . 1
   Yinhai Zhang, Jinfa Ge, Weixia Liu, and Qin Wang

Rough Set Theory

An United Extended Rough Set Model Based on Developed Set Pair Analysis Method . . . . . 9
   Xia Ji, Longshu Li, Shengbing Chen, and Yi Xu

Learning Rules from Pairwise Comparison Table . . . . . 18
   Liping An and Lingyun Tong

Particle Swarm Optimization

A Novel Hybrid Particle Swarm Optimization for Multi-Objective Problems . . . . . 28
   Siwei Jiang and Zhihua Cai

Research on Constrained Layout Optimization Problem Using Multi-adaptive Strategies Particle Swarm Optimizer . . . . . 38
   Kaiyou Lei

ARIMA Model Estimated by Particle Swarm Optimization Algorithm for Consumer Price Index Forecasting . . . . . 48
   Hongjie Wang and Weigang Zhao

Neural Computation

A Uniform Solution to HPP in Terms of Membrane Computing . . . . . 59
   Haizhu Chen and Zhongshi He

Self-organizing Quantum Evolutionary Algorithm Based on Quantum Dynamic Mechanism . . . . . 69
   Sheng Liu and Xiaoming You

Intelligent Agents and Systems

Can Moral Hazard Be Resolved by Common-Knowledge in S4n-Knowledge? . . . . . 78
   Takashi Matsuhisa

Casuist BDI-Agent: A New Extended BDI Architecture with the Capability of Ethical Reasoning . . . . . 86
   Ali Reza Honarvar and Nasser Ghasem-Aghaee

Information Security

Research of Trust Chain of Operating System . . . . . 96
   Hongjiao Li and Xiuxia Tian

A Novel Application for Text Watermarking in Digital Reading . . . . . 103
   Jin Zhang, Qing-cheng Li, Cong Wang, and Ji Fang

Immune Computation

Optimization of Real-Valued Self Set for Anomaly Detection Using Gaussian Distribution . . . . . 112
   Liang Xi, Fengbin Zhang, and Dawei Wang

Genetic Algorithms

A GP Process Mining Approach from a Structural Perspective . . . . . 121
   Anhua Wang, Weidong Zhao, Chongchen Chen, and Haifeng Wu

Effects of Diversity on Optimality in GA . . . . . 131
   Glen MacDonald and Gu Fang

Dynamic Crossover and Mutation Genetic Algorithm Based on Expansion Sampling . . . . . 141
   Min Dong and Yan Wu

Multidisciplinary Optimization of Airborne Radome Using Genetic Algorithm . . . . . 150
   Xinggang Tang, Weihong Zhang, and Jihong Zhu

Global Similarity and Local Variance in Human Gene Coexpression Networks . . . . . 159
   Ivan Krivosheev, Lei Du, Hongzhi Wang, Shaojun Zhang, Yadong Wang, and Xia Li

A Grid Based Cooperative Co-evolutionary Multi-Objective Algorithm . . . . . 167
   Sepehr Meshkinfam Fard, Ali Hamzeh, and Koorush Ziarati

Fuzzy Computation

Fuzzy Modeling for Analysis of Load Curve in Power System . . . . . 176
   Pei-Hwa Huang, Ta-Hsiu Tseng, Chien-Heng Liu, and Guang-Zhong Fan

Biological Computing

Honey Bee Mating Optimization Vector Quantization Scheme in Image Compression . . . . . 185
   Ming-Huwi Horng

Towards a Population Dynamics Theory for Evolutionary Computing: Learning from Biological Population Dynamics in Nature . . . . . 195
   Zhanshan (Sam) Ma

Applications of Computational Intelligence

Application of Improved Particle Swarm Optimization Algorithm in UCAV Path Planning . . . . . 206
   Qianzhi Ma and Xiujuan Lei

Modal Analysis for Connecting Rod of Reciprocating Mud Pump . . . . . 215
   Zhiwei Tong, Hao Liu, and Fengxia Zhu

A Time-History Analysis Algorithm of a Non-viscously Damped System Using Gauss Precise Integration . . . . . 221
   Hongyu Shen and Zhongdong Duan

Application of LSSVM-PSO to Load Identification in Frequency Domain . . . . . 231
   Dike Hu, Wentao Mao, Jinwei Zhao, and Guirong Yan

Local Model Networks for the Optimization of a Tablet Production Process . . . . . 241
   Benjamin Hartmann, Oliver Nelles, Aleš Belič, and Damjana Zupančič–Božič

Retracted: Implementation of On/Off Controller for Automation of Greenhouse Using LabVIEW . . . . . 251
   R. Alimardani, P. Javadikia, A. Tabatabaeefar, M. Omid, and M. Fathi

Ant Colony Algorithm

Structure Design of the 3-D Braided Composite Based on a Hybrid Optimization Algorithm . . . . . 260
   Ke Zhang

Robotics

Robot Virtual Assembly Based on Collision Detection in Java3D . . . . . 270
   Peihua Chen, Qixin Cao, Charles Lo, Zhen Zhang, and Yang Yang

Robust Mobile Robot Localization by Tracking Natural Landmarks . . . . . 278
   Xiaowei Feng, Shuai Guo, Xianhua Li, and Yongyi He

Multi-Robot Dynamic Task Allocation Using Modified Ant Colony System . . . . . 288
   Zhenzhen Xu, Feng Xia, and Xianchao Zhang

Pattern Recognition

A Novel Character Recognition Algorithm Based on Hidden Markov Models . . . . . 298
   Yu Wang, Xueye Wei, Lei Han, and Xiaojin Wu

New Algorithms for Complex Fiber Image Recognition . . . . . 306
   Yan Ma and Shun-bao Li

Laplacian Discriminant Projection Based on Affinity Propagation . . . . . 313
   Xueping Chang and Zhonglong Zheng

An Improved Fast ICA Algorithm for IR Objects Recognition . . . . . 322
   Jin Liu and Hong Bing Ji

Facial Feature Extraction Based on Wavelet Transform . . . . . 330
   Nguyen Viet Hung

Fast Iris Segmentation by Rotation Average Analysis of Intensity-Inversed Image . . . . . 340
   Wei Li and Lin-Hua Jiang

Neural Networks

A New Criterion for Global Asymptotic Stability of Multi-delayed Neural Networks . . . . . 350
   Kaiyu Liu and Hongqiang Zhang

An Improved Approach Combining Random PSO with BP for Feedforward Neural Networks . . . . . 361
   Yu Cui, Shi-Guang Ju, Fei Han, and Tong-Yue Gu

Fuzzy Multiresolution Neural Networks . . . . . 369
   Li Ying, Shang Qigang, and Lei Na

Research on Nonlinear Time Series Forecasting of Time-Delay NN Embedded with Bayesian Regularization . . . . . 379
   Weijin Jiang, Yusheng Xu, Yuhui Xu, and Jianmin Wang

An Adaptive Learning Algorithm for Supervised Neural Network with Contour Preserving Classification . . . . . 389
   Piyabute Fuangkhon and Thitipong Tanprasert

Natural Language Processing

Application Study of Hidden Markov Model and Maximum Entropy in Text Information Extraction . . . . . 399
   Rong Li, Li-ying Liu, He-fang Fu, and Jia-heng Zheng

Automatic Expansion of Chinese Abbreviations by Web Mining . . . . . 408
   Hui Liu, Yuquan Chen, and Lei Liu

Adaptive Maximum Marginal Relevance Based Multi-email Summarization . . . . . 417
   Baoxun Wang, Bingquan Liu, Chengjie Sun, Xiaolong Wang, and Bo Li

Semantic Relation Extraction by Automatically Constructed Rules . . . . . 425
   Rongfeng Huang

Machine Vision

Object Recognition Based on Efficient Sub-window Search . . . . . 435
   Qing Nie, Shouyi Zhan, and Weiming Li

Machine Learning

A Multi-Scale Algorithm for Graffito Advertisement Detection from Images of Real Estate . . . . . 444
   Jun Yang and Shi-jiao Zhu

An Improved Path-Based Transductive Support Vector Machines Algorithm for Blind Steganalysis Classification . . . . . 453
   Xue Zhang and ShangPing Zhong

Logic Reasoning and Theorem-Proving

Resolution with Limited Factoring . . . . . 463
   Dafa Li

Knowledge Representation and Acquisition

Formal Analysis of an Airplane Accident in N Σ-Labeled Calculus . . . . . 469
   Tetsuya Mizutani, Shigeru Igarashi, Yasuwo Ikeda, and Masayuki Shio

Using Concept Space to Verify Hyponymy in Building a Hyponymy Lexicon . . . . . 479
   Lei Liu, Sen Zhang, Lu Hong Diao, Shu Ying Yan, and Cun Gen Cao

Advanced Self-adaptation Learning and Inference Techniques for Fuzzy Petri Net Expert System Units . . . . . 487
   Zipeng Zhang, Shuqing Wang, and Xiaohui Yuan

Intelligent Signal Processing

MIMO Instantaneous Blind Identification Based on Second-Order Temporal Structure and Newton’s Method . . . . . 497
   Xizhong Shen and Guang Meng

Seed Point Detection of Multiple Cancers Based on Empirical Domain Knowledge and K-means in Ultrasound Breast Image . . . . . 507
   Lock-Jo Koo, Min-Suk Ko, Hee-Won Jo, Sang-Chul Park, and Gi-Nam Wang

Intelligent Scheduling

A Controlled Scheduling Algorithm Decreasing the Incidence of Starvation in Grid Environments . . . . . 517
   Minna Liu, Kairi Ou, Yuelong Zhao, and Tian Sun

A Space Allocation Algorithm for Minimal Makespan in Space Scheduling Problems . . . . . 526
   Chyuan Perng, Yi-Chiuan Lai, Zih-Ping Ho, and Chin-Lun Ouyang

Intelligent Information Retrieval

A New Approach for Chest CT Image Retrieval . . . . . 535
   Li-dong Wang and Zhou-xiang Shou

Intelligent Information Fusion

Depth Extension for Multiple Focused Images by Adaptive Point Spread Function . . . . . 544
   Yuanqing Wang and Yijun Mei

Intelligent Image Processing

Image Annotation Using Sub-block Energy of Color Correlograms . . . . . 555
   Jingsheng Lei

Heuristic Searching Methods

Optimal Evolutionary-Based Deployment of Mobile Sensor Networks . . . . . 563
   Niusha Shafiabady and Mohammad Ali Nekoui

Fuzzy Logic and Soft Computing

Research of Current Control Strategy of Hybrid Active Power Filter Based on Fuzzy Recursive Integral PI Control Algorithm . . . . . 568
   Zhong Tang and Daifa Liao

A Fuzzy Query Mechanism for Human Resource Websites . . . . . 579
   Lien-Fu Lai, Chao-Chin Wu, Liang-Tsung Huang, and Jung-Chih Kuo

Selecting Cooperative Enterprise in Dynamic Enterprise Alliance Based on Fuzzy Comprehensive Evaluation . . . . . 590
   Yan Li, Guoxing Zhang, and Jun Liu

Distributed AI and Agents

An Evolutionary Solution for Cooperative and Competitive Mobile Agents . . . . . 599
   Jiancong Fan, Jiuhong Ruan, and Yongquan Liang

Towards an Extended Evolutionary Game Theory with Survival Analysis and Agreement Algorithms for Modeling Uncertainty, Vulnerability, and Deception . . . . . 608
   Zhanshan (Sam) Ma

Data Mining and Knowledge Discovering

Uncovering Overlap Community Structure in Complex Networks Using Particle Competition . . . . . 619
   Fabricio Breve, Liang Zhao, and Marcos Quiles

Semi-supervised Classification Based on Clustering Ensembles . . . . . 629
   Si Chen, Gongde Guo, and Lifei Chen

Research on a Novel Data Mining Method Based on the Rough Sets and Neural Network . . . . . 639
   Weijin Jiang, Yusheng Xu, Jing He, Dejia Shi, and Yuhui Xu

Applications of Artificial Intelligence

The Techno-Economic Analysis of Reducing NOx of Industrial Boiler by Recycled Flue Gas . . . . . 649
   Caiqing Zhang, Wenpei Hu, Yu Sun, and Bo Yang

Urban Traffic Flow Forecasting Based on Adaptive Hinging Hyperplanes . . . . . 658
   Yang Lu, Jianming Hu, Jun Xu, and Shuning Wang

Turbine Fault Diagnosis Based on Fuzzy Theory and SVM . . . . . 668
   Fei Xia, Hao Zhang, Daogang Peng, Hui Li, and Yikang Su

Architecture of Multiple Algorithm Integration for Real-Time Image Understanding Application . . . . . 677
   Bobo Duan, Chunyang Yang, Wei Liu, and Huai Yuan

Others

Formalizing the Modeling Process of Physical Systems in MBD . . . . . 685
   Nan Wang, Dantong OuYang, Shanwu Sun, and Chengli Zhao

Study on Stochastic Programming Methods Based on Synthesizing Effect . . . . . 696
   FaChao Li, XianLei Liu, and ChenXia Jin

Rate Control, Routing Algorithm and Scheduling for Multicast with Network Coding in Ad Hoc Networks . . . . . 705
   Xu Na Miao and Xian Wei Zhou

Design and Experimental Study on Spinning Solid Rocket Motor . . . . . 715
   Heng Xue, Chunlan Jiang, and Zaicheng Wang

An Energy-Based Method for Computing Radial Stiffness of Single Archimedes Spiral Plane Supporting Spring . . . . . 724
   Enlai Zheng, Fang Jia, Changhui Lu, He Chen, and Xin Ji

Erratum

Implementation of On/Off Controller for Automation of Greenhouse Using LabVIEW . . . . . E1
   R. Alimardani, P. Javadikia, A. Tabatabaeefar, M. Omid, and M. Fathi

Author Index . . . . . 735


An Experimental Research on Vector Control of
Induction Motor Based on Simple Model

Yinhai Zhang, Jinfa Ge, Weixia Liu, and Qin Wang

College of Informatics & Electronics, Zhejiang Sci-Tech University, Hangzhou, Zhejiang Province, China
[email protected], [email protected]

Abstract. To address the heavy computation, easy saturation and accumulated errors of conventional direct vector control, the vector control of induction motors based on a simple model is studied and the detailed scheme is described on the basis of decomposing and approximating the rotor flux. Because the magnetizing current and the torque current are placed under direct closed-loop control and the complex current regulation is performed by PI regulators, the direct vector control of the induction motor is simplified. The experimental results show that the proposed method is effective in decreasing dynamic disturbance and has the advantages of a simple code program, rare saturation and few oscillations.

Keywords: vector control, induction motor, rotor flux, decoupler.

1 Introduction
Vector control enables the induction motor to achieve excellent dynamic performance similar to that of a DC motor. Vector control based on rotor-field-oriented control has received considerable attention because it readily realizes complete decoupling between the magnetizing current and the torque current [1-2]. Direct vector control and indirect vector control are the two schemes most commonly used in induction motor vector control systems. The former has a speed closed loop, whose inner loop is a torque closed loop, together with a flux closed-loop control system; the latter is an open-loop flux control system. Direct vector control involves a large number of operations such as differentiation and multiplication. Furthermore, the complex current regulation process in the voltage-type inverter brings heavy computation [3-5], easy saturation, accumulated errors and other uncertain disturbances, resulting in deteriorated system performance.
Therefore, in order to overcome the above-mentioned disadvantages of direct vector control while maintaining its performance advantages, the vector control of induction motors based on a simple model is studied and the detailed scheme is described in this paper. After decomposing and approximating the rotor flux, the magnetizing current and the torque current are placed under direct closed-loop control, and the complex current regulation process of the voltage-type inverter is performed by PI regulators to achieve regulation without static error. Thus, the proposed method simplifies the direct vector

control of the induction motor, and makes the direct vector control system simple and clear, with a smaller code program, rare saturation and few oscillations.

2 Conventional Direct Vector Control


Fig. 1 shows the conventional direct vector control system of an induction motor. This system, built on rotor-field-oriented control, includes a speed control subsystem and a flux control subsystem, and the inner loop of the speed control subsystem is a torque closed-loop control system. The direct vector control system of the induction motor contains a speed regulator (ASR), a torque regulator (ATR) and a flux regulator (AΨR). The current transformation includes the Clarke and Park transforms.

Fig. 1. The conventional direct vector control system of induction motor

The main formulas used in the conventional direct vector control system of the induction motor are as follows.

The rotor flux is expressed as

$$\psi_r = \frac{L_m}{T_r p + 1}\, i_{sm} \qquad (1)$$

The torque feedback is calculated as

$$T_e = \frac{p_n L_m}{L_r}\, i_{st}\, \psi_r \qquad (2)$$

The angular speed of synchronous rotation is given by

$$\omega_s = \omega_r + \omega_{sl} = \omega_r + \frac{L_m i_{st}}{T_r \psi_r} \qquad (3)$$

In most applications, the voltage-type inverter is widely used, so it is necessary to convert the current commands into voltage commands. This process is called current regulation, and the conversion equations are

$$\begin{cases}
u_{sm} = R_s\left(1 + \dfrac{L_s}{R_s}\, p\, \dfrac{\sigma T_r p + 1}{T_r p + 1}\right) i_{sm} - \sigma L_s\left(\omega_r + \dfrac{(T_r p + 1)\, i_{st}}{T_r\, i_{sm}}\right) i_{st} \\[2mm]
u_{st} = \left[R_s\left(\dfrac{\sigma L_s}{R_s}\, p + 1\right) + \dfrac{L_s}{T_r}\left(\sigma T_r p + 1\right)\right] i_{st} + \omega_r L_s\, \dfrac{\sigma T_r p + 1}{T_r p + 1}\, i_{sm}
\end{cases} \qquad (4)$$
The parameters of equations (1) to (4) are defined as follows:

$\sigma = 1 - \dfrac{L_m^2}{L_s L_r}$ — total leakage coefficient
Lm — mutual inductance
Lr — rotor inductance
Rs — stator resistance
Ls — stator inductance
Tr — rotor time constant
p — differential operator
ωr — rotor angular speed
ωsl — slip angular speed
pn — number of pole pairs
ism — M-axis stator current
ist — T-axis stator current
usm — M-axis stator voltage
ust — T-axis stator voltage

From equations (1) to (3) we can see that the accuracy of the estimates in equations (2) and (3) depends on equation (1), which shows the importance of accurately estimating the rotor flux. Because of the differential part, a transient drift appears in the DC component following transient changes in the digital computation when a rectangular discrete integrator is used, making the estimated rotor flux inaccurate and thus reducing the calculation accuracy of the torque and the synchronous speed. Furthermore, the large number of differential and product operations in equations (4) brings heavy computation, easy saturation and accumulated errors. In the low-speed region especially, the torque is very prone to oscillate.
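To make the discrete computation concrete, the following minimal Python sketch implements equations (1)–(3) with the rectangular (forward-Euler) integrator discussed above. The parameter values follow Section 4; the function names and structure are illustrative assumptions, not the authors' DSP code.

```python
# Conventional rotor flux / torque / synchronous speed estimation,
# Eqs. (1)-(3), discretized with a rectangular (forward-Euler) integrator.
Lm, Lr, Rr = 0.152, 0.165, 5.486   # motor parameters from Section 4 (H, H, ohm)
Tr = Lr / Rr                       # rotor time constant [s]
pn = 2                             # number of pole pairs
Ts = 62.5e-6                       # switching period [s]

def flux_step(psi_r, i_sm):
    """Eq. (1) as Tr * dpsi/dt + psi = Lm * i_sm, one Euler step."""
    return psi_r + (Ts / Tr) * (Lm * i_sm - psi_r)

def torque_feedback(psi_r, i_st):
    """Eq. (2): Te = (pn * Lm / Lr) * i_st * psi_r."""
    return pn * Lm / Lr * i_st * psi_r

def sync_speed(omega_r, psi_r, i_st):
    """Eq. (3): omega_s = omega_r + Lm * i_st / (Tr * psi_r)."""
    return omega_r + Lm * i_st / (Tr * psi_r)
```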

3 Vector Control System Based on Simple Model


Fig. 2 shows the vector control system based on the simple model. In this paper, the rotor flux is decomposed and approximated. The magnetizing current and the torque current are then placed under direct closed-loop control in order to overcome the shortcomings of

Fig. 2. The vector control system of induction motor based on simple model

the conventional direct vector control. The system includes a speed control subsystem and a torque current control subsystem, and the inner loop of the speed control subsystem is the torque current closed-loop control subsystem. The system is composed of three PI regulators: the speed regulator, the torque current regulator and the magnetizing current regulator.
The derivation of the main formulas used in the vector control system based on the simple model is as follows; the results are discretized to suit DSP computation.

Equation (1) can be rewritten as

$$T_r p \psi_r + \psi_r = L_m i_{sm} \qquad (5)$$

The rotor flux equation in the MT reference frame is

$$\psi_r = L_m i_{sm} + L_r i_{rm} \qquad (6)$$

The rotor leakage inductance Lσr is small compared with the mutual inductance Lm, so Lm and Lr can be regarded as approximately equal in value according to Lr = Lm + Lσr. Furthermore, in a rotor-field-oriented control system the rotor flux must remain constant below the rated frequency in order to achieve ideal speed-adjusting performance; above the rated frequency, flux-weakening control is adopted, usually via a look-up table. Lm*ism is regarded as constant and absorbed into Lr*irm, so equation (6) can be simplified as

$$\psi_r = L_m i_{rm} \qquad (7)$$

In equation (7), irm is the M-axis component of the rotor current. Applying equation (7) to (5), we obtain

$$T_r \frac{di_{rm}}{dt} + i_{rm} = i_{sm} \qquad (8)$$

After discretization, equation (8) can be rewritten as

$$i_{rm}(k+1) = i_{rm}(k) + \frac{T_s}{T_r}\bigl(i_{sm}(k) - i_{rm}(k)\bigr) \qquad (9)$$

where Ts is the switching period. By the same method, equation (3) can be modified as

$$\omega_s(k+1) = \omega_r(k+1) + \frac{1}{T_r}\,\frac{i_{st}(k)}{i_{rm}(k+1)} \qquad (10)$$
Equations (9) and (10), derived from the decomposition and approximation of the rotor flux, are the crucial observation equations of the system shown in Fig. 2 and are used to calculate the angular speed of synchronous rotation. Compared with the conventional direct vector control system, the simplified system does not calculate the angular speed of synchronous rotation through the rotor flux, but obtains it directly from the rotor magnetizing current. The heavy computational processes are thus avoided, along with saturation drift and cumulative errors. This provides a good foundation for the study of real-time high-performance vector control systems.
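As a concrete illustration, here is a minimal Python sketch of one observer step per equations (9) and (10); the names and calling convention are our assumptions, not the authors' DSP code.

```python
# Simplified observer of Eqs. (9)-(10): the rotor magnetizing current
# i_rm replaces the rotor flux, so no flux differentiation is needed.
Tr = 0.165 / 5.486   # rotor time constant Lr/Rr from Section 4 [s]
Ts = 62.5e-6         # switching period [s]

def observer_step(i_rm_k, i_sm_k, i_st_k, omega_r_next):
    """One DSP cycle: Eq. (9) updates i_rm, Eq. (10) returns omega_s."""
    i_rm_next = i_rm_k + (Ts / Tr) * (i_sm_k - i_rm_k)        # Eq. (9)
    omega_s_next = omega_r_next + i_st_k / (Tr * i_rm_next)   # Eq. (10)
    return i_rm_next, omega_s_next
```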
It can be seen from Fig. 2 that, compared with conventional vector control, the simplified vector control system places the magnetizing current and the torque current under direct closed-loop control rather than the torque and the flux. The calculation of torque feedback is therefore avoided, and the whole system can regulate the torque and the flux more quickly and efficiently. In addition, the complex current regulation process is handled by PI regulators to achieve regulation without static error. This makes the system simple, clear and computationally light, and more suitable for real-time high-performance control.
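For reference, a PI regulator of the kind used for the three loops might look like the following sketch. The gains are those listed in Section 4, while the output clamp (a common anti-windup practice) and the class structure are our assumptions.

```python
class PI:
    """Discrete PI regulator; one instance per control loop."""
    def __init__(self, kp, ki, limit):
        self.kp, self.ki, self.limit = kp, ki, limit
        self.integral = 0.0

    def step(self, reference, feedback):
        error = reference - feedback
        self.integral += self.ki * error
        # Clamp the integrator (assumed anti-windup, not specified in the paper).
        self.integral = max(-self.limit, min(self.limit, self.integral))
        output = self.kp * error + self.integral
        return max(-self.limit, min(self.limit, output))

current_pi = PI(kp=1.12, ki=0.0629, limit=1.0)  # current-loop gains from Section 4
speed_pi = PI(kp=4.89, ki=0.0131, limit=1.0)    # speed-loop gains from Section 4
```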

4 Experiment Analysis
The hardware and software environment of the experiment is as follows. The power inverter stage is built with IGBTs (IRGB15B60KD) produced by IR. The control stage adopts the TI 2407 DSP EVM by Wintech. The current sensor is an LA28-NP produced by LEM. The resolution of the speed encoder is 1000 P/R, and the digital oscilloscope is a Tektronix TDS2002. The switching frequency is 16 kHz, giving a switching period Ts = 62.5 µs. PI parameters: in the current loop, P = 1.12, I = 0.0629; in the speed loop, P = 4.89, I = 0.0131.
The parameters of the induction motor are: rated power Pn = 500 W, rated current In = 2.9 A, rated voltage Vn = 220 V, rotor resistance Rr = 5.486 Ω, stator resistance Rs = 4.516 Ω, mutual inductance Lm = 152 mH, Lσr = 13 mH, Lr = 165 mH, pn = 2, Ls = 168 mH.
Fig. 3 shows the trajectory of the total flux circle at the operating frequency f = 5 Hz in the vector control system of the induction motor based on the simple model. From the experimental waveform we can see that the trajectory of the total flux circle is nearly round and that the system runs well at low speed.

Fig. 3. The total flux circle with the operation frequency f = 5 Hz
Fig. 4. The start-up of the induction motor with no load: (a) rotor speed and phase current; (b) rotor speed and torque current

Fig. 4 shows the experimental waveforms of the induction motor starting up with no load and running to the set speed of f = 30 Hz under the simplified vector control system. Fig. 4a gives the waveforms of the rotor speed and the stator phase current; Fig. 4b shows the waveforms of the rotor speed and the torque current. In Fig. 4, the peak-to-peak value of the phase current is less than 2 A, the settling time is 200 ms, and the maximum speed overshoot is about 13%. While the speed is increasing, ist reaches its maximum value to meet the requirement of rapid acceleration; once the speed reaches a certain level, the torque current gradually decreases until the speed reaches the feed-forward speed.
Fig. 5. The experimental waveforms with random load at f = 10 Hz: (a) rotor speed and torque current; (b) magnetizing current and torque current

The experimental waveforms under random load, with the motor running in the steady state at f = 10 Hz, are shown in Fig. 5. Fig. 5a gives the waveforms of the rotor speed and torque current: the maximum dynamic speed dip is about 16%, and the recovery time is less than the duration of the disturbance. Fig. 5b shows the waveforms of the magnetizing current and torque current: the magnetizing current ism remains essentially unchanged and achieves ideal decoupling from the torque current. The vector control of the induction motor based on the simple model therefore performs well against disturbances.

5 Conclusion
In this paper, the vector control of induction motors based on a simple model has been studied and the detailed scheme described through the decomposition and approximation of the rotor flux. The proposed method keeps the system simple, with a smaller code program and shorter execution time. The experimental results show that the disturbance rejection of the simplified vector control system is reliable and efficient, and that the system is suitable for real-time high-performance control.

References
1. Noguchi, T., Kondo, S., Takahashi, L.: Field-oriented control of an induction motor with
robust on-line tuning of its parameters. IEEE Trans. Industry Applications 33(1), 35–42
(1997)
2. Telford, D., Dunnigan, M.W.: Online identification of induction machine electrical parame-
ters for vector control loop tuning. IEEE Trans. Industrial Electronics 50(2), 253–261 (2003)

3. Zhang, Y., Chen, J., Bao, J.: Implementation of closed-loop vector control system of induc-
tion motor without voltage feedforward decoupler. In: IEEE 21st Annual Applied Power
Electronics Conference and Exposition, APEC, pp. 1631–1633 (2006)
4. Lorenz, R.D., Lawson, D.B.: Performance of feedforward current regulators for field-oriented induction machine controllers. IEEE Trans. Industry Applications 23(4), 597–602 (1987)
5. Del Blanco, F.B., Degner, M.W., Lorenz, R.D.: Dynamic analysis of current regulators for
AC motors using complex vectors. IEEE Trans. Industry Applications 35(6), 1424–1432
(1999)
An United Extended Rough Set Model Based on
Developed Set Pair Analysis Method

Xia Ji, Longshu Li, Shengbing Chen, and Yi Xu

Key Lab of Ministry of Education for CI & SP, Anhui University, Hefei, Anhui, China
[email protected]

Abstract. Unlike the traditional set pair analysis method, a new method for micro-decomposing the discrepancy degree is proposed in line with the actual distribution of missing attributes. Integrating it with rough set theory, we advance the united set pair tolerance relation and give the corresponding extended rough set model. Different values of the identity degree and discrepancy degree thresholds can modulate the performance of this model and extend its application range. We then show that several existing extended rough set models are subschemas of it. Finally, simulation experiments and conclusions are given, which validate that the united set pair tolerance relation model can improve classification capability.

Keywords: Set pair analysis, Otherness, Rough set, United set pair tolerance
relation.

1 Introduction

Classic rough set theory is based on the equivalence relation and assumes a complete information system. In real life, however, knowledge acquisition often faces incomplete information systems owing to data measurement errors or the limitations of data acquisition. There are two kinds of null values: (1) omitted but existing [2]; (2) lost and not allowed to be compared. Kryszkiewicz [3] established the tolerance relation for type (1). Stefanowski [4] and others built the similarity relation for type (2). Wang GY [5] proposed the limited tolerance relation through an in-depth study of the tolerance and similarity relations.
The degree of incompleteness and the distribution of missing data differ from one system to another, and any extended rough set model achieves good results in some cases but not in all. Some quantified extended rough set models, such as the quantified tolerance relation and the quantified limited tolerance relation [6], perform well, but the quantification process requires heavy computation.
Set pair analysis (SPA) theory was formally proposed by the Chinese scholar ZHAO Ke-qin in 1989 to study the relationship between two sets [7]. It uses the connection number "a + bi + cj" to deal with uncertain systems, whether fuzzy, stochastic or intermediary. It studies the uncertainty of two sets from


three aspects: identity, difference and opposition. At present, set pair analysis theory is widely used in artificial intelligence, system control, management decision-making and other fields [7].
In this paper, we first expand set pair analysis theory and then put forward a united extended rough set model based on it. With different identity and discrepancy thresholds, this model converts to different existing models, including the tolerance relation, the similarity relation and the limited tolerance relation. Experiments on UCI data sets show the validity, rationality and effectiveness of this model.

2 Set Pair Analysis Theory and Rough Set Theory

A set pair is composed of two sets A and B, namely H = (A, B). Under general circumstances W, its connection degree is

$$u_W(A, B) = \frac{S}{N} + \frac{F}{N}\, i + \frac{P}{N}\, j \qquad (1)$$

Here N is the total number of features of the set pair under discussion, S is the number of identity features, and P is the number of contrary features; F = N − S − P is the number of features of the two sets that are neither identity nor contrary. S/N, F/N, and P/N are called the identity degree, discrepancy degree, and contrary degree of the two sets under the given circumstances, respectively. To simplify formula (1), we set a = S/N, b = F/N, c = P/N, and then uW(A, B) can be rewritten as

$$u = a + bi + cj \qquad (2)$$

It is obvious that 0 ≤ a, b, c ≤ 1, and the “a”, “b”, and “c” satisfy the equation a + b + c = 1.
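As an illustration, the following short Python sketch computes the identity, discrepancy and contrary degrees for two objects described on N attributes. The '*' marker for missing values follows the notation used below; the code is our illustration, not code from the paper.

```python
MISSING = '*'

def connection_degree(x, y):
    """Identity, discrepancy and contrary degrees (a, b, c) of a set pair."""
    N = len(x)
    S = sum(1 for u, v in zip(x, y) if MISSING not in (u, v) and u == v)
    P = sum(1 for u, v in zip(x, y) if MISSING not in (u, v) and u != v)
    F = N - S - P                  # attributes involving a missing value
    return S / N, F / N, P / N     # a + b + c = 1

a, b, c = connection_degree([1, 2, MISSING], [1, 3, MISSING])  # -> (1/3, 1/3, 1/3)
```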
An incomplete information system is the following tuple: I = (U, A, F), where U is a finite nonempty set of objects, A is a finite nonempty set of attributes, Va is a nonempty set of values of a ∈ A, and F = {Fl : U → ρ(Va)} is an information function that maps an object in U to a subset of Va. If for every al ∈ A and every xi ∈ U, Fl(xi) is a single-point set, then (U, A, F) is a complete information system. If there are some al ∈ A and some xi ∈ U such that Fl(xi) is not a single-point set, then (U, A, F) is an incomplete information system. A complete information system is thus a special case of an incomplete information system.
Let I = (U, A, F) be an incomplete information system. For any B ⊆ A, the tolerance relation TR(B) proposed by M. Kryszkiewicz is defined as follows [3]:

TR(B) = {(x, y) ∈ U × U | ∀a ∈ B, a(x) = a(y) or a(x) = * or a(y) = *}

Let I = (U, A, F) be an incomplete information system. For any B ⊆ A, the similarity relation SR(B) proposed by Stefanowski is defined as follows [4]:

SR(B) = {(x, y) ∈ U × U | ∀b ∈ B, b(x) = b(y) or b(x) = *}

Let I = (U, A, F) be an incomplete information system. For any B ⊆ A, the limited tolerance relation LTR(B) proposed by WANG GY is defined as follows [5]:

PB(x) = {b | b ∈ B and b(x) ≠ *}

LTR(B) = {(x, y) ∈ U × U | ∀b ∈ B, b(x) = b(y) = *, or (PB(x) ∩ PB(y) ≠ ∅ and ∀b ∈ B, (b(x) ≠ * and b(y) ≠ *) → b(x) = b(y))}
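For comparison, the three quoted relations can be checked pairwise as in the following sketch; this is our illustration of the definitions above, not code from the cited papers, and '*' again marks a missing value.

```python
MISSING = '*'

def tr(x, y):   # tolerance relation (Kryszkiewicz)
    return all(u == v or u == MISSING or v == MISSING for u, v in zip(x, y))

def sr(x, y):   # similarity relation (Stefanowski)
    return all(u == v or u == MISSING for u, v in zip(x, y))

def ltr(x, y):  # limited tolerance relation (Wang GY)
    both_missing = all(u == MISSING and v == MISSING for u, v in zip(x, y))
    overlap = any(u != MISSING and v != MISSING for u, v in zip(x, y))
    agree = all(u == v for u, v in zip(x, y) if u != MISSING and v != MISSING)
    return both_missing or (overlap and agree)
```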

3 A Unified Extended Rough Set Model Based on Expanded Set Pair Analysis Theory

A New Expanded Level Decomposition of Set Pair Analysis

The connection degree u = a + bi + cj can be expanded breadthwise and lengthways according to the demands of the research. The transverse expanded form is

(a1 + a2 + ... + an) + (b1i1 + b2i2 + ... + bnin) + (c1j1 + c2j2 + ... + cnjn)

The lengthways expanded form is shown in Fig. 1, where u(x, y) = a + bi + cj is decomposed into components a1, b1, c1.

Fig. 1. The traditional micro-de-compound method of discrepancy degree

In an incomplete information system S = (U, A, V), the discrepancy term "bi" also deserves study. Here N is the total number of attributes, S is the number of identity attributes, P is the number of contrary attributes, and F = N − S − P is the number of missing attributes. Different from traditional set pair analysis theory, we decompose "bi" from another angle: the distribution of the missing attributes. The decomposition, splitting b into b1, b2 and b3, is shown in Fig. 2.

Fig. 2. Developed micro-de-compound method of discrepancy degree



Here, for any a in the attribute set A, b1 = |{a | a(x) = * & a(y) ≠ *}|, b2 = |{a | a(x) ≠ * & a(y) = *}|, and b3 = |{a | a(x) = * & a(y) = *}|. The term "b1" counts attributes missing in x only, "b2" counts attributes missing in y only, and "b3" counts attributes missing in both x and y.
The traditional set pair analysis theory decomposes the discrepancy term "bi" according to the possible values of the missing attributes, while in this paper we decompose "bi" from the angle of the distribution of the missing attributes. In a later section we prove that the symmetry and transitivity of the binary relation depend on the thresholds of b1 and b2.
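A short sketch of the developed decomposition follows: counting which side of the pair is missing each attribute. Normalizing the counts by N, so that a + b1 + b2 + b3 + c = 1, is our reading of Definition 1 below.

```python
MISSING = '*'

def discrepancy_split(x, y):
    """Split the discrepancy degree b into (b1, b2, b3) per Fig. 2."""
    N = len(x)
    b1 = sum(1 for u, v in zip(x, y) if u == MISSING and v != MISSING) / N  # x missing only
    b2 = sum(1 for u, v in zip(x, y) if u != MISSING and v == MISSING) / N  # y missing only
    b3 = sum(1 for u, v in zip(x, y) if u == MISSING and v == MISSING) / N  # both missing
    return b1, b2, b3
```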

B Extended Rough Set Model Based on United Set Pair Tolerance Relation

The connection degree between two objects is composed of three parts: the ratio of attributes that are known and identical (the identity degree a), the ratio of attributes that are known but different (the contrary degree c), and the ratio of unknown attributes (the discrepancy degree b). The contribution of the contrary degree to similarity is negative, and the handling of noisy data is not considered here, so we require the contrary degree to be 0, that is, c = 0. According to the expanded set pair analysis theory, we give the united set pair tolerance relation.

Definition 1. Let I = (U, A, V) be an incomplete information system. For ∀x, y ∈ U and B ⊆ A, the unified set pair tolerance relation (USPTR) is defined as follows:

USPTR(B) = {(x, y) | uB(x, y) = a + (b1 + b2 + b3)i, a + b1 + b2 + b3 = 1, a ≥ α, bi ≤ βi}

Here α + β1 + β2 + β3 = 1. The united set pair tolerance relation class of x is defined as USPTR_B(x) = {y | (x, y) ∈ USPTR(B)}.

Definition 2. Given an incomplete information system S = (U, A, V), B ⊆ A, X ⊆ U, the upper and lower approximations of X are defined respectively as follows:

$$\overline{USPTR}_B(X) = \{x \mid USPTR_B(x) \cap X \neq \emptyset\}$$

$$\underline{USPTR}_B(X) = \{x \mid USPTR_B(x) \subseteq X\}$$

When B = A, $\overline{USPTR}_B(X)$ and $\underline{USPTR}_B(X)$ can be abridged as $\overline{USPTR}(X)$ and $\underline{USPTR}(X)$.

Definition 3. Given an incomplete information system S = (U, A, V) and X ⊆ U, the positive domain, negative domain, boundary region and roughness of X in the universe U are defined as follows:

Positive domain: $POS(X) = \underline{USPTR}(X)$

Negative domain: $NEG(X) = U - POS(X)$

Boundary domain: $BND(X) = \overline{USPTR}(X) - \underline{USPTR}(X)$

Roughness: $\rho(X) = 1 - |\underline{USPTR}(X)| \,/\, |\overline{USPTR}(X)|$
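Putting Definitions 1 and 2 together, a sketch of the tolerance class and of the approximations follows. It reuses connection_degree and discrepancy_split from the sketches above; the threshold handling is our reading of Definition 1.

```python
# Assumes connection_degree() and discrepancy_split() from the sketches above.

def usptr_related(x, y, alpha, betas):
    """Definition 1: c = 0, a >= alpha, and each b_i <= beta_i."""
    a, _, c = connection_degree(x, y)
    b = discrepancy_split(x, y)
    return c == 0 and a >= alpha and all(bi <= beta for bi, beta in zip(b, betas))

def usptr_class(i, U, alpha, betas):
    """USPTR_B(x_i): all objects united-set-pair-tolerant with x_i."""
    return {j for j in range(len(U)) if usptr_related(U[i], U[j], alpha, betas)}

def approximations(X, U, alpha, betas):
    """Definition 2: lower and upper approximations of a concept X (a set of indices)."""
    classes = [usptr_class(i, U, alpha, betas) for i in range(len(U))]
    lower = {i for i in range(len(U)) if classes[i] <= X}
    upper = {i for i in range(len(U)) if classes[i] & X}
    return lower, upper
```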

C Performance Analysis of the Extended Rough Set Model Based on the United Set Pair Tolerance Relation

The united set pair tolerance relation is reflexive, but whether it satisfies symmetry and transitivity depends on the thresholds: when the identity degree and discrepancy degree thresholds change, the extended model behaves differently. We have the following three theorems.

Let I = (U, A, V) be an incomplete information system, ∀x, y ∈ U, B ⊆ A; α is the threshold of the identity degree a, and β1, β2 and β3 are the thresholds of the discrepancy components b1, b2 and b3, respectively.

Theorem 1. If the threshold of b1 does not equal the threshold of b2, that is β1 ≠ β2, USPTR(B) does not satisfy symmetry.

Proof. Let y ∈ USPTR_B(x), that is, u(x, y) = a + (b1 + b2 + b3)i with b1 ≤ β1 and b2 ≤ β2. Because β1 ≠ β2, there may be some y in USPTR_B(x) with b1 > β2 or b2 > β1. Since u(y, x) = a + (b2 + b1 + b3)i, with the roles of b1 and b2 exchanged, for such y we have x ∉ USPTR_B(y).

Theorem 2. If β1 = 0 or β2 = 0, USPTR(B) satisfies transitivity.

Proof. We first prove the case β1 = 0. Let y ∈ USPTR_B(x) and z ∈ USPTR_B(y); then u(x, y) = a + (b2 + b3)i and u(y, z) = a + (b2 + b3)i. For ∀a ∈ A, if a(x) ≠ *, then a(y) = a(x); simultaneously a(y) ≠ * and a(z) = a(y). So for ∀a ∈ A, if a(x) ≠ *, we get a(z) = a(x) ≠ *, that is, u(x, z) = a + (b2 + b3)i and z ∈ USPTR_B(x). The transitivity of USPTR(B) is proven. The case β2 = 0 is similar, and we omit the details.

Theorem 3. If β1 = 0 and β2 = 0, USPTR(B) satisfies symmetry and transitivity.

Proof. This follows directly from Theorems 1 and 2, so we omit the details.

The performance of the united set pair tolerance relation changes as the identity and discrepancy thresholds are modified, and some representative existing extended rough set models are subschemas of it; see Table 1.

Table 1. The relationship between the united set pair tolerance relation and some existing models

Thresholds Combination          Performance                             Existing Rough Set Model
α + β1 + β2 + β3 = 1            reflexivity, symmetry                   Tolerance Relation
α + β2 + β3 = 1, β1 = 0         reflexivity, transitivity               Similarity Relation
β3 = 1 or α > 0                 reflexivity, symmetry                   Limited Tolerance Relation
α = 1                           reflexivity, symmetry, transitivity     Equivalence Relation

4 Simulation Experiments

In this section, we select three test databases from the UCI machine learning repository: Iris, Zoo and cmc-data; details are given in Table 2. We obtained nine incomplete databases by randomly removing data: I-5%, I-10%, I-30%, Z-5%, Z-10%, Z-30%, C-5%, C-10% and C-30%.

Table 2. Experiment Databases

Database     Instance Number     Attribute Number
Iris         150                 4
Zoo          101                 17
cmc-data     1473                10

Let U be the test database, E(xi) the equivalence class of xi in the complete data, and R(xi) the tolerance class of xi under the different extended rough set models. We run ten random experiments on every database and take the average as the final result. The comparability (u) of a test result is computed by formulas (3) and (4):

$$u = \frac{1}{|U|} \sum_{i=1}^{|U|} \frac{|E(x_i) \cap R(x_i)|}{|E(x_i)| + |R(x_i)| - |E(x_i) \cap R(x_i)|} \qquad (3)$$

$$\bar{u} = \frac{1}{10} \sum_{t=1}^{10} u_t \qquad (4)$$

The bigger u, the higher the comparability and the better the classification effect.
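A sketch of the evaluation measure of formulas (3) and (4) follows: a Jaccard-style agreement between the complete-data equivalence class E(xi) and the tolerance class R(xi), averaged over U and over the ten random runs. Representing the classes as Python sets is our choice.

```python
def comparability(E, R):
    """Formula (3): E[i], R[i] are sets, the classes of x_i."""
    total = sum(len(E[i] & R[i]) / (len(E[i]) + len(R[i]) - len(E[i] & R[i]))
                for i in range(len(E)))
    return total / len(E)

def mean_comparability(per_run_values):
    """Formula (4): average over the ten random experiments."""
    return sum(per_run_values) / len(per_run_values)
```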



Table 3. Comparability (u) Comparison of Four Extended Rough Set Models

Database   Tolerance Relation   Similarity Relation   Limited Tolerance Relation   United Set Pair Tolerance Relation
Z-5% 0.821467 0.708342 0.862431 0.894354
Z-10% 0.809237 0.636242 0.811237 0.873016
Z-30% 0.462345 0.613456 0.473458 0.691325
I-5% 0.932402 0.883422 0.943679 0.988792
I-10% 0.899306 0.825033 0.917963 0.984485
I-30% 0.459135 0.524384 0.564023 0.899621
C-5% 0.563526 0.612432 0.642135 0.734956
C-10% 0.431256 0.516243 0.542251 0.660125
C-30% 0.339534 0.504135 0.462131 0.617243

Fig. 3. The influence of the identity and discrepancy thresholds on comparability (u): (1) β1 = β2; (2) β1 ≠ β2



We ran two groups of experiments. Table 3 compares the results of several extended rough set models. On Zoo, Iris and cmc-data alike, the u of every extended model decreases as the proportion of missing data increases, but the decrease for the united set pair tolerance relation is smaller than for the other three models. The united set pair tolerance relation also performs more steadily across the three databases: the tolerance relation and the limited tolerance relation perform poorly on cmc-data, and the similarity relation performs poorly on Zoo. Overall, the united set pair tolerance relation gives better classification in all nine situations of the three databases.
Because determining the thresholds is difficult, the second experiment tries to narrow down their range. In Fig. 3, the identity degree α ranges over [0.3, 0.9], and the discrepancy degree β is separated into two situations, β1 = β2 and β1 ≠ β2. Fig. 3 shows that when α takes a small value, such as 0.1 or 0.3, the classification performance is very bad; when α reaches 0.9, the performance is unstable and decreases somewhat; when α ∈ [0.5, 0.7], the new extended model shows steady and good performance. Under all conditions, the classification performance with β1 = β2 is better than with β1 ≠ β2; in other words, the united set pair tolerance relation with symmetry classifies better.

5 Conclusions

In this paper, we first develop the set pair analysis method and give a developed micro-de-compound method for the discrepancy degree; we then propose the united set pair tolerance relation and give the corresponding extended rough set model. The thresholds α, β1 and β2 can be adjusted according to the degree of incompleteness of the database, which makes the model more objective, flexible and effective. The experiments on three UCI databases, Iris, Zoo and cmc-data, clearly demonstrate the advantages of the new method. However, how to compute the identity and discrepancy thresholds quickly remains an open problem.

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 60273043), the Natural Science Foundation of Anhui (No. 050420204), the Education Bureau Natural Science Foundation of Anhui (No. 2006KJ098B) and the High School Surpass Talent Foundation of Anhui (No. 05025102).

References

1. Pawlak, Z.: Rough set. International Journal of Computer and Information Sciences 11(5),
341–356 (1982)
2. Grzymala Busse, J.W.: On the unknown attribute values in learning from examples. In: Raś,
Z.W., Zemankova, M. (eds.) ISMIS 1991. LNCS (LNAI), vol. 542, pp. 368–377. Springer,
Heidelberg (1991)

3. Stefanowski, J., Tsoukias, A.: Incomplete information tables and rough classification. Computational Intelligence 17, 545–566 (2001)
4. Kryszkiewicz, M.: Rough Set Approach to Incomplete Information System. Information
Sciences (S0020-0255) 112(1/4), 39–49 (1998)
5. Wang, G.-y.: The extension of rough set theory in incomplete information system. Com-
puter Research and Development 39(10), 1238–1243 (2002)
6. Sun, C.-m., Liu, D.-y., Sun, S.-y.: Research of rough set method oriented to incomplete
information system. Mini-micro Computer System 10(10), 1869–1873 (2007)
7. Zhao, K.-q.: Set pair analysis and its preliminary applications. Zhejiang Science and Technology Press, Hangzhou (2000)
8. Xu, Y., Li, L.-s., Li, X.-j.: Extended rough set model based on set pair power. Journal of
System Simulation 20(6), 1515–1522 (2008)
9. Stefanowski, J., Tsoukias, A.: On the Extension of Rough Sets under Incomplete Informa-
tion. In: Zhong, N., Skowron, A., Ohsuga, S. (eds.) RSFDGrC 1999. LNCS (LNAI),
vol. 1711, pp. 73–82. Springer, Heidelberg (1999)
10. Lei, Z., Lan, S.: Rough Set Model Based on New Set Pair Analysis. Fuzzy Systems and Mathematics (S1001-7402) 20(4), 111–116 (2006)
Learning Rules from Pairwise Comparison Table

Liping An¹ and Lingyun Tong²

¹ Business School, Nankai University, Tianjin, 300071, China
[email protected]
² School of Management, Hebei University of Technology, Tianjin, 300130, China
[email protected]

Abstract. Multicriteria choice problems deal with a selection of a small subset


of best actions from a larger set and multicriteria ranking problems aim to ob-
tain an ordering of actions from a set from the best to the worst. In order to con-
struct a comprehensive decision-rule preference model for choice and ranking,
we propose the concepts of decision matrices and decision functions to generate
the minimal decision rules from a pairwise comparison table. Decision rules de-
rived can be used to obtain a recommendation in multicriteria choice and rank-
ing problems.

Keywords: Multiple criteria decision analysis, rough sets, choice, ranking,


decision rules.

1 Introduction
Multiple Criteria Decision Analysis (MCDA) aims at helping a decision maker (DM)
to prepare and make a decision where more than one point of view has to be consid-
ered. Three major models have been used in MCDA until now: (1) the functional model, expressed in terms of a utility function within multiple attribute utility theory [1]; (2) the relational model, expressed in the form of an outranking relation [2] or a fuzzy relation [3]; (3) the function-free model, expressed in terms of symbolic forms, like "if… then…" decision rules [6] or decision trees, or in a sub-symbolic form, like artificial neural nets [4].
Both functional and relational models require that the DM gives some preference
information, such as importance weights, substitution ratios and various thresholds
on particular criteria, which is often quite difficult for DMs not acquainted with the
MCDA methodology [4]. According to Slovic [5], people make decisions by search-
ing for rules that provide good justification of their choices. The decision rule ap-
proach [6] follows the paradigm of artificial intelligence and inductive learning.
These decision rules are induced from preference information supplied by the DM in
terms of some decision examples. Therefore, the decision rules are intelligible and
speak the language of the DM [7].
Roy [8] has stated that the objective of an MCDA is to solve one of the following
five typologies of problems: classification, sorting, choice, ranking and description.
Classification concerns an assignment of a set of actions to a set of pre-defined
classes. The actions are described by a set of regular attributes and the classes are not


necessarily ordered. Sorting regards a set of actions evaluated by criteria, i.e.


attributes with preference-ordered domains. In this problem, the classes are also pref-
erence-ordered. Choice deals with a selection of a small subset of best actions, evalu-
ated by attributes and criteria, from a larger set. Ranking aims to obtain an ordering of
actions, evaluated by attributes and criteria, from a set from the best to the worst.
Description identifies the major distinguishing features of the actions and performs
their description based on these features.
Recently, a complete and well-axiomatized methodology for constructing decision rule preference models from decision examples, based on rough set theory [9], has been proposed. Several attempts have already been made to apply this theory to decision analysis [10]. However, the classical rough set theory is based on the indiscernibility relation, which is not able to deal with preference-ordered attribute domains and decision classes. Therefore, the use of classical rough sets has been limited to problems of multiattribute classification and description only. For this reason, Greco et al. [6, 11]
have proposed an extension of the rough sets approach, called Dominance-based
Rough Set Approach (DRSA). DRSA is able to deal with sorting problems involving
both criteria and regular attributes. This innovation is mainly based on substitution of
the indiscernibility relation by a dominance relation in the rough approximation. In
order to construct a comprehensive preference model that could be used to support the
sorting task, Greco et al. [11] distinguish between qualitative attributes and quantita-
tive attributes. The rules are derived from rough approximations of decision classes.
They satisfy conditions of completeness and dominance, and handle possible inconsistencies in the set of examples.
In the case of multicriteria choice and ranking problems, the original version of
rough set theory has been extended by Greco et al. [7, 12, 13] in two ways: substitut-
ing the classical indiscernibility relation by a dominance relation; and substituting the
information table by a pairwise comparison table, which permits approximation of a
comprehensive preference relation in multicriteria choice and ranking problems.
In this paper, the decision matrix of a pairwise comparison table is defined to construct the decision functions, which are Boolean functions. The set of "if … then …" decision rules is decoded from the prime implicants of the Boolean functions.

2 The Pairwise Comparison Table and Multigraded Dominance


In this section, we recall some basic concepts from [7, 12, 13] to be used in this paper.

2.1 The Pairwise Comparison Table

Let C be the set of criteria used for evaluation of actions from A. For any criterion
q∈C, let Tq be a finite set of binary relations defined on A on the basis of the evalua-
tions of actions from A with respect to the considered criterion q, such that for every
(x, y)∈A×A exactly one binary relation t∈Tq is verified.
The preferential information has the form of pairwise comparisons of reference actions from B⊆A, considered as exemplary decisions. The pairwise comparison table (PCT) is defined as the data table SPCT = (𝓑, C∪{d}, TC∪Td, g), where 𝓑⊆B×B is a non-empty set of exemplary pairwise comparisons of reference actions, TC = ∪_{q∈C} Tq, d is a

decision corresponding to the comprehensive pairwise comparison, and g: 𝓑×(C∪{d})→TC∪Td is a total function such that g[(x, y), q]∈Tq for every (x, y)∈A×A and for each q∈C, and g[(x, y), d]∈Td for every (x, y)∈𝓑.
It follows that for any pair of reference actions (x, y)∈𝓑 one and only one binary relation t∈Td is verified. Thus, Td induces a partition of 𝓑. The data table SPCT can be seen as a decision table, since the set of considered criteria C and the decision d are distinguished.

2.2 The Definition of Multigraded Dominance

Let CO be the set of criteria expressing preferences on an ordinal scale, and CN, the set
of criteria expressing preferences on a quantitative scale or a numerical nonquantita-
tive scale, such that CO∪CN=C and CO∩CN=∅. Moreover, for each P⊆C, let PO be the
subset of P composed of criteria expressing preferences on an ordinal scale, i.e.
PO=P∩CO, and PN, the subset of P composed of criteria expressing preferences on a
quantitative scale or a numerical non-quantitative scale, i.e. PN=P∩CN. For each P⊆C,
we have P=PO∪PN and PO∩PN=∅. The following three situations are considered:
(1) P=PN and PO=∅.
The exemplary pairwise comparisons made by the DM can be represented in terms of graded preference relations P_q^h: for each q∈C and for every (x, y)∈A×A, Tq = {P_q^h : h∈Hq}, where Hq is a particular subset of the relative integers and

• x P_q^h y, h>0, means that action x is preferred to action y by degree h with respect to the criterion q,
• x P_q^h y, h<0, means that action x is not preferred to action y by degree h with respect to the criterion q,
• x P_q^0 y means that action x is similar to action y with respect to the criterion q.

For each q∈C and for every (x, y)∈A×A, [x P_q^h y, h>0] ⇒ [y P_q^k x, k≤0] and [x P_q^h y, h<0] ⇒ [y P_q^k x, k≥0].
Given PN⊆C (PN≠∅) and (x, y), (w, z)∈A×A, the pair of actions (x, y) is said to dominate (w, z), taking into account the criteria from PN, denoted by (x, y) D_PN (w, z), if x is preferred to y at least as strongly as w is preferred to z with respect to each q∈PN. Precisely, "at least as strongly as" means "by at least the same degree", i.e. h_q ≥ k_q, where h_q, k_q∈Hq, x P_q^{h_q} y and w P_q^{k_q} z, for each q∈PN.

The upward cumulated preference, denoted by P_q^{≥h}, and the downward cumulated preference, denoted by P_q^{≤h}, have the following interpretation:

• x P_q^{≥h} y means "x is preferred to y with respect to q by at least degree h",
• x P_q^{≤h} y means "x is preferred to y with respect to q by at most degree h".

The exact definition of the cumulated preferences, for each (x, y)∈A×A, q∈C and h∈Hq, is the following:

• x P_q^{≥h} y if x P_q^k y, where k∈Hq and k≥h,
• x P_q^{≤h} y if x P_q^k y, where k∈Hq and k≤h.

(2) P=PO and PN=∅.

In this paper, we modify the original definition of multigraded dominance with respect to P=PO and PN=∅ [12] as follows:
Given (x, y), (w, z)∈A×A, the pair (x, y) is said to dominate the pair (w, z) with respect to PO if, for each q∈PO, (1) c_q(x) ≽ c_q(y) and c_q(z) ≽ c_q(w), denoted by (x, y) D¹_{q} (w, z), or (2) c_q(x) ≽ c_q(w) and c_q(z) ≽ c_q(y) for c_q(x) ≽ c_q(y) and c_q(w) ≽ c_q(z), denoted by (x, y) D²_{q} (w, z).

(3) PN≠∅ and PO≠∅.

Given (x, y), (w, z)∈A×A, the pair (x, y) is said to dominate the pair (w, z) with respect to the criteria from P if (x, y) dominates (w, z) with respect to both PN and PO.

2.3 Lower and Upper Approximations

Given P⊆C and (x, y)∈A×A, the P-dominating set and the P-dominated set are defined, respectively, as:
• the set of pairs of actions dominating (x, y): D_P^+(x, y) = {(w, z)∈A×A : (w, z) D_P (x, y)},
• the set of pairs of actions dominated by (x, y): D_P^−(x, y) = {(w, z)∈A×A : (x, y) D_P (w, z)}.

In a PCT, the set Td is composed of two binary relations defined on A:

(1) x outranks y (denoted by xSy or (x, y)∈S), where (x, y)∈𝓑 and "x outranks y" means "x is at least as good as y";
(2) x does not outrank y (denoted by xS^c y or (x, y)∈S^c), where (x, y)∈𝓑 and S∪S^c=𝓑.
The P-dominating sets and the P-dominated sets defined on 𝓑 for all the given pairs of reference actions from 𝓑 are "granules of knowledge" that can be used to express the P-lower and P-upper approximations of the comprehensive outranking relations S and S^c, respectively:

P̲(S) = {(x, y)∈𝓑 : D_P^+(x, y) ⊆ S},   P̄(S) = ∪_{(x, y)∈S} D_P^+(x, y),
P̲(S^c) = {(x, y)∈𝓑 : D_P^−(x, y) ⊆ S^c},   P̄(S^c) = ∪_{(x, y)∈S^c} D_P^−(x, y).

The P-boundaries (P-doubtful regions) of S and S^c are defined as

Bn_P(S) = P̄(S) − P̲(S),   Bn_P(S^c) = P̄(S^c) − P̲(S^c).

From the above it follows that Bn_P(S) = Bn_P(S^c).
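As a concrete illustration of these granules, the following Python sketch (our own, not from the paper) derives the approximations and the boundary, assuming the P-dominating and P-dominated sets have been precomputed as dictionaries dom_plus and dom_minus mapping each pair of 𝓑 to a set of pairs:

    def approximations(S, Sc, dom_plus, dom_minus):
        pairs = S | Sc                                       # the universe of pairs
        lower_S = {p for p in pairs if dom_plus[p] <= S}     # D_P^+(x, y) subset of S
        upper_S = set().union(*(dom_plus[p] for p in S))     # union over (x, y) in S
        lower_Sc = {p for p in pairs if dom_minus[p] <= Sc}  # D_P^-(x, y) subset of S^c
        upper_Sc = set().union(*(dom_minus[p] for p in Sc))
        boundary = upper_S - lower_S                         # Bn_P(S) = Bn_P(S^c)
        return lower_S, upper_S, lower_Sc, upper_Sc, boundary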

2.4 Decision Rules

Using the approximations of S and S^c based on the dominance relation defined above, it is possible to induce a generalized description of the available preferential information in terms of decision rules. The decision rules have in this case the following syntax:
(1) D≥-decision rules:
IF x P_{q1}^{≥h(q1)} y and x P_{q2}^{≥h(q2)} y and … x P_{qe}^{≥h(qe)} y and c_{qe+1}(x) ≽ r_{qe+1} and c_{qe+1}(y) ≼ s_{qe+1} and … c_{qp}(x) ≽ r_{qp} and c_{qp}(y) ≼ s_{qp}, THEN xSy,
where P = {q1, q2, …, qp} ⊆ C, PN = {q1, q2, …, qe}, PO = {qe+1, qe+2, …, qp}, (h(q1), h(q2), …, h(qe)) ∈ H_{q1}×H_{q2}×…×H_{qe} and (r_{qe+1}, …, r_{qp}), (s_{qe+1}, …, s_{qp}) ∈ C_{qe+1}×…×C_{qp}.
These rules are supported only by pairs of actions from the P-lower approximation of S.
(2) D≤-decision rules:
IF x P_{q1}^{≤h(q1)} y and x P_{q2}^{≤h(q2)} y and … x P_{qe}^{≤h(qe)} y and c_{qe+1}(x) ≼ r_{qe+1} and c_{qe+1}(y) ≽ s_{qe+1} and … c_{qp}(x) ≼ r_{qp} and c_{qp}(y) ≽ s_{qp}, THEN xS^c y,
where P, PN, PO, (h(q1), h(q2), …, h(qe)) and (r_{qe+1}, …, r_{qp}), (s_{qe+1}, …, s_{qp}) are as above.
These rules are supported only by pairs of actions from the P-lower approximation of S^c.
(3) D≥≤-decision rules:
IF x P_{q1}^{≥h(q1)} y and … x P_{qe}^{≥h(qe)} y and x P_{qe+1}^{≤h(qe+1)} y and … x P_{qf}^{≤h(qf)} y and c_{qf+1}(x) ≽ r_{qf+1} and c_{qf+1}(y) ≼ s_{qf+1} and … c_{qg}(x) ≽ r_{qg} and c_{qg}(y) ≼ s_{qg} and c_{qg+1}(x) ≼ r_{qg+1} and c_{qg+1}(y) ≽ s_{qg+1} and … c_{qp}(x) ≼ r_{qp} and c_{qp}(y) ≽ s_{qp}, THEN xSy or xS^c y,
where O′ = {q1, …, qe} ⊆ C, O″ = {qe+1, …, qf} ⊆ C, PN = O′∪O″, with O′ and O″ not necessarily disjoint, PO = {qf+1, …, qp}, (h(q1), …, h(qf)) ∈ H_{q1}×…×H_{qf} and (r_{qf+1}, …, r_{qp}), (s_{qf+1}, …, s_{qp}) ∈ C_{qf+1}×…×C_{qp}.
These rules are supported only by pairs of actions from the P-boundary of S and S^c.
Applying the decision rules induced from a given SPCT, a final recommendation for
choice or ranking can be obtained upon a suitable exploitation of this structure [12].

3 Decision Matrix and Decision Functions of a PCT


In this section, by analogy with the indiscernibility matrix and its Boolean functions
[14] to generate all minimal decision rules in classical rough set theory, we propose
Learning Rules from Pairwise Comparison Table 23

the concepts of decision matrices and decision functions to generate the minimal
decision rules.

Definition 1. Let SPCT = (𝓑, C∪{d}, TC∪Td, g) be a pairwise comparison table, 𝓑 = S∪S^c, (x, y)∈S, (w, z)∈S^c, p_i∈PN, q_j∈PO, x P_{p_i}^{h_{p_i}} y and w P_{p_i}^{k_{p_i}} z for each p_i∈PN, h_{p_i}, k_{p_i}∈H_{p_i}. The decision matrix of S in SPCT, denoted by M(S), is defined as:

M(S) = ( m[(x, y), (w, z)] )_{|S|×|S^c|},

where m[(x, y), (w, z)] = {p_i, q_j : (x, y) D_{p_i} (w, z) with h_{p_i} > k_{p_i}; (x, y) D¹_{q_j} (w, z) with c_{q_j}(x) ≠ c_{q_j}(y) or c_{q_j}(w) ≠ c_{q_j}(z); (x, y) D²_{q_j} (w, z) with c_{q_j}(x) ≠ c_{q_j}(w) or c_{q_j}(y) ≠ c_{q_j}(z)}.

Definition 2. Let M(S) be the decision matrix of S in SPCT. The decision function of (x, y) with respect to M(S) is defined as:

f_S[(x, y)] = ∧_{(w, z)} { (∨_i p_i*) ∨ (∨_j q_j*) : p_i, q_j∈m[(x, y), (w, z)] and m[(x, y), (w, z)]≠∅ },

where p_i* and q_j* are Boolean variables corresponding to the attributes p_i and q_j, respectively.
The decision function f_S[(x, y)] is a Boolean function that expresses how a pair (x, y)∈S can be discerned from all of the pairs (w, z)∈S^c. Turning f_S[(x, y)] into disjunctive normal form, the prime implicants of f_S[(x, y)] reveal the minimal subsets of P that are needed to discern the pair (x, y) from the pairs in S^c, and they correspond to the minimal D≥-decision rules.
Similarly, we can define the decision matrix of S^c, M(S^c), and the decision functions with respect to M(S^c).
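The passage from decision functions to minimal rules can be illustrated with the short Python sketch below (ours, not the authors' implementation). Attributes are strings, and m_entries is assumed to map each pair ((x, y), (w, z)) to the attribute set m[(x, y), (w, z)]; expanding the conjunction of disjunctions and applying the absorption law yields the prime implicants, i.e. the condition attribute sets of the minimal rules.

    from itertools import product

    def prime_implicants(clauses):
        # clauses: non-empty attribute sets, one per pair (w, z) in S^c
        expanded = {frozenset(choice) for choice in product(*clauses)}
        # absorption law: keep only the minimal products
        return {p for p in expanded if not any(q < p for q in expanded)}

    def decision_function(xy, pairs_Sc, m_entries):
        clauses = [m_entries[(xy, wz)] for wz in pairs_Sc if m_entries[(xy, wz)]]
        return prime_implicants(clauses)

For instance, for (a, a) in the example below the clauses are {Lit}, {Maths, Phys}, {Maths, Phys}, and the function returns {Lit, Maths} and {Lit, Phys}, matching f_S[(a, a)] = (Lit∧Maths)∨(Lit∧Phys).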
An example. Let us consider the example used in [6]. Students are evaluated according to their level in Maths, Phys and Lit. Marks are given on a scale from 0 to 20. The three students presented in Table 1 are considered.

Table 1. Students' evaluation table

Student   Maths   Phys   Lit   Comprehensive evaluation
a         18      16     10    13.9
b         10      12     18    13.6
c         14      15     15    14.9

The DM's preference order over the students is represented in terms of pairwise evaluations in Table 2.
The relation (x, y) D_{Maths} (w, z) is true if and only if (1) c_Maths(x) ≽ c_Maths(y) and c_Maths(z) ≽ c_Maths(w), or (2) c_Maths(x) ≽ c_Maths(w) and c_Maths(z) ≽ c_Maths(y), where c_Maths(x) ≽ c_Maths(y) and c_Maths(w) ≽ c_Maths(z).

Table 2. Pairwise comparison table

Pair of students   Maths    Phys     Lit      Comprehensive outranking relation
(a, a)             18, 18   16, 16   10, 10   S
(a, b)             18, 10   16, 12   10, 18   S
(a, c)             18, 14   16, 15   10, 15   S^c
(b, a)             10, 18   12, 16   18, 10   S^c
(b, b)             10, 10   12, 12   18, 18   S
(b, c)             10, 14   12, 15   18, 15   S^c
(c, a)             14, 18   15, 16   15, 10   S
(c, b)             14, 10   15, 12   15, 18   S
(c, c)             14, 14   15, 15   15, 15   S

The analogous partial preorders can be induced using dominance relations on Phys
and on Lit. The entities to compute the lower and upper approximations of S are
shown in Table 3.

Table 3. The entities to compute the lower and upper approximations of S

(a, a): D^+_{Maths} = D^+_{Phys} = {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; D^+_{Lit} = {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; D^+_P = {(a,a), (b,b), (c,c)}
(a, b): D^+_{Maths} = D^+_{Phys} = {(a,b)}; D^+_{Lit} = {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; D^+_P = {(a,b)}
(a, c): D^+_{Maths} = D^+_{Phys} = {(a,b), (a,c)}; D^+_{Lit} = {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; D^+_P = {(a,c)}
(b, a): D^+_{Maths} = D^+_{Phys} = {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; D^+_{Lit} = {(b,a)}; D^+_P = {(b,a)}
(b, b): D^+_{Maths} = D^+_{Phys} = {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; D^+_{Lit} = {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; D^+_P = {(a,a), (b,b), (c,c)}
(b, c): D^+_{Maths} = D^+_{Phys} = {(a,a), (a,b), (a,c), (b,b), (b,c), (c,b), (c,c)}; D^+_{Lit} = {(b,a), (b,c)}; D^+_P = {(b,c)}
(c, a): D^+_{Maths} = D^+_{Phys} = {(a,a), (a,b), (a,c), (b,b), (c,a), (c,b), (c,c)}; D^+_{Lit} = {(b,a), (c,a)}; D^+_P = {(c,a)}
(c, b): D^+_{Maths} = D^+_{Phys} = {(a,b), (c,b)}; D^+_{Lit} = {(a,a), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; D^+_P = {(c,b)}
(c, c): D^+_{Maths} = D^+_{Phys} = {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; D^+_{Lit} = {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; D^+_P = {(a,a), (b,b), (c,c)}

P̲(S) = {(a, a), (a, b), (b, b), (c, a), (c, b), (c, c)},
P̄(S) = {(a, a), (a, b), (b, b), (c, a), (c, b), (c, c)},
Bn_P(S) = ∅.

The entities to compute the lower and upper approximations of S^c are shown in Table 4.

Table 4. The entities to compute the lower and upper approximations of S^c

(a, a): D^−_{Maths} = D^−_{Phys} = {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; D^−_{Lit} = {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; D^−_P = {(a,a), (b,b), (c,c)}
(a, b): D^−_{Maths} = D^−_{Phys} = {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; D^−_{Lit} = {(a,b)}; D^−_P = {(a,b)}
(a, c): D^−_{Maths} = D^−_{Phys} = {(a,a), (a,c), (b,a), (b,b), (b,c), (c,a), (c,c)}; D^−_{Lit} = {(a,b), (a,c)}; D^−_P = {(a,c)}
(b, a): D^−_{Maths} = D^−_{Phys} = {(b,a)}; D^−_{Lit} = {(a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; D^−_P = {(b,a)}
(b, b): D^−_{Maths} = D^−_{Phys} = {(a,a), (b,a), (b,b), (b,c), (c,c)}; D^−_{Lit} = {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; D^−_P = {(a,a), (b,b), (c,c)}
(b, c): D^−_{Maths} = D^−_{Phys} = {(b,a), (b,c)}; D^−_{Lit} = {(a,a), (a,b), (a,c), (b,b), (b,c), (c,b), (c,c)}; D^−_P = {(b,c)}
(c, a): D^−_{Maths} = D^−_{Phys} = {(b,a), (c,a)}; D^−_{Lit} = {(a,a), (a,b), (a,c), (b,b), (c,a), (c,b), (c,c)}; D^−_P = {(c,a)}
(c, b): D^−_{Maths} = D^−_{Phys} = {(b,a), (b,b), (b,c), (c,a), (c,b), (c,c)}; D^−_{Lit} = {(a,b), (c,b)}; D^−_P = {(c,b)}
(c, c): D^−_{Maths} = D^−_{Phys} = {(a,a), (b,a), (b,b), (b,c), (c,a), (c,c)}; D^−_{Lit} = {(a,a), (a,b), (a,c), (b,b), (c,b), (c,c)}; D^−_P = {(a,a), (b,b), (c,c)}

P̲(S^c) = {(a, c), (b, a), (b, c)}, P̄(S^c) = {(a, c), (b, a), (b, c)}, Bn_P(S^c) = ∅.
The decision matrix of S in SPCT is:

          (a, c)        (b, a)        (b, c)
(a, a)    Lit           Maths, Phys   Maths, Phys
(a, b)    Maths, Phys   Maths, Phys   Maths, Phys
(b, b)    Lit           Maths, Phys   Maths, Phys
(c, a)    Lit           Maths, Phys   ∅
(c, b)    ∅             Maths, Phys   Maths, Phys
(c, c)    Lit           Maths, Phys   Maths, Phys

The decision functions of (x, y) with respect to M(S) are:

fS[(a, a)]=Lit∧(Maths∨Phys)∧(Maths∨Phys)=(Lit∧Maths)∨(Lit∧Phys), similarly,


fS[(a, b)]=Maths∨Phys,
fS[(b, b)]=(Lit∧Maths)∨(Lit∧Phys),
fS[(c, a)]=(Lit∧Maths)∨(Lit∧Phys),

fS[(c, b)]=Maths∨Phys,
fS[(c, c)]=(Lit∧Maths)∨(Lit∧Phys).
Then, the following minimal D≥-decision rules can be obtained:
IF (c_Maths(x) ≽ 18 and c_Maths(y) ≼ 18) and (c_Lit(x) ≽ 10 and c_Lit(y) ≼ 10) THEN xSy, (a, a);
IF (c_Phys(x) ≽ 16 and c_Phys(y) ≼ 16) and (c_Lit(x) ≽ 10 and c_Lit(y) ≼ 10) THEN xSy, (a, a);
IF c_Maths(x) ≽ 18 and c_Maths(y) ≼ 10 THEN xSy, (a, b);
IF c_Phys(x) ≽ 16 and c_Phys(y) ≼ 12 THEN xSy, (a, b);
IF (c_Maths(x) ≽ 10 and c_Maths(y) ≼ 10) and (c_Lit(x) ≽ 18 and c_Lit(y) ≼ 18) THEN xSy, (b, b);
IF (c_Phys(x) ≽ 12 and c_Phys(y) ≼ 12) and (c_Lit(x) ≽ 18 and c_Lit(y) ≼ 18) THEN xSy, (b, b);
IF (c_Maths(x) ≽ 14 and c_Maths(y) ≼ 18) and (c_Lit(x) ≽ 15 and c_Lit(y) ≼ 10) THEN xSy, (c, a);
IF (c_Phys(x) ≽ 15 and c_Phys(y) ≼ 16) and (c_Lit(x) ≽ 15 and c_Lit(y) ≼ 10) THEN xSy, (c, a);
IF c_Maths(x) ≽ 14 and c_Maths(y) ≼ 10 THEN xSy, (c, b);
IF c_Phys(x) ≽ 15 and c_Phys(y) ≼ 12 THEN xSy, (c, b);
IF (c_Maths(x) ≽ 14 and c_Maths(y) ≼ 14) and (c_Lit(x) ≽ 15 and c_Lit(y) ≼ 15) THEN xSy, (c, c);
IF (c_Phys(x) ≽ 15 and c_Phys(y) ≼ 15) and (c_Lit(x) ≽ 15 and c_Lit(y) ≼ 15) THEN xSy, (c, c).
Similarly, we can obtain the D≤-decision rules as follows:
IF (c_Maths(x) ≼ 18 and c_Maths(y) ≽ 14) and (c_Lit(x) ≼ 10 and c_Lit(y) ≽ 15) THEN xS^c y, (a, c);
IF (c_Phys(x) ≼ 16 and c_Phys(y) ≽ 15) and (c_Lit(x) ≼ 10 and c_Lit(y) ≽ 15) THEN xS^c y, (a, c);
IF c_Maths(x) ≼ 10 and c_Maths(y) ≽ 18 THEN xS^c y, (b, a);
IF c_Phys(x) ≼ 12 and c_Phys(y) ≽ 16 THEN xS^c y, (b, a);
IF c_Maths(x) ≼ 10 and c_Maths(y) ≽ 14 THEN xS^c y, (b, c);
IF c_Phys(x) ≼ 12 and c_Phys(y) ≽ 15 THEN xS^c y, (b, c).

4 Conclusions
Learning decision rules from preference-ordered data differs from usual machine learning, since the former involves preference orders in the domains of attributes and in the set of decision classes. This requires that a knowledge discovery method applied to preference-ordered data respect the dominance principle, which is addressed by the Dominance-based Rough Set Approach. This approach enables us to apply a rough set approach to multicriteria choice and ranking. In this paper, we propose the concepts of decision matrices and decision functions to generate the minimal decision rules from a pairwise comparison table. The decision rules can then be used to obtain a recommendation in multicriteria choice and ranking problems. We will further present extensions of the approach that make it a useful tool for other practical applications.

Acknowledgement. This work is supported by the National Natural Science Foundation of China (No. 70601013).

References
1. Keeney, R.L., Raiffa, H.: Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley, New York (1976)
2. Roy, B.: The Outranking Approach and the Foundation of ELECTRE Methods. Theory
and Decision 31(1), 49–73 (1991)
3. Fodor, J., Roubens, M.: Fuzzy Preference Modelling and Multicriteria Decision Support.
Kluwer, Dordrecht (1994)
4. Zopounidis, C., Doumpos, M.: Multicriteria Classification and Sorting Methods: A litera-
ture Review. European Journal of Operational Research 138(2), 229–246 (2002)
5. Slovic, P.: Choice between Equally-valued Alternatives. Journal of Experimental Psychol-
ogy: Human Perception Performance 1, 280–287 (1975)
6. Greco, S., Matarazzo, B., Slowinski, R.: Rough Set Theory for Multicriteria Decision
Analysis. European Journal of Operational Research 129(1), 1–47 (2001)
7. Fortemps, P., Greco, S., Slowinski, R.: Multicriteria Decision Support Using Rules That
Represent Rough-Graded Preference Relations. European Journal of Operational Re-
search 188(1), 206–223 (2008)
8. Roy, B.: Méthodologie multicritère d’aide à la décision. Economica, Paris (1985)
9. Pawlak, Z.: Rough Sets. International Journal of Computer and Information
Sciences 11(5), 341–356 (1982)
10. Pawlak, Z., Slowinski, R.: Rough Set Approach to Multi-attribute Decision Analysis.
European Journal of Operational Research 72(3), 443–459 (1994)
11. Greco, S., Matarazzo, B., Slowinski, R.: Rough Sets Methodology for Sorting Problems in
Presence of Multiple Attributes and Criteria. European Journal of Operational Re-
search 138(2), 247–259 (2002)
12. Greco, S., Matarazzo, B., Slowinski, R.: Extension of the Rough Set Approach to Multicri-
teria Decision Support. INFOR 38(3), 161–193 (2000)
13. Greco, S., Matarazzo, B., Slowinski, R.: Rough Approximation of a Preference Relation by
Dominance Relations. European Journal of Operational Research 117(1), 63–83 (1999)
14. Skowron, A., Rauszer, C.: The Discernibility Matrices and Functions in Information Systems.
In: Intelligent Decision Support: Handbook of Applications and Advances of the Rough
Set Theory, pp. 331–362. Kluwer Academic Publishers, Dordrecht (1991)
A Novel Hybrid Particle Swarm Optimization
for Multi-Objective Problems

Siwei Jiang and Zhihua Cai

School of Computer Science, China University of Geosciences, Wuhan 430074, China


[email protected],[email protected]

Abstract. To solve multi-objective problems, a novel hybrid particle swarm optimization algorithm called HPSODE is proposed. The new algorithm includes three major improvements: (I) population initialization is constructed by the statistical method of Uniform Design; (II) the regeneration method has two phases: in the first phase the particles are updated by an adaptive PSO model with constriction factor χ, and in the second phase the archive is updated by a Differential Evolution operator; (III) a new accept rule called Distance/Volume fitness is designed to update the archive. Experiments on the ZDTx and DTLZx problems with jMetal 2.1 show that the new hybrid algorithm significantly outperforms OMOPSO and SMPSO in terms of additive Epsilon, HyperVolume, Generational Distance and Inverted Generational Distance.

Keywords: Multi-Objective Optimization, Uniform Design, Particle Swarm Optimization, Differential Evolution, Minimum Reduce Hypervolume, Spread.

1 Introduction
Multi-objective Evolutionary Algorithms (MOEAs) are powerful tools to solve multi-objective optimization problems (MOPs), and they have gained popularity in recent years [1]. Recently, several elitism algorithms have been proposed, such as NSGA-II [2], SPEA2 [3], GDE3 [4] and MOPSO [5, 6, 7].
NSGA-II adopts a fast non-dominated sorting approach to reduce the computational burden, and uses Ranking and Crowding Distance to choose the candidate solutions [2]. SPEA2 proposes a fitness assignment with a cluster technique and designs a truncation operator based on the nearest Neighbor Density Estimation metric [3]. GDE3 is a developed version of Differential Evolution, which is suited for global optimization with an arbitrary number of objectives and constraints [4].
MOPSO, proposed by Coello et al., adopts swarm intelligence to optimize MOPs and uses the Pareto-optimal set to guide the particles' flight [5]. Sierra adopts Crowding Distance to filter the leaders, and different mutation methods are applied to different divisions of the particles [6]; Mostaghim introduces a new Sigma-method to find local best information to guide the particles [7].

The Project was supported by the Research Foundation for Outstanding Young
Teachers, China University of Geosciences(Wuhan)( No:CUGQNL0911).


Population initialization is an important part of EAs, but it has long been ignored. Orthogonal Design and Uniform Design belong to a sophisticated branch of statistics [9, 10]. Zeng and Cai adopt orthogonal design to solve MOPs; results show that it is more powerful than Random Design [11, 12]. Leung utilizes the uniform design to initialize the population and designs a new crossover operator, which can find Pareto-optimal solutions scattered uniformly [13].
Zhang proposes a hybrid PSO with a differential evolution operator to solve single objective problems, which provides a bell-shaped mutation to guarantee the diversity of the evolutionary population [8].
Motivated by the hybrid of PSO with the differential evolution operator, we propose a hybrid particle swarm optimization called HPSODE: the first population is constructed by uniform design; the offspring regeneration method has two phases: in the first phase the particles are updated according to their own experience (pbest) and social information (gbest) with constriction factor χ, and in the second phase the archive is operated on by Differential Evolution; a new accept rule, the Distance/Volume fitness, is designed to update the archive.
The rest of the paper is organized as follows. In Section 2, we briefly introduce the uniform design. In Section 3, we describe the two phases of the regeneration method. In Section 4, we design a new archive update rule. In Section 5, experiments on bi-objective and tri-objective problems show that the new algorithm is more powerful than OMOPSO and SMPSO in terms of additive Epsilon, HyperVolume, Generational Distance and Inverted Generational Distance. In Section 6, we make conclusions and discuss future research on MOEAs.

2 Population Initialization: Uniform Design


Population initialization has long been ignored in MOEAs, but it is a very im-
portant component for MOEAs. Orthogonal Design and Uniform Design are
experimental design method, which belong to a sophisticated branch of statis-
tics. Both of them can get better distribute population in feasible searching space
than Random Design, then the statistical population provide more information
to generate next offspring.
In this section, we briefly describe an experimental design method called
uniform design. We define the uniform array as UR (C), where Q is the level,
it’s primer, R, C represent the row and column of uniform array, they must be
satisfid with: 
R=Q>n
(1)
C=n
where n is the number of variables. When select a proper parameters of Q, σ
form table 1, uniform array can be created by Equation 2

Ui,j = (i ∗ σ j−1 mod Q) + 1 (2)

For one decision variable Xj with the boundary [lj , uj ], then the quantize techni-
cal divide the domain into Q levels αj1 , αj2 , · · · , αjQ , where the design parameter

Table 1. Parameter σ for different numbers of factors and levels

Number of levels per factor   Number of factors      σ
5                             2-4                    2
7                             2-6                    3
11                            2-10                   7
13                            2                      5
                              3                      4
                              4-12                   6
17                            2-16                   10
19                            2-3                    8
                              4-18                   14
23                            2, 13-14, 20-22        7
                              8-12                   15
                              3-7, 15-19             17
29                            2                      12
                              3                      9
                              4-7                    16
                              8-12, 16-24            8
                              13-15                  14
                              25-28                  18
31                            2, 5-12, 20-30         12
                              3-4, 13-19             22

Q is prime and α_{jk} is given by

α_{jk} = l_j + (k − 1) · (u_j − l_j)/(Q − 1), 1 ≤ k ≤ Q   (3)

In other words, the domain [l_j, u_j] is quantized into Q − 1 equal fractions, and any two successive levels are equally spaced.
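A minimal Python sketch of this construction (ours, not from the paper; parameter names are illustrative) is:

    def uniform_array(Q, sigma, n):
        # U[i][j] = (i * sigma^(j-1) mod Q) + 1, formula (2); R = Q rows, C = n columns
        return [[(i * pow(sigma, j, Q)) % Q + 1 for j in range(n)]
                for i in range(1, Q + 1)]

    def initial_population(Q, sigma, bounds):
        # map level k in {1, ..., Q} to alpha_jk = l_j + (k-1)(u_j - l_j)/(Q-1), formula (3)
        U = uniform_array(Q, sigma, len(bounds))
        return [[lo + (k - 1) * (hi - lo) / (Q - 1) for k, (lo, hi) in zip(row, bounds)]
                for row in U]

For example, initial_population(13, 6, [(0.0, 1.0)] * 12) uses the Table 1 entry Q = 13, σ = 6 (valid for 4-12 factors) to place 13 well-spread individuals in a 12-dimensional unit box.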

3 Offspring Regeneration Method


In this paper, we propose a hybrid Particle Swarm Optimization algorithm called HPSODE, which has three populations: the first population holds the original particles; the second population holds the local best particles, which store one evolutionary step of the original particles; the third population holds the global best particles and is utilized as the archive. The regeneration method has two phases: the original particles are updated by adaptive PSO, and the archive by the differential evolution operator.
In the first phase, each particle's new position is updated by its own experience (pbest) and social information (gbest) with constriction factor χ:

V_{id}(t+1) = χ[ωV_{id}(t) + C₁φ₁(pb_{id}(t) − X_{id}(t)) + C₂φ₂(gb_{id}(t) − X_{id}(t))]
X_{id}(t+1) = X_{id}(t) + V_{id}(t+1)   (4)

where χ = 2/|2 − φ − √(φ² − 4φ)| with φ = C₁ + C₂; when φ ≤ 4, χ = 1.0. And ω = 0.1, C₁, C₂ ∈ rand[1.5, 2.5], φ₁, φ₂ ∈ rand[0.0, 1.0].
The global best particle is chosen by binary tournament from the archive members with larger crowding distance, which forces the original particles to explore the sparse

space. If the new particle position is not dominated by the original particle, it replaces the original particle and the local best particle, X_i(t) = X_i(t+1), pb_i(t) = X_i(t+1), and the new particle is added to the archive; otherwise the new particle is discarded and the local best is unchanged.
In the second phase, the third population is updated by the differential evolution operator:

X_{id}(t+1) = X_{r1 d}(t) + F(X_{r2 d}(t) − X_{r3 d}(t)), if rnd_d(0, 1) < CR or d == d_rnd;
X_{id}(t+1) = X_{r1 d}(t), otherwise.   (5)

where d is the dimension of the solution, and the three random solutions are chosen from the archive with r₁ ≠ r₂ ≠ r₃ ≠ i; F = 0.5, CR = 0.1. If the new solution is not dominated by the archive's solutions, it is added to the archive.
non-dominated by the archive’s solutions, then add it to archive.
When the number of archive members is small, the guide information is scarce; this leads the original particles to cluster around a few best positions, and the final Pareto-optimal set then has worse diversity. The differential evolution operation is simple to implement and has a powerful capacity to scatter solutions over the feasible space. The DE operation on the archive is useful for enhancing the diversity of the archive, and it provides good guide information to the original particles.
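The two phases can be summarized by the Python sketch below; it is an illustrative rendering of formulas (4) and (5) under the parameter settings stated above, not the authors' implementation, and positions are plain Python lists.

    import math, random

    def pso_phase(x, v, pbest, gbest, w=0.1):
        # first phase: particle update with constriction factor chi, formula (4)
        c1, c2 = random.uniform(1.5, 2.5), random.uniform(1.5, 2.5)
        phi = c1 + c2
        chi = 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi)) if phi > 4 else 1.0
        new_x, new_v = [], []
        for d in range(len(x)):
            vd = chi * (w * v[d] + c1 * random.random() * (pbest[d] - x[d])
                                 + c2 * random.random() * (gbest[d] - x[d]))
            new_v.append(vd)
            new_x.append(x[d] + vd)
        return new_x, new_v

    def de_phase(i, archive, F=0.5, CR=0.1):
        # second phase: DE operator on the archive, formula (5)
        r1, r2, r3 = random.sample([j for j in range(len(archive)) if j != i], 3)
        d_rnd = random.randrange(len(archive[i]))
        return [archive[r1][d] + F * (archive[r2][d] - archive[r3][d])
                if random.random() < CR or d == d_rnd else archive[r1][d]
                for d in range(len(archive[i]))]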

4 Archive Update Rule

When a MOEA obtains a set of equally good solutions that fills the preset archive size (usually archiveSize = 100), an accept rule must be designed to decide which solution should be cut off from the archive. This is a critical issue in MOEAs that directly influences the quality of the final optimal set in terms of convergence and spread. Some popular accept rules have been presented: NSGA-II adopts the Ranking and Crowding Distance metric, and SPEA2 uses the nearest Neighbor Density Estimation metric.
In this paper, we design a new accept rule called Minimum Reduce Hypervolume (MRV). Hypervolume is a quality indicator proposed by Zitzler et al. and is adopted in jMetal 2.1 [14]. Hypervolume calculates the volume covered by the members of a non-dominated set of solutions (the region enclosed by the discontinuous line with respect to the worst point W in Figure 1 is ADBECW, in dashed line).
If two solutions D and E are non-dominated with each other, NSGA-II keeps solution D in the archive if CD(D) > CD(E); this maintains the spread along the Pareto front.
CD(D) = AD′ + D′B
CD(E) = BE′ + E′C   (6)

If one solution is deleted, the hypervolume decreases. Because a higher hypervolume means a better quality optimal set, we delete the solution whose removal reduces the hypervolume the least. MRV keeps solution E in the archive if hv(D) < hv(E) (the hypervolume is then ABECW rather

Fig. 1. The comparison of Crowding Distance (left: CD(D) = AD′ + D′B, CD(E) = BE′ + E′C) and MRV (right: hv(D) = DD₁·DD₂, hv(E) = EE₁·EE₂)

than ABDCW); this maintains the hypervolume along the Pareto front.

hv(D) = DD₁ · DD₂
hv(E) = EE₁ · EE₂   (7)

Crowding Distance maintains the spread and expands the solutions over the feasible search space, while Minimum Reduce Hypervolume maintains the hypervolume and forces the solutions toward the Pareto front. We combine the two properties and design a new fitness assignment for a solution s as follows (called the Distance/Volume fitness):

DV(s) = CD(s) + scale · hv(s),   scale = Σ_{i=1}^{n} CD(s_i) / Σ_{i=1}^{n} hv(s_i)   (8)

The factor scale is designed to equalize the influence of the crowding distance and MRV; if it becomes too large, we cap it at 1000 (scale = 1000 when scale > 1000).
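A minimal sketch of (8), assuming the per-solution crowding distances cd and MRV hypervolume contributions hv have already been computed:

    def distance_volume_fitness(cd, hv):
        # formula (8), with the factor capped at 1000 as described above
        scale = min(sum(cd) / sum(hv), 1000.0) if sum(hv) > 0 else 1000.0
        return [c + scale * h for c, h in zip(cd, hv)]

During archive truncation, the solution with the smallest DV value would be removed, so solutions that are both crowded and contribute little hypervolume are cut first.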

5 Experiment Results
Experiments are based on jMetal 2.1 [14], a Java-based framework aimed at facilitating the development of metaheuristics for solving MOPs; it provides large blocks of reusable code and fair comparisons of different MOEAs. The paper selects the algorithms OMOPSO [6] and SMPSO [14] as comparison objectives. Each algorithm is run independently 100 times, and the maximum number of evolutions is 25,000. The test problems are chosen from the ZDTx and DTLZx problem families. The performance metrics fall into five categories:
Unary additive epsilon indicator (I¹_ε+). This indicator for an approximation set A, I¹_ε+(A), gives the minimum factor ε by which each point of the real front R can be translated such that the resulting transformed front is dominated by A:

I¹_ε+(A) = inf_{ε∈ℝ} {∀z²∈R, ∃z¹∈A : z¹_i ≤ z²_i + ε, ∀i}   (9)

Hypervolume. This quality indicator calculates the volume (in the objective space) covered by the members of a non-dominated set of solutions Q with respect to a reference point. The hypervolume (HV) is calculated by:

HV = volume(∪_{i=1}^{|Q|} v_i)   (10)

Inverted Generational Distance. This metric measures how far the elements of the Pareto-optimal set are from the set of non-dominated vectors found. It is defined as:

IGD = √(Σ_{i=1}^{N} d_i²) / N   (11)

Generational Distance. This metric measures how far the elements of the set of non-dominated vectors found are from the Pareto-optimal set. It is defined as:

GD = √(Σ_{i=1}^{n} d_i²) / n   (12)
Spread. The Spread indicator is a diversity metric that measures the extent of spread achieved among the obtained solutions. It is defined as:

Δ = (d_f + d_l + Σ_{i=1}^{n−1} |d_i − d̄|) / (d_f + d_l + (n − 1)d̄)   (13)
A higher Hypervolume and lower I¹_ε+, GD, IGD and Spread mean a better algorithm. The results are compared by the median and interquartile range (IQR), which measure location (central tendency) and statistical dispersion, respectively; the best result is shown on a grey background.
Table 2 shows that, in terms of the median and IQR of the additive epsilon metric, HPSODE gets the best results on all 12 MOPs.
Table 3 shows that, in terms of the median and IQR of the HyperVolume metric, HPSODE gets the best results on all 12 MOPs.
Table 4 shows that, in terms of the median and IQR of the Generational Distance metric, HPSODE gets the best results on 11 MOPs and is worse only on DTLZ6, where OMOPSO gets the best result.
Table 5 shows that, in terms of the median and IQR of the Inverted Generational Distance metric, HPSODE gets the best results on 10 MOPs and is worse only on ZDT2 and DTLZ6; OMOPSO gets the best result only on ZDT2, and SMPSO only on DTLZ6.

Table 2. Unary additive epsilon indicator (I¹_ε+): median, with IQR in parentheses

         OMOPSO                SMPSO                 HPSODE
ZDT1     6.15e−03 (3.7e−04)    5.63e−03 (3.4e−04)    5.33e−03 (2.0e−04)
ZDT2     5.93e−03 (4.7e−04)    5.49e−03 (3.0e−04)    5.27e−03 (2.2e−04)
ZDT3     6.87e−03 (2.1e−03)    5.85e−03 (1.0e−03)    4.56e−03 (6.4e−04)
ZDT4     5.39e+00 (4.7e+00)    6.43e−03 (5.7e−04)    5.40e−03 (2.2e−04)
ZDT6     4.92e−03 (5.9e−04)    4.72e−03 (3.9e−04)    4.45e−03 (3.1e−04)
DTLZ1    1.09e+01 (3.2e+00)    5.65e−02 (7.8e−03)    4.68e−02 (6.5e−03)
DTLZ2    1.29e−01 (2.0e−02)    1.40e−01 (1.9e−02)    1.22e−01 (2.1e−02)
DTLZ3    1.03e+02 (3.6e+01)    1.61e−01 (1.3e−01)    1.19e−01 (1.9e−02)
DTLZ4    1.12e−01 (2.3e−02)    1.20e−01 (3.8e−02)    1.03e−01 (2.2e−02)
DTLZ5    5.14e−03 (6.6e−04)    4.87e−03 (5.5e−04)    4.36e−03 (2.8e−04)
DTLZ6    4.35e−03 (5.2e−04)    4.35e−03 (3.9e−04)    4.26e−03 (3.9e−04)
DTLZ7    1.95e−01 (7.6e−02)    1.71e−01 (6.9e−02)    1.25e−01 (3.6e−02)

Table 3. Hypervolume: median, with IQR in parentheses

         OMOPSO                SMPSO                 HPSODE
ZDT1     6.611e−01 (4.4e−04)   6.618e−01 (1.2e−04)   6.621e−01 (1.7e−05)
ZDT2     3.280e−01 (4.0e−04)   3.285e−01 (1.1e−04)   3.288e−01 (1.7e−05)
ZDT3     5.140e−01 (1.1e−03)   5.154e−01 (4.2e−04)   5.160e−01 (1.2e−05)
ZDT4     0.000e+00 (0.0e+00)   6.613e−01 (2.3e−04)   6.621e−01 (2.3e−05)
ZDT6     4.013e−01 (1.1e−04)   4.012e−01 (1.1e−04)   4.015e−01 (1.5e−05)
DTLZ1    0.000e+00 (0.0e+00)   7.372e−01 (9.0e−03)   7.688e−01 (7.5e−03)
DTLZ2    3.537e−01 (7.1e−03)   3.463e−01 (1.0e−02)   3.821e−01 (5.1e−03)
DTLZ3    0.000e+00 (0.0e+00)   3.406e−01 (4.3e−02)   3.864e−01 (5.8e−03)
DTLZ4    3.591e−01 (9.7e−03)   3.577e−01 (1.6e−02)   3.839e−01 (6.3e−03)
DTLZ5    9.344e−02 (1.6e−04)   9.366e−02 (1.6e−04)   9.407e−02 (4.4e−05)
DTLZ6    9.493e−02 (5.5e−05)   9.488e−02 (6.4e−05)   9.496e−02 (4.1e−05)
DTLZ7    2.610e−01 (9.9e−03)   2.746e−01 (7.4e−03)   2.916e−01 (3.7e−03)

Table 6 shows that, in terms of the median and IQR of the Spread metric, HPSODE gets the best results on 6 of the 12 MOPs, OMOPSO on 2 MOPs and SMPSO on 4 MOPs.
The comparison of the three algorithms in the five categories (epsilon indicator, hypervolume, Generational Distance, Inverted Generational Distance and Spread) shows that the new algorithm is more efficient for solving MOPs. We summarize the highlights as follows:

1. HPSODE adopts the statistical method of Uniform Design to construct the first population, which obtains well distributed solutions in the feasible space.
2. HPSODE combines the PSO and Differential Evolution operations to generate the next population. The DE operation enhances the diversity of the global guide population.

Table 4. Generational Distance: median, with IQR in parentheses

         OMOPSO                SMPSO                 HPSODE
ZDT1     1.43e−04 (3.8e−05)    1.24e−04 (4.9e−05)    7.41e−05 (3.5e−05)
ZDT2     7.68e−05 (2.6e−05)    5.27e−05 (5.4e−06)    4.66e−05 (2.8e−06)
ZDT3     2.25e−04 (3.3e−05)    1.88e−04 (2.3e−05)    1.76e−04 (1.4e−05)
ZDT4     6.57e−01 (5.8e−01)    1.35e−04 (4.6e−05)    7.67e−05 (4.2e−05)
ZDT6     2.19e−02 (4.4e−02)    2.18e−03 (4.3e−02)    5.37e−04 (2.0e−05)
DTLZ1    1.00e+01 (1.3e+00)    5.06e−03 (1.4e−03)    7.00e−04 (1.9e−04)
DTLZ2    3.32e−03 (4.1e−04)    3.94e−03 (9.2e−04)    1.25e−03 (4.6e−04)
DTLZ3    3.51e+01 (6.0e+00)    5.29e−03 (5.7e−02)    1.17e−03 (1.8e−04)
DTLZ4    5.83e−03 (5.0e−04)    5.98e−03 (6.4e−04)    4.95e−03 (3.9e−04)
DTLZ5    3.00e−04 (4.3e−05)    2.85e−04 (5.6e−05)    2.50e−04 (3.3e−05)
DTLZ6    5.62e−04 (3.0e−05)    5.69e−04 (2.6e−05)    5.67e−04 (2.6e−05)
DTLZ7    5.26e−03 (1.2e−03)    4.84e−03 (1.3e−03)    3.57e−03 (1.1e−03)

Table 5. Inverted Generational Distance: median, with IQR in parentheses

         OMOPSO                SMPSO                 HPSODE
ZDT1     1.37e−04 (2.3e−06)    1.35e−04 (1.0e−06)    1.34e−04 (1.2e−06)
ZDT2     1.43e−04 (3.2e−06)    1.40e−04 (1.9e−06)    1.44e−04 (2.9e−06)
ZDT3     2.14e−04 (1.4e−05)    2.00e−04 (7.2e−06)    1.90e−04 (2.7e−06)
ZDT4     1.61e−01 (1.5e−01)    1.38e−04 (1.4e−06)    1.34e−04 (1.5e−06)
ZDT6     1.20e−04 (7.9e−06)    1.18e−04 (7.9e−06)    1.10e−04 (7.0e−06)
DTLZ1    2.71e−01 (1.0e−01)    6.38e−04 (4.5e−05)    5.27e−04 (3.5e−05)
DTLZ2    7.58e−04 (4.6e−05)    8.03e−04 (5.1e−05)    7.04e−04 (4.1e−05)
DTLZ3    2.03e+00 (6.7e−01)    1.43e−03 (7.8e−04)    1.11e−03 (6.8e−05)
DTLZ4    1.23e−03 (1.6e−04)    1.17e−03 (2.1e−04)    1.17e−03 (1.5e−04)
DTLZ5    1.49e−05 (5.0e−07)    1.44e−05 (6.8e−07)    1.41e−05 (3.8e−07)
DTLZ6    3.38e−05 (1.1e−06)    3.43e−05 (1.3e−06)    3.40e−05 (1.4e−06)
DTLZ7    2.66e−03 (3.1e−04)    2.72e−03 (4.8e−04)    2.43e−03 (3.0e−04)

Table 6. Spread: median, with IQR in parentheses

         OMOPSO                SMPSO                 HPSODE
ZDT1     8.29e−02 (1.5e−02)    7.90e−02 (1.3e−02)    9.49e−02 (1.6e−02)
ZDT2     7.99e−02 (1.5e−02)    7.21e−02 (1.8e−02)    9.48e−02 (1.4e−02)
ZDT3     7.13e−01 (1.2e−02)    7.11e−01 (9.8e−03)    7.05e−01 (3.8e−03)
ZDT4     8.89e−01 (8.8e−02)    9.84e−02 (1.6e−02)    1.01e−01 (2.1e−02)
ZDT6     1.07e+00 (9.9e−01)    3.17e−01 (1.2e+00)    6.76e−02 (1.2e−02)
DTLZ1    6.90e−01 (1.0e−01)    6.78e−01 (4.2e−02)    7.28e−01 (5.5e−02)
DTLZ2    6.22e−01 (4.9e−02)    6.38e−01 (5.1e−02)    6.32e−01 (5.7e−02)
DTLZ3    7.28e−01 (1.4e−01)    6.65e−01 (2.0e−01)    6.43e−01 (5.5e−02)
DTLZ4    6.36e−01 (6.9e−02)    6.59e−01 (1.3e−01)    6.41e−01 (5.3e−02)
DTLZ5    2.02e−01 (6.8e−02)    1.70e−01 (8.5e−02)    1.24e−01 (2.3e−02)
DTLZ6    1.24e−01 (4.0e−02)    1.39e−01 (3.9e−02)    1.19e−01 (2.4e−02)
DTLZ7    7.07e−01 (5.6e−02)    7.08e−01 (7.0e−02)    6.99e−01 (5.8e−02)

3. HPSODE designs a new accept rule, the Distance/Volume fitness, which expands the solutions over the feasible search space and forces the solutions toward the Pareto front.
4. HPSODE significantly outperforms OMOPSO and SMPSO in terms of additive Epsilon, HyperVolume, Generational Distance and Inverted Generational Distance.
5. HPSODE gets competitive or better results than OMOPSO and SMPSO in terms of the spread metric.

6 Conclusion and Future Research

Population initialization, the regeneration method and the archive update rule are three critical issues in MOEAs. In this paper, we propose a hybrid PSO algorithm called HPSODE, which includes uniform design initialization, a two-phase regeneration with a DE operator, and the new accept rule of Distance/Volume fitness. Experimental results show that HPSODE obtains higher Hypervolume, lower additive epsilon, GD and IGD, and competitive spread. Our future research will focus on enhancing the Quantum PSO algorithm to solve MOPs.

References
1. Coello, C.A.C.: Evolutionary multi-objective optimization: A historical view of the
Field. IEEE Computational Intelligence Magazine 1(1), 28–36 (2006)
2. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjec-
tive genetic algorithm: NSGA - II. IEEE Transactions on Evolutionary Computa-
tion 6(2), 182–197 (2002)
3. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength Pareto evo-
lutionary algorithm, Technical Report 103, Computer Engineering and Networks
Laboratory (2001)
4. Kukkonen, S., Lampinen, J.: GDE3: The third evolution step of generalized dif-
ferential evolution. In: Proceedings of the 2005 IEEE Congress on Evolutionary
Computation (1), pp. 443–450 (2005)
5. Coello Coello, C.A., Toscano Pulido, G., Salazar Lechuga, M.: Handling Multiple
Objectives With Particle Swarm Optimization. IEEE Transactions on Evolutionary
Computation 8, 256–279 (2004)
6. Sierra, M.R., Coello, C.A.C.: Improving PSO-based multi-objective optimization using crowding, mutation and ε-dominance. In: Coello Coello, C.A., Hernández Aguirre, A., Zitzler, E. (eds.) EMO 2005. LNCS, vol. 3410, pp. 505–519. Springer, Heidelberg (2005)
7. Mostaghim, S., Teich, J.: Strategies for Finding Good Local Guides in Multi-
objective Particle Swarm Optimization (MOPSO). In: 2003 IEEE Swarm Intel-
ligence Symposium Proceedings, Indianapolis, Indiana, USA, pp. 26–33. IEEE
Service Center (2003)
8. Zhang, W.J., Xie, X.F.: DEPSO: Hybrid particle swarm with differential evolution
operator. In: IEEE International Conference on Systems Man and Cybernetics, pp.
3816–3821 (2003)
9. Fang, K.T., Ma, C.X.: Orthogonal and uniform design. Science Press (2001)
(in Chinese)

10. Leung, Y.W., Wang, Y.: An orthogonal genetic algorithm with quantization for
global numerical optimization. IEEE Transactions on Evolutionary Computa-
tion 5(1), 41–53 (2001)
11. Zeng, S.Y., Kang, L.S., Ding, L.X.: An orthogonal multiobjective evolutionary
algorithm for multi-objective optimization problems with constraints. Evolutionary
Computation 12, 77–98 (2004)
12. Cai, Z.H., Gong, W.Y., Huang, Y.Q.: A novel differential evolution algorithm based on ε-domination and orthogonal design method for multiobjective optimization. In: Obayashi, S., Deb, K., Poloni, C., Hiroyasu, T., Murata, T. (eds.) EMO 2007. LNCS, vol. 4403, pp. 286–301. Springer, Heidelberg (2007)
13. Leung, Y.-W., Wang, Y.: Multiobjective programming using uniform design and
genetic algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part
C 30(3), 293 (2000)
14. Durillo, J.J., Nebro, A.J., Luna, F., Dorronsoro, B., Alba, E.: jMetal: A
Java Framework for Developing Multi-Objective Optimization Metaheuristics,
Departamento de Lenguajes y Ciencias de la Computación, University of
Málaga, E.T.S.I. Informática, Campus de Teatinos, ITI-2006-10 (December 2006),
http://jmetal.sourceforge.net
Research on Constrained Layout Optimization Problem
Using Multi-adaptive Strategies Particle Swarm
Optimizer∗

Kaiyou Lei

Faculty of Computer & Information Science, Southwest University, Chongqing, 400715, China
[email protected]

Abstract. Complex layout optimization problems with behavioral constraints are NP-hard in mathematics. Due to this complexity, the general particle swarm optimization algorithm converges slowly and easily falls into local optima. Taking the layout problem of satellite cabins as background, a novel adaptive particle swarm optimizer based on multiple modified strategies is proposed in this paper. It can not only escape from the attraction of local optima in the later phase to heighten particle diversity and avoid the premature problem, but also maintain the fast search of the early convergence phase to reach the global optimum; thus, the algorithm has a better search performance for the constrained layout optimization problem. The proposed algorithm is tested and compared with other published methods on constrained layout examples, demonstrating that the revised algorithm is feasible and efficient.

Keywords: particle swarm optimization, premature problem, constrained layout.

1 Introduction
The classical layout problem is generally divided into two types: the packing problem and the cutting problem. The main goal is to increase the space utility ratio as much as possible under the condition of non-overlap between piece (object) and piece (object) and between piece (object) and container. These are called layout problems without behavioral constraints (unconstrained layout). In recent years, another more complex layout problem has been attracting a lot of attention, such as the layout design of engineering machinery, spacecraft, ships, etc. Solving these kinds of problems requires considering some additional behavioral constraints, for instance inertia, equilibrium, stability and vibration. They are called layout problems with behavioral constraints (constrained layout). Constrained layout belongs to the class of NP-hard problems, and its optimization is highly difficult [1].
As a newly developed population-based computational intelligence algorithm, Particle Swarm Optimization (PSO) originated as a simulation of a simplified social model of birds in a flock [2]. The PSO algorithm is easy to implement and has proven very competitive in a large variety of global optimization problems and application areas compared to conventional methods and other meta-heuristics [3].


∗ The work is supported by the Key Project of the Chinese Ministry of Education (No. 104262).


Since its introduction, numerous variations of the basic PSO algorithm have been developed in the literature to avoid the premature problem and speed up the convergence process, which are the two most important topics in the research of stochastic search methods. To make the search more effective, many approaches have been suggested by researchers, such as various mutation methods and the selection of a single inertia weight value; however, these methods have a common weakness: they usually cannot attend to both global search and local search well, and so become trapped in local optima, especially on complex constrained layout problems [4].
In this paper, a particle swarm optimizer with better search performance is proposed. It employs multiple adaptive strategies to plan large-scale global search and refined local search as a whole, according to the specialties of constrained layout problems, so as to quicken the convergence speed, avoid the premature problem, economize computational expense and obtain the global optimum. We tested the proposed algorithm and compared it with other published methods on three constrained layout examples. The experimental results demonstrate that this revised algorithm rapidly converges to high quality solutions.

2 Mathematical Model and PSO


Mathematical Model. Taking the layout problem of satellite cabins as background, the principle is illustrated [1]. There is a two-dimensional rotating circular table (called the graph plane) with radius R, and n dishes (called graph units) with uniform thickness, each of which has a radius ri and mass mi (i=1, 2, …, n), are installed on the graph plane as shown in Fig. 1. Suppose that the frictional resistance is infinite. Find the position of each graph unit such that the graph units concentrate highly on the center of the graph plane and satisfy the following constraints:
⑴ There is no interference between any two graph units.
⑵ No part of any graph unit protrudes out of the graph plane.
⑶ The deviation from dynamic balancing for the system should not exceed a permissible value [δJ].

Fig. 1. Representation of graph units layout

Suppose that the center of the graph plane is the origin o of the Cartesian system, the graph plane and graph units are just in the plane xoy, the thickness of the graph units is ignored, and xi, yi are the coordinates of the center oi of graph unit i, which is also its mass center. The mathematical model for the optimization of the problem is given by:

Find X_i = [x_i, y_i]^T, i∈I, I = {1, 2, 3, …, n}

min F(X_i) = max_i { √(x_i² + y_i²) + r_i }   (1)

s.t.

f₁(X_i) = r_i + r_j − √((x_i − x_j)² + (y_i − y_j)²) ≤ 0, i ≠ j; i, j∈I   (2)

f₂(X_i) = √(x_i² + y_i²) + r_i − R ≤ 0, i∈I   (3)

f₃(X_i) = (Σ_{i=1}^{n} m_i x_i)² + (Σ_{i=1}^{n} m_i y_i)² − [δ_J] ≤ 0, i∈I   (4)

PSO Algorithm. In the original PSO formulae, particle i is denoted as X_i = (x_{i1}, x_{i2}, …, x_{iD}), which represents a potential solution to a problem in D-dimensional space. Each particle maintains a memory of its previous best position Pbest and a velocity along each dimension, represented as V_i = (v_{i1}, v_{i2}, …, v_{iD}). At each iteration, the position of the particle with the best fitness in the search space, designated as g, and the P vector of the current particle are combined to adjust the velocity along each dimension, and that velocity is then used to compute a new position for the particle.
In the standard PSO, the velocity and position of particle i at the (t+1)th iteration are updated as follows:

v_{id}^{t+1} = w·v_{id}^t + c₁·r₁·(p_{id}^t − x_{id}^t) + c₂·r₂·(p_{gd}^t − x_{id}^t)   (5)

x_{id}^{t+1} = x_{id}^t + v_{id}^{t+1}   (6)

Constants c₁ and c₂ are learning rates; r₁ and r₂ are random numbers uniformly distributed in the interval [0, 1]; w is an inertia factor.
To speed up the convergence process and avoid the premature problem, Shi proposed the PSO with a linearly decreasing weight (LDWPSO) [3, 4]. Suppose wmax is the maximum of the inertia weight, wmin is the minimum, run is the current iteration number and runMax is the total number of iterations; the inertia weight is formulated as:

w = wmax − (wmax − wmin) · run / runMax   (7)

3 Multi-Adaptive Strategies Particle Swarm Optimizer (MASPSO)

Due to the complexity caused by a great number of local and global optima, PSO is revised into MASPSO by four adaptive strategies to suit constrained layout optimization.

3.1 Adaptive Harmonization Strategy of w

The w has the capability to automatically harmonize the global and local search abilities, avoid prematurity and gain rapid convergence to the global optimum. First of all, a larger w can enhance the global search ability of PSO, so as to explore the large-scale search space and rapidly locate the approximate position of the global optimum, while a smaller w can enhance the local search ability, so that particles slow down and deploy a refined local search. Secondly, the more difficult the optimization problem is, the more the global search ability needs to be fortified; once the approximate position of the global optimum is located, the refined local search is further strengthened to reach the global optimum [5, 6, 7].
According to the conclusions above, we construct (8) as the new inertia weight decline curve for PSO, shown in Fig. 2.

w = 1 / (1 + exp(0.015 · (run − runMax/3)))   (8)

Fig. 2. Inertia weight decline curve of formula (8)
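Both schedules are easy to compare numerically; the Python sketch below is illustrative, and the wmax/wmin values for (7) are common defaults rather than values taken from this paper.

    import math

    def w_linear(run, run_max, w_max=0.9, w_min=0.4):
        return w_max - (w_max - w_min) * run / run_max              # formula (7)

    def w_sigmoid(run, run_max):
        return 1.0 / (1.0 + math.exp(0.015 * (run - run_max / 3)))  # formula (8)

Unlike the straight line of (7), the curve of (8) holds w near 1 for roughly the first third of the run (strong global search) and then drops quickly to a small value for refined local search.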

3.2 Adaptive Difference Mutation Strategy of the Global Optimum p_{gd}^t

Considering that the particles may find a better global optimum in the current best region, the algorithm joins a mutation operation with a perturbation operator. The iteration number runmax1, the transformation point, divides runMax into two segments that mutate according to their own characteristics, further enhancing the global and local search abilities to find a satisfactory solution. pη is the mutation probability, within (0.1, 0.3). The computed equation is defined as:

if run ≤ runmax1, p_{gd}^t = p_{gd}^t · (1 + 0.5η); else p_{gd}^t = p_{gd}^t · (1 + 0.15η).   (9)

where η is a Gauss(0, 1) distributed random variable.



3.3 Adaptive Radius Rs Strategy

In equation (3), the graph plane radius is constant, which influences the fitness computation and the search result. In the search process, the smaller the periphery envelope circle radius is, the more tightly the graph units converge. Evidently, taking the current periphery envelope circle radius as the layout radius lets the search progress in a smaller radius area, which quickens the convergence speed and economizes computational expense. Suppose the periphery envelope circle radius is R_S; the computed equation is defined as:
\text{if } R_S^{t+1} > R_S^{t}:\ R_S^{t+1} = R_S^{t}; \quad \text{else } R_S^{t+1} = R_S^{t+1}.    (10)

3.4 Adaptive Difference Mutation Strategy of Xi

Analyzing equation (5): if the best position of all particles shows no change for a long time, the velocity update is dominated by w*v_{id}^t. Since w < 1, the velocity becomes smaller and smaller, so the particle swarm tends to cluster together; eventually the algorithm is trapped in local optima and the premature-convergence problem emerges, as in genetic algorithms. Considering this, the adaptive difference mutation strategy of X_i is introduced. When the best position of all particles shows no or little change for a long time, part of the particles are kept and continue the refined local search unceasingly, while the rest are initialized randomly, so as to enhance the global search ability of PSO and, synchronously, break away from the attraction of local optima. Suppose run_t is an iteration-count threshold; run_k and run_{k+t} are the kth and (k+t)th iteration numbers, with R_k and R_{k+t} the corresponding maximal radii; X_ρ are randomly initialized particles, with ρ the dimension mutation probability; ε is an error threshold. The computed equation is defined as:

\text{if } run_{k+t} - run_{k} \ge run_t \ \text{and} \ R_k - R_{k+t} \le \varepsilon:\ X_i = X_\rho; \quad \text{else } X_i = X_i.    (11)

where run_t, ε and ρ are set to 20, 0.005 and 0.2, respectively, in this paper.
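As an illustration (a hedged Python sketch of ours, not the authors' code; the fraction of particles kept and the search bounds lo/hi are assumptions):

import random

def restart_if_stagnant(X, radius_hist, run, run_t=20, eps=0.005, rho=0.2,
                        keep_frac=0.5, lo=-50.0, hi=50.0):
    # Formula (11): if the enveloping radius has shrunk by at most eps over
    # the last run_t iterations, keep part of the swarm for refined local
    # search and randomly re-seed each dimension of the rest with prob. rho.
    if run >= run_t and radius_hist[run - run_t] - radius_hist[run] <= eps:
        cut = int(len(X) * keep_frac)
        for i in range(cut, len(X)):
            X[i] = [random.uniform(lo, hi) if random.random() < rho else xj
                    for xj in X[i]]

Here X is the list of particle positions (ideally sorted best-first) and radius_hist[k] is the enveloping radius recorded at iteration k.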

3.5 Algorithm Designing and Flow

PSO, modified into MASPSO by the above methods, has excellent search performance for optimizing constrained layout problems. The design of the algorithm according to MASPSO is as follows.
All particles are coded based on the rectangular plane axis system. Taking the shortest periphery envelope circle radius as the optimization criterion under the above constraint conditions, the fitness function and the penalty function constructed in MASPSO are defined, respectively, as:
\phi_1(X_i) = F(X_i) + \sum_{i=1}^{3} \lambda_i f_i(X_i)\, u_i(f_i), \qquad u_i(f_i) = \begin{cases} 0 & f_i(X_i) \le 0 \\ 1 & f_i(X_i) > 0 \end{cases}, \quad i \in I.    (12)

where λ_i (λ_1 = 1, λ_2 = 1, λ_3 = 0.01) are the penalty factors.


The flow of the algorithm is as follows:
Step 1. Set the parameters of the algorithm.
Step 2. Randomly initialize the velocity and position of each particle.
Step 3. Evaluate the fitness of each particle using (12) and determine the initial values of the individual and global best positions p_{id}^t and p_{gd}^t.
Step 4. Update velocity and position using (5), (6) and (8).
Step 5. Evaluate the fitness using (12) and determine the current values of the individual and global best positions p_{id}^t and p_{gd}^t.
Step 6. Check R^t and R^{t+1} and determine the periphery envelope circle radius by (10).
Step 7. Mutate p_{gd}^t and X_i using (9) and (11), respectively.
Step 8. Loop to Step 4 and repeat until a given maximum iteration number is attained or the convergence criterion is satisfied.
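Putting the strategies together, a compact skeleton of this loop (hypothetical Python of ours, not the authors' MATLAB implementation; fitness stands for the φ_1 of (12), envelope_radius is supplied by the layout geometry, and w_sigmoid, mutate_gbest and restart_if_stagnant are the sketches given above):

import random

def init_swarm(n, dim, lo=-50.0, hi=50.0):
    # Step 2: random positions, zero velocities; bounds lo/hi are assumptions.
    X = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    V = [[0.0] * dim for _ in range(n)]
    return X, V

def maspso(fitness, envelope_radius, n_particles, dim, run_max=1000, c1=1.5, c2=1.5):
    X, V = init_swarm(n_particles, dim)
    pbest = [x[:] for x in X]
    gbest = min(pbest, key=fitness)                      # Step 3
    radius_hist = []
    for run in range(run_max):
        w = w_sigmoid(run, run_max)                      # formula (8)
        for i in range(n_particles):
            for d in range(dim):                         # formulas (5)-(6), Step 4
                V[i][d] = (w * V[i][d]
                           + c1 * random.random() * (pbest[i][d] - X[i][d])
                           + c2 * random.random() * (gbest[d] - X[i][d]))
                X[i][d] += V[i][d]
            if fitness(X[i]) < fitness(pbest[i]):        # Step 5
                pbest[i] = X[i][:]
        gbest = min(pbest, key=fitness)
        radius_hist.append(envelope_radius(gbest))       # Step 6, formula (10)
        gbest = mutate_gbest(gbest, run, run_max // 2)   # Step 7, formula (9)
        restart_if_stagnant(X, radius_hist, run)         # Step 7, formula (11)
    return gbest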

4 Computational Experiments

Taking the instances from the literature [8], [9] as examples, we tested our algorithm; comparisons of the statistical results are shown in Table 1 and Table 2. The parameters used in our algorithm are set to c1 = c2 = 1.5 and runMax = 1000. The running environment is MATLAB 7.1 on a Pentium IV 2 GHz CPU with 256 MB RAM under Windows XP.

Example 1. Seven graph units are contained in the layout problem. The radius of the graph plane is R = 50 mm, and the allowed value of the static non-equilibrium J is [g_j] = 3.4 g·mm. The results are given in Table 1 and the geometric layout is shown in Fig. 3 (the swarm size is 50).
Table 1. Comparison of experimental results of example 1

(a) Datum and layout results of example 1

Number  Graph units       Literature [8] result    Literature [9] result    Our result
        r(mm)   m(g)      x(mm)      y(mm)         x(mm)      y(mm)         x(mm)      y(mm)
1       10.0    100.00   -12.883     17.020        14.367     16.453        17.198    -13.618
2       11.0    121.00     8.8472    19.773       -18.521     -9.560       -19.857      6.096
3       12.0    144.00     0.662      0.000         2.113    -19.730        -1.365     19.886
4       11.5    132.00    -8.379    -19.430        19.874     -4.340        18.882      7.819
5        9.5     90.25    -1.743      0.503       -19.271     11.241       -16.967    -14.261
6        8.5     72.25    12.368    -18.989        -3.940     22.157        -0.606    -22.873
7       10.5    110.25   -21.639     -1.799        -0.946      2.824        -0.344     -3.010

Table 1. (continued)

(b) The performance comparison of layout results of example 1

Computational method        Least circle radius         Static non-equilibrium   Interference   Computational
                            enclosing all units (mm)    (g·mm)                                  time (s)
Literature [8] algorithm    32.662                      0.029000                 0              1002
Literature [9] algorithm    31.985                      0.018200                 0              1002
Our algorithm               31.934                      0.000001                 0               548

(c) The example 1 layout results comparison based on 40 times algorithm running

Least circle radius         Times (out of 40 runs)
enclosing all units (mm)    Literature [9] algorithm   Our algorithm
≤ 32.3                      10                         19
(32.3, 32.5]                16                         11
(32.5, 32.7]                 5                          5
(32.7, 32.9]                 2                          3
(32.9, 33.1]                 2                          1
(33.1, 33.3]                 2                          1
> 33.3                       3                          0

Fig. 3. The geometry layout of example 1: (a) literature [8] layout result; (b) literature [9] layout result; (c) our layout result


Example 2. Forty graph units are contained in the layout problem. The radius of the graph plane is R = 880 mm, and the allowed value of the static non-equilibrium J is [g_j] = 20 g·mm. The results are given in Table 2 and the geometric layout is shown in Fig. 4 (the swarm size is 100).

Table 2. Comparison of experimental results of example 2

(a) Datum and layout results of example 2

Number  Graph units        Literature [9] result      Our result
        r(mm)   m(g)       x(mm)       y(mm)          x(mm)       y(mm)
1 106 11 192.971 0 -528.304 -119.969
2 112 12 -69.924 0 103.353 -441.960
3 98 9 13.034 -478.285 -164.920 271.842
4 105 11 -291.748 21.066 -268.975 68.475
5 93 8 343.517 -351.055 -382.940 575.014
6 103 10 -251.1434 674.025 119.907 -45.977
7 82 6 95.268 -252.899 321.849 -433.652
8 93 8 -619.634 -421.032 684.118 71.745
9 117 13 725.062 0 -112.375 -364.996
10 81 6 127.487 175.174 -194.273 -553.490
11 89 7 358.251 -104.181 58.218 -246.094
12 92 8 694.612 -206.946 482.078 -312.251
13 109 11 -151.494 -350.475 299.075 347.118
14 104 10 -486.096 278.028 -381.489 -562.752
15 115 13 -406.944 -378.282 -378.466 266.641
16 110 12 -531.396 27.583 330.471 -106.831
17 114 12 -281.428 -570.129 -55.284 643.08
18 89 7 535.186 -82.365 499.100 362.345
19 82 6 349.187 -668.540 442.242 -545.018
20 120 14 494.958 -527.668 641.168 -172.081
21 108 11 -696.916 236.466 -205.525 478.499
22 86 7 -43.153 196.294 -534.084 -449.197
23 93 8 -143.066 -725.316 -478.615 74.719
24 100 10 -433.688 -159.158 511.793 171.389
25 102 10 -741.858 0.000 -7.548 403.484
26 106 11 292.820 431.997 -369.369 -350.518
27 111 12 -540.511 495.023 113.819 228.113
28 107 11 154.296 -671.681 -316.496 -144.125
29 109 11 -317.971 463.365 156.530 -656.467
30 91 8 41.295 -271.016 -625.532 -292.937
31 111 12 103.622 538.523 308.084 114.390
32 91 8 215.467 -213.844 235.152 -283.914
33 101 10 540.248 306.466 -545.860 404.331
34 91 8 58.125 341.687 -660.742 171.945
35 108 11 -235.120 227.217 406.575 539.906
36 114 12 510.413 520.918 182.065 555.234
37 118 13 -29.219 725.331 -94.105 -105.916
38 85 7 300.625 240.313 -26.879 -592.954
39 87 7 234.066 -494.031 505.239 -15.817
40 98 9 411.043 119.080 -56.975 107.473

Table 2. (continued)

(b) The performance comparison of layout results of example 2

Computational method        Least circle radius         Static non-equilibrium   Interference   Computational
                            enclosing all units (mm)    (g·mm)                                  time (s)
Literature [8] algorithm    874.830                     11.395000                0              1656
Literature [9] algorithm    843.940                      0.003895                0              2523
Our algorithm               783.871                      0.001267                0              1633

Fig. 4. The geometry layout of example 2: (a) literature [9] layout result; (b) our layout result

5 Conclusions
From Tables 1 and 2 we can conclude that PSO with the four improved adaptive strategies harmonizes the large-scale global search ability and the refined local search ability thoroughly, converging rapidly while, synchronously, avoiding premature convergence. The effectiveness of the algorithm is validated on the constrained layout instances of this NP-hard problem; the algorithm outperformed the best known ones in both quality of solutions and running time. In addition, the parameters run_t, ε and ρ are chosen by human experience, which is a certain blemish; how to choose suitable values for these parameters is one of the future works.

References
1. Teng, H., Shoulin, S., Wenhai, G., et al.: Layout optimization for the dishes installed on
rotating table. Science in China (Series A) 37(10), 1272–1280 (1994)
2. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proc. of IEEE Int’l Conf.
Neural Networks, pp. 1942–1948. IEEE Computer Press, Indianapolis (1995)

3. Eberhart, R.C., Kennedy, J.: A new optimizer using particle swarm theory. In: Sixth International Symposium on Micro Machine and Human Science, pp. 39–43. IEEE Service Center, Piscataway (1995)
4. Angeline, P.J.: Using selection to improve particle swarm optimization. In: Proc. IJCNN
1999, pp. 84–89. IEEE Computer Press, Indianapolis (1999)
5. Lei, K., Qiu, Y., He, Y.: A new adaptive well-chosen inertia weight strategy to automati-
cally harmonize global and local search ability in particle swarm optimization. In: 1st Inter-
national Symposium on Systems and Control in Aerospace and Astronautics, pp. 342–346.
IEEE Press, Harbin (2006)
6. Shi, Y., Eberhart, R.C.: A modified particle swarm optimizer. In: Proc. of the IEEE Con.
Evolutionary Computation, pp. 69–73. IEEE Computer Press, Piscataway (1998)
7. Jianchao, Z., Zhihua, C.: A guaranteed global convergence particle swarm optimizer. Jour-
nal of Computer Research and Development 4(8), 1334–1338 (2004) (in Chinese)
8. Fei, T., Hongfei, T.: A modified genetic algorithm and its application to layout optimization.
Journal of Software 10(10), 1096–1102 (1999) (in Chinese)
9. Ning, L., Fei, L., Debao, S.: A study on the particle swarm optimization with mutation
operator constrained layout optimization. Chinese Journal of Computers 27(7), 897–903
(2004) (in Chinese)
ARIMA Model Estimated by Particle Swarm
Optimization Algorithm for
Consumer Price Index Forecasting

Hongjie Wang1 and Weigang Zhao2,*


1 Railway Technical College, Lanzhou Jiaotong University, Lanzhou 730000, China
[email protected]
2 School of Mathematics & Statistics, Lanzhou University, Lanzhou 730000, China
[email protected]

Abstract. This paper presents an ARIMA model which uses the particle swarm optimization algorithm (PSO) for model estimation. Because the traditional estimation method is complex and may yield very poor results, PSO, which can be implemented with ease and has a powerful optimizing performance, is employed to optimize the coefficients of ARIMA. In recent years inflation and deflation have plagued the world; moreover, the consumer price index (CPI), a measure of the average price of consumer goods and services purchased by households, is usually observed as an important indicator of the level of inflation, so the forecasting of CPI has been focused on by both the scientific community and the relevant authorities. Taking the forecast of CPI as a case, we illustrate the improvement in accuracy and efficiency of the new method, and the result shows it is predominant in forecasting.

Keywords: ARIMA model, Particle swarm optimization algorithm, Moment estimation, Consumer price index.

1 Introduction
The ARIMA, one of the most popular models for time series forecasting analysis, originated from the autoregressive model (AR) proposed by Yule in 1927, the moving average model (MA) invented by Walker in 1931, and the combination of the AR and MA, the ARMA models [1]. The ARMA model can be used when the time series is stationary, but the ARIMA model does not have that limitation: it pre-processes the original time series to make the obtained series stationary so that the ARMA model can then be applied. For example, for an original series which has only serial dependency and no seasonal dependency, if the dth differenced series is stationary, we can apply the ARMA(p, q) model to the dth differenced series, and we say the original time series satisfies the modeling condition of ARIMA(p, d, q), which is one kind of the ARIMA models. So the

* Corresponding author.


important point to note in ARIMA(p, d, q) modeling is that we must have either a stationary time series or a time series that becomes stationary after one or more differencings [2]. In addition, as we know, there are four main stages in building an ARMA model: 1) model identification, 2) model estimation, 3) model checking and 4) forecasting [3] (since we focus on model estimation here, the third step is ignored in this paper). Because the second step is a complex process even when a relatively simple method (moment estimation, for example) is adopted, and the approximate results of the parameters are unpredictable, which may greatly impact the forecasting precision when q > 1 (q being the order of MA), PSO is proposed to achieve the second step.
Particle swarm optimization is a population based stochastic optimization
technique developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by
social behavior of bird flocking or fish schooling. The algorithm completes the
optimization through following the personal best solution of each particle and
the global best value of the whole swarm. And PSO can be implemented with
ease and has a simple principle, so it has been successfully applied in many fields.
Furthermore, we can see the new method has a higher forecasting precision than
the moment estimation from the case study in Section 4.
This paper is organized as follows: Section 2 reviews the detailed process of
ARIMA model (see [4]) and indicates the deficiencies of the traditional model
estimation; Section 3 introduces the particle swarm optimization algorithm and
proposes the PSO-based estimation method; In Section 4, as a case we forecast
the CPI to illustrate the excellent forecast ability of the new method.

2 Autoregressive Integrated Moving Average


Assume that {y1 , y2 , . . . , yN } is the original series without seasonal dependency
and the series has the limitation that at least 50 and preferably 100 observations
or more should be used [5], that is to say, N ≥ 50. Then we can model the
ARIMA(p, d, q) for this series, and it is obvious that the PSO is applicable for
the other ARIMA models (such as SARIMA model and so on) in the same way.

2.1 Model Identification


For an ARIMA(p, d, q), it is necessary to obtain the (p, d, q) parameters, where p
is the number of autoregressive terms, d is the number of non-seasonal differences
and q is the number of lagged forecast errors in the prediction equation [6].
The chief tools in identification are the autocorrelation function (ACF), the
partial autocorrelation function (PACF), and the resulting correlogram, which
is simply the plots of ACF and PACF against the lag length [2].
Since the ACF and the PACF are based on the covariance, it is necessary to introduce the covariance first. For a time series {w_1, w_2, ..., w_n}, the covariance at lag k (when k = 0, it is the variance), denoted by γ_k, is estimated in (1):
\hat{\gamma}_k = \frac{1}{n} \sum_{i=1}^{n-k} w_i w_{i+k}, \qquad k = 0, 1, \ldots, M    (1)

where M is the maximum number of lags, determined simply by dividing the number of observations by 4 for a series with fewer than 240 observations, based on the Box-Jenkins method [1]. Obviously, \hat{\gamma}_{-k} = \hat{\gamma}_k.
Then the ACF and PACF at lag k, denoted ρ_k and α_{kk} respectively, can be estimated according to (2) and (3):

\hat{\rho}_k = \hat{\gamma}_k / \hat{\gamma}_0    (2)

\hat{\alpha}_{11} = \hat{\rho}_1, \qquad \hat{\alpha}_{k+1,k+1} = \frac{ \hat{\rho}_{k+1} - \sum_{j=1}^{k} \hat{\rho}_{k+1-j}\,\hat{\alpha}_{kj} }{ 1 - \sum_{j=1}^{k} \hat{\rho}_{j}\,\hat{\alpha}_{kj} }, \qquad \hat{\alpha}_{k+1,j} = \hat{\alpha}_{kj} - \hat{\alpha}_{k+1,k+1}\,\hat{\alpha}_{k,k-j+1} \quad (j = 1, 2, \ldots, k)    (3)

where k = 1, 2, ..., M.
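For concreteness, a small sketch of (1)-(3) (hypothetical Python of ours; the variable names are not from the paper) computes the sample ACF and the PACF via this recursion:

def acf(w, M):
    # Sample autocovariances (1) and autocorrelations (2) up to lag M.
    n = len(w)
    gamma = [sum(w[i] * w[i + k] for i in range(n - k)) / n for k in range(M + 1)]
    return [g / gamma[0] for g in gamma]

def pacf(rho, M):
    # PACF via the recursion (3); alpha[(k, j)] stands for alpha_kj.
    alpha = {(1, 1): rho[1]}
    for k in range(1, M):
        num = rho[k + 1] - sum(rho[k + 1 - j] * alpha[(k, j)] for j in range(1, k + 1))
        den = 1.0 - sum(rho[j] * alpha[(k, j)] for j in range(1, k + 1))
        alpha[(k + 1, k + 1)] = num / den
        for j in range(1, k + 1):
            alpha[(k + 1, j)] = alpha[(k, j)] - alpha[(k + 1, k + 1)] * alpha[(k, k - j + 1)]
    return [alpha[(k, k)] for k in range(1, M + 1)]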
Based upon the information given by the coefficients of the ACF, we can define the following steps to determine d and make the series stationary (a sketch follows the list):
1) Let d = 0;
2) Define z_{t-d} = ∇^d y_t = (1 − B)^d y_t, where t = d+1, d+2, ..., N;
3) Calculate the autocorrelation coefficients of {z_1, z_2, ..., z_{N−d}};
4) If the autocorrelation coefficients statistically rapidly decline to zero, the corresponding d is the required one; otherwise let d = d + 1 and return to step 2).
Here B denotes the backward shift operator and ∇ is the differencing operator.
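A sketch of this loop (hypothetical Python of ours, reusing acf from the sketch above; the "rapid decline to zero" criterion is simplified to a plain cutoff on the ACF, which is an assumption on our part):

def choose_d(y, M, max_d=3, cutoff=0.5):
    # Difference y until its ACF dies out quickly beyond the first few lags.
    z = list(y)
    for d in range(max_d + 1):
        rho = acf(z, M)
        if all(abs(r) < cutoff for r in rho[3:]):
            return d, z
        z = [z[t] - z[t - 1] for t in range(1, len(z))]  # apply (1 - B)
    return max_d, z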
After determining the parameter d, the series {y_1, y_2, ..., y_N} converts to a stationary series {z_1, z_2, ..., z_{N−d}}, whose mean can be estimated in (4):

\bar{z} = \frac{1}{N-d} \sum_{i=1}^{N-d} z_i    (4)

Then we can obtain a stationary time series with zero mean {x_1, x_2, ..., x_{N−d}}, where x_t = z_t − z̄. So the problem of modeling ARIMA(p, d, q) for {y_1, ..., y_N} converts to the problem of modeling ARMA(p, q) for {x_1, ..., x_{N−d}}. The generalized form of ARMA(p, q) can be described as follows:

\varphi(B)\, x_t = \theta(B)\, a_t    (5)

where

\varphi(B) = 1 - \varphi_1 B - \varphi_2 B^2 - \cdots - \varphi_p B^p, \qquad \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q

and a_t is stationary white noise with zero mean; ϕ_i (i = 1, 2, ..., p) and θ_i (i = 1, 2, ..., q) are the coefficients of the ARMA model.
On the other hand, for identifying the orders of the ARMA model, the autocorrelation and partial autocorrelation graphs, drawn for lags 1 ∼ M, provide information about the AR and MA orders (p and q) [1]. Concretely, p is determined as the number of the first few significant coefficients in the partial autocorrelation graph, and similarly q is determined as the number of the first few significant coefficients in the autocorrelation graph.

2.2 Model Estimation


In Section 2.1, the category, structure and orders of the model has been de-
termined. But the coefficients {ϕi } and {θi } of the model is unknown. In this
section we review a usual estimation method(moment estimation).
According to Yule-Walker equation, we can estimate {ϕi } in (6).
⎡ ⎤ ⎡ ⎤−1 ⎡ ⎤
ϕ̂1 γ̂q γ̂q−1 · · · γ̂q−p+1 γ̂q+1
⎢ϕ̂2 ⎥ ⎢ γ̂q+1 γ̂q · · · γ̂q−p+2 ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥ ⎢γ̂q+2 ⎥
⎢ .. ⎥ = ⎢ .. .. .. ⎥ ⎢ .. ⎥ (6)
⎣ . ⎦ ⎣ . . . ⎦ ⎣ . ⎦
ϕ̂p γ̂q+p−1 γ̂q+p−2 · · · γ̂q γ̂q+p

Then we can estimate {θ_i} and σ_a², the variance of {a_t}, in (7):

\hat{\gamma}_0(\bar{x}) = \hat{\sigma}_a^2 \left( 1 + \hat{\theta}_1^2 + \cdots + \hat{\theta}_q^2 \right), \qquad \hat{\gamma}_k(\bar{x}) = \hat{\sigma}_a^2 \left( -\hat{\theta}_k + \hat{\theta}_{k+1} \hat{\theta}_1 + \cdots + \hat{\theta}_q \hat{\theta}_{q-k} \right), \quad k = 1, 2, \ldots, q    (7)

where \hat{\gamma}_k(\bar{x}) = \sum_{j=0}^{p} \sum_{l=0}^{p} \hat{\varphi}_j \hat{\varphi}_l \hat{\gamma}_{k+l-j}, k = 0, 1, ..., q.
So if q = 1, according to (7) and the invertibility condition (|θ_1| < 1) of the ARMA(p, q) model, we can calculate σ_a² and θ_1 using (8) and (9):

\hat{\sigma}_a^2 = \frac{ \hat{\gamma}_0(\bar{x}) + \sqrt{ \hat{\gamma}_0^2(\bar{x}) - 4 \hat{\gamma}_1^2(\bar{x}) } }{2}    (8)

Then

\hat{\theta}_1 = - \frac{ \hat{\gamma}_1(\bar{x}) }{ \hat{\sigma}_a^2 }    (9)
But if q > 1 it is difficult to solve (7) directly. In this case we can rearrange (7) into (10) and solve it by a linear iterative method:

\hat{\sigma}_a^2 = \hat{\gamma}_0(\bar{x}) \left( 1 + \hat{\theta}_1^2 + \cdots + \hat{\theta}_q^2 \right)^{-1}, \qquad \hat{\theta}_k = - \left( \frac{ \hat{\gamma}_k(\bar{x}) }{ \hat{\sigma}_a^2 } - \hat{\theta}_{k+1} \hat{\theta}_1 - \cdots - \hat{\theta}_q \hat{\theta}_{q-k} \right), \quad k = 1, 2, \ldots, q    (10)

At first, give initial values to {θ̂_i} and σ̂_a², for example θ̂_i = 0 (i = 1, 2, ..., q) and σ̂_a² = γ̂_0(x̄), and mark them as {θ̂_i(0)} and σ̂_a²(0). Substituting them into (10) yields {θ̂_i(1)} and σ̂_a²(1); the rest may be deduced by analogy, giving {θ̂_i(m)} and σ̂_a²(m). If max{|θ̂_i(m) − θ̂_i(m−1)|, |σ̂_a²(m) − σ̂_a²(m−1)|} is very small, we can take {θ̂_i(m)} and σ̂_a²(m) as the approximate estimates of {θ_i} and σ_a².
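A sketch of this fixed-point iteration (hypothetical Python of ours; gamma_x is the list [γ̂_0(x̄), ..., γ̂_q(x̄)] computed beforehand, and tol/max_iter are assumed stopping parameters):

def moment_iterate(gamma_x, q, tol=1e-8, max_iter=1000):
    # Linear iterative solution of (10) for theta_1..theta_q and sigma_a^2.
    theta = [0.0] * (q + 1)          # theta[0] unused; theta[k] stands for theta_k
    sigma2 = gamma_x[0]              # initial value sigma_a^2(0) = gamma_0(x)
    for _ in range(max_iter):
        sigma2_new = gamma_x[0] / (1.0 + sum(t * t for t in theta[1:]))
        theta_new = [0.0] * (q + 1)
        for k in range(1, q + 1):
            cross = sum(theta[k + j] * theta[j] for j in range(1, q - k + 1))
            theta_new[k] = -(gamma_x[k] / sigma2_new - cross)
        delta = max([abs(sigma2_new - sigma2)]
                    + [abs(a - b) for a, b in zip(theta_new[1:], theta[1:])])
        theta, sigma2 = theta_new, sigma2_new
        if delta < tol:
            break
    return theta[1:], sigma2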

From the above we can see that the moment estimation is cumbersome to carry out, and when q > 1 the approximate results of the parameters are unpredictable, which may greatly impact the accuracy of the forecast. So a PSO method for model estimation is proposed in Section 3; this method has a simple principle and can be implemented with ease. What's more, the PSO estimation for the ARIMA model (marked simply as PSOARIMA) has a higher forecasting precision than the plain ARIMA model, as will be shown in Section 4.

2.3 Model Forecasting

According to (5), we can obtain a_t = θ(B)^{-1} ϕ(B) x_t. In addition, we can let

a_t = x_t - \sum_{i=1}^{\infty} \pi_i x_{t-i} = \pi(B)\, x_t    (11)

so that ϕ(B) = θ(B)π(B), that is,

1 - \sum_{i=1}^{\infty} \varphi_i' B^i = \left( 1 - \sum_{i=1}^{\infty} \theta_i' B^i \right) \left( 1 - \sum_{i=1}^{\infty} \pi_i B^i \right)    (12)

where

\varphi_i' = \begin{cases} \varphi_i & 1 \le i \le p \\ 0 & i > p \end{cases}, \qquad \theta_i' = \begin{cases} \theta_i & 1 \le i \le q \\ 0 & i > q \end{cases}

Comparing the terms which have the same degree of B in (12), we can obtain {π_i} in (13), see [7]:

\pi_1 = \varphi_1' - \theta_1', \qquad \pi_i = \varphi_i' - \theta_i' + \sum_{j=1}^{i-1} \theta_{i-j}' \pi_j, \quad i > 1    (13)

Ignoring a_t and selecting an appropriate value K to replace ∞, we can obtain the predicting formula according to (11), shown as follows:

\hat{x}_t = \sum_{i=1}^{K} \pi_i x_{t-i}, \qquad \text{if } t - i \le 0 \text{ let } x_{t-i} = 0    (14)

where t = 1, 2, ..., N − d + 1; {x̂_1, x̂_2, ..., x̂_{N−d}} are the simulated values of {x_1, x_2, ..., x_{N−d}} and x̂_{N−d+1} is the one-step forecast value of {x_1, x_2, ..., x_{N−d}}.
Then we can obtain the prediction values of the original series by the following steps:

1) Let ẑ_t = x̂_t + z̄ (t = 1, 2, ..., N − d + 1), so {ẑ_t} is the prediction series of {z_t}.
2) Calculate {∇¹y_t}, {∇²y_t}, ..., {∇^{d−1}y_t}, whose numbers of components are N − 1, N − 2, ..., N − d + 1, respectively. Mark {y_t} as {∇⁰y_t}, and mark {ẑ_t} as {∇̂^d y_t}, the prediction series of {∇^d y_t}; similarly, {∇̂^i y_t} denotes the prediction series of {∇^i y_t}, i = 0, 1, ..., d − 1.
3) For all i from d down to 1 do:
   let {∇̂^{i−1} y_t}_1 = {∇^{i−1} y_t}_1 and {∇̂^{i−1} y_t}_j = {∇̂^{i−1} y_t}_{j−1} + {∇̂^i y_t}_{j−1}, where j = 2, 3, ..., N − i + 2, and {∇^{i−1} y_t}_1 denotes the first component of the series {∇^{i−1} y_t}.

According to this procedure, we obtain {∇̂⁰ y_t} (t = 1, 2, ..., N + 1), the prediction series of the original series {y_t}, which we mark as {ŷ_t}.
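The whole forecasting chain — the π weights of (13), the prediction (14), and the un-differencing of step 3) — can be sketched as follows (hypothetical Python of ours; phi/theta are the coefficient lists, K is the truncation length of (14), and only the d = 1 case of un-differencing is shown):

def pi_weights(phi, theta, K):
    # Recursion (13); phi and theta are zero-padded to length K.
    phi = phi + [0.0] * (K - len(phi))
    theta = theta + [0.0] * (K - len(theta))
    pi = [phi[0] - theta[0]]
    for i in range(1, K):
        pi.append(phi[i] - theta[i] + sum(theta[i - j - 1] * pi[j] for j in range(i)))
    return pi

def forecast_one(x, pi):
    # Formula (14): one-step prediction, missing history treated as zero.
    return sum(p * (x[-(i + 1)] if i + 1 <= len(x) else 0.0) for i, p in enumerate(pi))

def undifference(y, z_hat):
    # Step 3): cumulatively re-add the differences to recover the original scale.
    out = [y[0]]
    for dz in z_hat:
        out.append(out[-1] + dz)
    return out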

3 Particle Swarm Optimization Algorithm


Suppose that the search space is n-dimensional and a particle swarm consists of m particles; then the ith particle of the swarm and its velocity can be represented by two n-dimensional vectors, x_i = (x_{i1}, x_{i2}, ..., x_{in}) and v_i = (v_{i1}, v_{i2}, ..., v_{in}), i = 1, 2, ..., m, see [8]. Each particle has a fitness determined by the objective function of the optimization problem, and knows its best previously visited position and its present position. Each particle also knows the position of the best individual of the whole swarm. Then the velocity of a particle and its new position are assigned according to (15) and (16):
v_i^{k+1} = w \times v_i^k + c_1 \tau_1 (pxbest_i - x_i^k) + c_2 \tau_2 (gxbest - x_i^k)    (15)

x_i^{k+1} = x_i^k + v_i^{k+1}    (16)
where v_i^k and x_i^k are the velocity and position of the ith particle at time k, respectively; pxbest_i is the individual best position of the ith particle; gxbest is the global best position of the whole swarm; τ_1, τ_2 are random numbers uniform on the interval [0, 1]; c_1, c_2 ∈ [0, 2] are two positive constants called acceleration coefficients, namely the cognitive and social parameters respectively, and as defaults c_1 = c_2 = 2 were proposed in [9]; w is the inertia weight, which can be determined by

w = w_{\max} - (w_{\max} - w_{\min})\, Num / Num_{\max}    (17)

where w_max and w_min are respectively the maximum and minimum of w, Num_max is the maximum iteration number and Num is the current iteration number. A larger inertia weight achieves global exploration, while a smaller inertia weight tends to facilitate local exploration to fine-tune the current search area. Usually the maximum velocity V_max is set to half the length of the search space [10].
In this paper, because there are p + q coefficients (ϕ_i, θ_j, i = 1, 2, ..., p, j = 1, 2, ..., q) to be selected, the dimension of the swarm is n = p + q. What's more, when the coefficients {ϕ_i} and {θ_j} are denoted by Φ and Θ, which have p and q components respectively, one particle can be denoted by (Φ, Θ), that is, (ϕ_1, ϕ_2, ..., ϕ_p, θ_1, θ_2, ..., θ_q). We then mark the simulation series {x̂_t} obtained from (14) as {x̂_t^{(Φ,Θ)}} when the coefficients are Φ and Θ.
In this investigation, the mean square error (MSE), shown as (18), serves as the forecasting accuracy index and as the objective function of PSO for identifying suitable coefficients:

MSE^{(\Phi,\Theta)} = \frac{1}{N-d} \sum_{t=1}^{N-d} \left( x_t - \hat{x}_t^{(\Phi,\Theta)} \right)^2    (18)

The procedure of estimating the coefficients by PSO for this model is illustrated as follows, where s denotes the index of the particle whose personal best fitness pfbest_s is currently the smallest:

1.  Randomly generate m initial particles x_1^0, x_2^0, ..., x_m^0 and initial velocities v_1^0, v_2^0, ..., v_m^0, each with p + q components.
2.  for 1 ≤ i ≤ m do
3.      pxbest_i ⇐ x_i^0;  pfbest_i ⇐ MSE^{(x_i^0(1:p), x_i^0(p+1:p+q))}
4.      x_i^1 ⇐ x_i^0 + v_i^0
5.  end for
6.  for all i such that 1 ≤ i ≤ m do
7.      if pfbest_s ≤ pfbest_i then
8.          gxbest ⇐ pxbest_s;  gfbest ⇐ pfbest_s
9.      end if
10. end for
11. k ⇐ 1;  w_max ⇐ 0.9;  w_min ⇐ 0.4
12. while k < Num_max do
13.     for 1 ≤ i ≤ m do
14.         if MSE^{(x_i^k(1:p), x_i^k(p+1:p+q))} < pfbest_i then
15.             pfbest_i ⇐ MSE^{(x_i^k(1:p), x_i^k(p+1:p+q))}
16.             pxbest_i ⇐ x_i^k
17.         end if
18.     end for
19.     for all i such that 1 ≤ i ≤ m do
20.         if pfbest_s ≤ pfbest_i then
21.             gxbest ⇐ pxbest_s;  gfbest ⇐ pfbest_s
22.         end if
23.     end for
24.     w ⇐ w_max − k (w_max − w_min) / Num_max
25.     for 1 ≤ i ≤ m do
26.         v_i^{k+1} ⇐ w × v_i^k + c_1 τ_1 (pxbest_i − x_i^k) + c_2 τ_2 (gxbest − x_i^k)
27.         if v_i^{k+1} ≥ V_max or v_i^{k+1} ≤ −V_max then
28.             v_i^{k+1} ⇐ v_i^{k+1} / |v_i^{k+1}| × V_max
29.         end if
30.         x_i^{k+1} ⇐ x_i^k + v_i^{k+1}
31.     end for
32.     k ⇐ k + 1
33. end while

where x(1:p) and x(p+1:p+q) denote the 1st ∼ pth and the (p+1)th ∼ (p+q)th components of the vector x, respectively. So gxbest(1:p) and gxbest(p+1:p+q) are the optimally estimated values of Φ and Θ.
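A compact Python rendering of the same procedure (our sketch, not the authors' MATLAB implementation; it reuses pi_weights and forecast_one from the sketch in Section 2.3 and assumes the search range [−1, 1] used in Section 4):

import random

def mse(coeffs, x, p, q, K=20):
    # Objective (18): mean squared one-step error for a particle (Phi, Theta).
    pi = pi_weights(list(coeffs[:p]), list(coeffs[p:p + q]), K)
    return sum((x[t] - forecast_one(x[:t], pi)) ** 2 for t in range(len(x))) / len(x)

def pso_estimate(x, p, q, m=30, num_max=200, c1=2.0, c2=2.0, vmax=1.0):
    n = p + q
    X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
    V = [[0.0] * n for _ in range(m)]
    pbest, pfit = [xi[:] for xi in X], [mse(xi, x, p, q) for xi in X]
    g = min(range(m), key=lambda i: pfit[i])
    gbest = pbest[g][:]
    for k in range(num_max):
        w = 0.9 - k * (0.9 - 0.4) / num_max              # formula (17)
        for i in range(m):
            for d in range(n):                           # formulas (15)-(16)
                V[i][d] = (w * V[i][d]
                           + c1 * random.random() * (pbest[i][d] - X[i][d])
                           + c2 * random.random() * (gbest[d] - X[i][d]))
                V[i][d] = max(-vmax, min(vmax, V[i][d]))  # velocity clamping
                X[i][d] += V[i][d]
            f = mse(X[i], x, p, q)
            if f < pfit[i]:
                pbest[i], pfit[i] = X[i][:], f
        g = min(range(m), key=lambda i: pfit[i])
        gbest = pbest[g][:]
    return gbest[:p], gbest[p:]                          # estimates of Phi, Theta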

4 Case Study
In this section, a case study of predicting the consumer price index (CPI) of 36 large and medium-sized cities in China is shown. The original data are the CPI from Jan. 2001 to Oct. 2008, obtained from the CEInet statistical database; we mark them as {y_t}, t = 1, 2, ..., 94.
We can easily see that {y_t} is a non-stationary series by observing its graph. Observing the autocorrelogram of the third-differenced data (shown in Fig. 1, where the two solid horizontal lines represent the 95% confidence interval), in which the autocorrelation coefficients statistically rapidly decline to zero after lag 1, we confirm that the third-differenced data are stationary and mark them as {z_t}, t = 1, 2, ..., 91. So d is equal to 3. We then mark {z_t − z̄} as {x_t}, t = 1, 2, ..., 91, where z̄ = −0.0275 according to (4).
Then we can draw the autocorrelogram and partial autocorrelogram (Fig. 2).
From this figure two facts stand out: first, the autocorrelation coefficient starts at a very high value at lag 1 and then statistically rapidly declines to zero; second, the PACF coefficients up to lag 4 are individually statistically significantly different from zero but then statistically rapidly decline to zero too. So we can determine p = 4 and q = 1, that is to say, we can design an ARMA(4,1) model for {x_t}.
Estimating the coefficients Φ = {ϕ_1, ϕ_2, ϕ_3, ϕ_4} and Θ = {θ_1} according to (6), (8), (9) and by the PSO method with the optimizing range [−1, 1], respectively, we obtain Φ = {−1.0750, −0.8546, −0.4216, −0.1254}, Θ = {0.1614} and Φ = {−0.4273, −0.0554, 0.1979, 0.1487}, Θ = {0.8196}, respectively. Then we can obtain the fitted and prediction series; the fitted graphs are shown in Fig. 3. At the same time, the MSEs are 1.3189 and 1.2935 respectively; that is to say, the latter fits better than the former.

Fig. 1. The autocorrelogram of the third-differenced data up to 23 lags



Fig. 2. The correlogram of {x_t} up to 23 lags: (a) autocorrelogram; (b) partial autocorrelogram

Fig. 3. Fitted results of {x_t} (original data vs. simulation values): (a) ARMA(4,1); (b) ARMA(4,1) optimized by PSO

Table 1. Prediction of consumer price index in Nov. 2008

Actual value   ARIMA(4,3,1)                     PSOARIMA(4,3,1)
               Prediction   Relative error      Prediction   Relative error
102.70         103.24       0.53%               103.09       0.38%

Furthermore, the prediction values of the original data {y_t} can be obtained according to Section 2.3 by both the moment estimation method and the PSO estimation method. The relevant results are shown in Table 1, from which we can see that the accuracy of the PSOARIMA(4, 3, 1) model is better than that of the ARIMA(4, 3, 1) model. What's more, we present the prediction figures (Fig. 4) to compare the accuracy of the two models clearly.

Fig. 4. Prediction results of {y_t} (consumer price index by year and month, with the actual value in Nov. 2008 marked): (a) ARIMA(4, 3, 1); (b) PSOARIMA(4, 3, 1)

Table 2. Prediction of consumer price index in Dec. 2008 ∼ Mar. 2009

Month.Year   Actual value   ARIMA(4,3,1)           PSOARIMA(4,3,1)
                            Predict     R.E.       Predict     R.E.
12.2008      101.30         101.39      0.09%      101.44      0.13%
01.2009      101.10          99.725     1.36%       99.819     1.27%
02.2009       98.50         100.31      1.83%       99.808     1.33%
03.2009       98.90          96.029     2.90%       96.33      2.60%

Additionally, once the actual value in Nov. 2008 is known, the value in Dec. 2008 can be predicted; the rest may be deduced by analogy, so we predict the values in Jan. 2009 ∼ Mar. 2009, and the relevant results are shown in Table 2. From this table we can see that the accuracy of both models is very good, but the relative error of PSOARIMA(4, 3, 1) is generally smaller than that of ARIMA(4, 3, 1).

5 Conclusion
In this paper we proposed a novel design methodology: a hybrid of the particle swarm optimization algorithm and the ARIMA model. PSO is used for the model estimation of ARIMA in order to overcome the deficiency that the traditional estimation method is difficult to implement and may yield very poor results. Furthermore, it was observed through the experimental results of forecasting the CPI that the PSO-based model works more accurately than the traditional moment-estimation-based model.
On the other hand, we can see the power of PSO for optimizing parameters, so in future work we will find more practical ways to apply PSO in other fields. It is necessary to note, however, that standard PSO sometimes gets stuck in local optima rather than reaching the global optimum; fortunately, the chaotic particle swarm optimization algorithm (CPSO) can help to solve this problem successfully.

Acknowledgments. The research was supported by the Ministry of Education overseas cooperation "Chunhui Projects" under Grant (Z2007-1-62012) and the Natural Science Foundation of Gansu Province in China under Grant (ZS031-A25-010-G).

References
1. Ediger, V.Ş., Akar, S.: ARIMA forecasting of primary energy demand by fuel in
Turkey. Energy Policy 35, 1701–1708 (2007)
2. Erdogdu, E.: Electricity demand analysis using cointegration and ARIMA mod-
elling: A case study of Turkey. Energy Policy 35, 1129–1146 (2007)
3. Ong, C.-S., Huang, J.-J., Tzeng, G.-H.: Model identification of ARIMA family
using genetic algorithms. Applied Mathematics and Computation 164, 885–912
(2005)
4. Niu, D., Cao, S., Zhao, L., Zhang, W.: Power load forecasting technology and its
applications. China Electric Power Press, Beijing (1998) (in Chinese)
5. Ediger, V.Ş., Akar, S., Uğurlu, B.: Forecasting production of fossil fuel sources
in Turkey using a comparative regression and ARIMA model. Energy Policy 34,
3836–3846 (2006)
6. Valenzuela, O., Rojas, I., Rojas, F., Pomares, H., Herrera, L.J., Guillen, A., Mar-
quez, L., Pasadas, M.: Hybridization of intelligent techniques and ARIMA models
for time series prediction. Fuzzy Sets and Systems 159, 821–845 (2008)
7. Li, M., Zhou, J.-z., Li, J.-p., Liang, J.-w.: Predicting Securities Market in Shanghai
and Shenzhen by ARMA Model. Journal of Changsha Railway University 3, 78–84
(2000) (in Chinese)
8. Dingxue, Z., Xinzhi, L., Zhihong, G.: A Dynamic Clustering Algorithm Based
on PSO and Its Application in Fuzzy Identification. In: The 2006 International
Conference on Intelligent Information Hiding and Multimedia Signal Processing,
pp. 232–235. IEEE Press, New York (2006)
9. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. In: The IEEE Interna-
tional Conference on Neural Networks, vol. 4, pp. 1942–1948. IEEE Press, New
York (1995)
10. Alatas, B., Akin, E., Bedri Ozer, A.: Chaos embedded particle swarm optimization
algorithms. Chaos, Solitons and Fractals (2008)
A Uniform Solution to HPP in Terms of Membrane
Computing

Haizhu Chen and Zhongshi He

College of Computer Science, Chongqing University, Chongqing 400044, China
{cqchz,zshe}@cqu.edu.cn

Abstract. A membrane system is a computing model which imitates natural processes at the cellular level. Research results have shown that this model is a promising framework for solving NP-complete problems in polynomial time. The Hamiltonian path problem (HPP for short) is a well-known NP-complete problem, and there are several semi-uniform P systems dealing with it. In the framework of recognizer P systems based on membrane 2-division, we present a uniform polynomial-time solution to the HPP problem and give some formal details of this solution from the point of view of complexity classes.

Keywords: Natural Computing, Membrane Computing, Cellular Complexity Classes, HPP.

1 Introduction
P systems are an emergent branch of natural computing, which can be seen as a kind of distributed parallel computing model. They are based upon the assumption that the processes taking place in living cells can be considered as computations. Up to now, several variants of P systems with linear or polynomial time have been constructed to solve NP-complete problems, such as SAT [1, 2, 3], HPP [2, 4, 5], Subset Sum [6], Knapsack [7], 2-Partition [8] and Bin Packing [9].
The Hamiltonian path problem (HPP for short) is a well-known NP-complete problem, and there are several P systems dealing with it. The P systems in [4, 5] are all semi-uniform [10]. Specifying the starting and ending vertices, the literature [2] presents a uniform solution to HPP with membrane separation. It remains open to solve HPP confluently in polynomial time by P systems with division rules instead of separation rules in a uniform way [2].
Without specifying the starting vertex and ending vertex of a path, this paper presents a uniform solution to HPP with membrane division in which the communication rules (sending objects into membranes) [3] are not used. The remainder of the paper is organized as follows: the recognizer P system with membrane division is introduced in the next section. In Section 3 the solution to HPP with membrane division is presented, with some formal details of the solution given in Section 4. Finally, conclusions are drawn in the last section.


2 Recognizer P System with Membrane Divisions


This paper deals with recognizer P systems with membrane division, so it is necessary to recall such systems. Firstly, a P system with membrane division is a construct [3]:

Π = (V, H, µ, w_1, ..., w_m, R)

where:
m ≥1 (the initial degree of the system);
V is the alphabet of objects;
H is a finite set of labels for membranes;
µ is a membrane structure, consisting of m membranes, labeled (not necessarily in a
one-to-one manner) with elements of H ;
w1 , …, wm are strings over V describing the multisets (every symbol in a string
representing one copy of the corresponding object) placed in the m regions of µ
respectively;
R is a finite set of developmental rules, of the following forms:
(1) [a → v]_h^e, for h ∈ H, e ∈ {+, −, 0}, a ∈ V, v ∈ V*
(object evolution rules);
(2) [a]_h^{e1} → [ ]_h^{e2} b, for h ∈ H, e1, e2 ∈ {+, −, 0}, a, b ∈ V
(communication rules);
(3) [a]_h^{e1} → [b]_h^{e2} [c]_h^{e3}, for h ∈ H, e1, e2, e3 ∈ {+, −, 0}, a, b, c ∈ V
(division rules for elementary membranes)
Note that, in order to simplify the writing, in contrast to the style customary in the
literature, we have omitted the label of the left parenthesis from a pair of parentheses
which identifies a membrane. All aforementioned rules are applied according to the
following principles:
(a) The rules of type (1) are applied in the parallel way.
(b) The rules of types (2), (3) are used sequentially, in the sense that one membrane
can be used by at most one rule of these types at a time.
(c) All objects and all membranes, which can evolve, should evolve simultaneously.
We now introduce two definitions [1]:
Definition 1. A P system with input is a tuple (Π, Σ, i_Π), where: (a) Π is a P system with working alphabet Γ, with p membranes labeled 1, ..., p, and initial multisets w_1, ..., w_p associated with them; (b) Σ is an (input) alphabet strictly contained in Γ, and the initial multisets of Π are over Γ − Σ; and (c) i_Π is the label of a distinguished (input) membrane.
Let w_in be a multiset over Σ. The initial configuration of (Π, Σ, i_Π) with input w_in is (µ, w_1, ..., w_{i_Π} ∪ w_in, ..., w_p).

Definition 2. A recognizer P system is a P system with input, ( Π , ∑ , iΠ ), and with


output such that:
(1) The working alphabet contains two distinguished elements, yes and no.
(2) All its computations can halt.
(3) If C is a computation of Π , then either some object yes or no (but not both)
must have been released into the environment, and only in the last step of the
computation.
We say that C is an accepting computation (respectively, rejecting computation) if the
object yes (respectively, no) appears in the external environment associated with the
corresponding halting configuration of C.
For ease of presentation, in the rest of this paper "recognizer P system" means "recognizer P system with input", and we will not distinguish between the two.
Section 4 will prove that the family Π of recognizer P systems in Section 3 gives a polynomial solution for HPP. We would like to introduce a definition about complexity classes in P systems.
Let AM stand for the class of recognizer P systems with active membrane using 2-
division. We will introduce the definition of the complexity class PMCAM [6, 7, 8, 10]:
Definition 3. We say that a decision problem, X=(IX,θX), is solvable in polynomial
time by a family Π= ( Π (t ) )t∈N, of recognizer P systems using 2-division, and we
write X∈PMCAM, if
(1) Π is AM-consistent; that is, Π (t ) ∈ AM for each t (t∈N).
(2) Π is a polynomially uniform family by Turing Machine, that is, there exists a
deterministic Turing machine that builds Π (t ) from t ≥ 0 in polynomial time.
(3) There exists a polynomial encoding (g, h) of IX in Π verifying:
− Π is polynomially bounded, with respect to (g, h), that is, there exists a polynomial
function p, such that for each u∈IX, all computations of Π (h(u )) with input g(u)
halt in at most p(|u|) steps.
− Π is sound with respect to (g, h); that is, for all u∈IX, if there exists an accepting computation of Π(h(u)) with input g(u), then θX(u) = 1 (the corresponding answer to the problem is yes).
− Π is complete with respect to (g, h); that is, for all u∈IX, if θX(u) = 1 (the answer to the problem is yes), then every computation of Π(h(u)) accepts g(u) (the system also responds yes).
In the above definition every P system is required to be confluent: every computation
of a system with the same input must always give the same answer. The class PMCAM
is closed under polynomial-time reduction and complement.

3 Solving HPP with Membrane Divisions


A Hamiltonian path in a directed graph G is a path that visits each of the nodes of G exactly once. The Hamiltonian Path Problem (HPP for short) is the following decision problem:

Input: given a directed graph G = (V, E);


Output: determine if there exists a Hamiltonian path in G.
This section presents a uniform solution to HPP by a family of recognizer P systems with membrane division, in which the starting vertex and ending vertex of a path are not specified. Unless stated otherwise, the solution is described in the usual framework of recognizer P systems with active membranes using 2-division, with three electrical charges, without change of membrane labels, without cooperation, and without priority. The solution is addressed via a brute-force algorithm which consists of the following phases (a sequential sketch is given after the list):
Preparing stage: introduce n vertices from which we can start to search the possible
Hamiltonian paths (HPs);
Generating stage: generate all paths of length n in G (G has n vertices) through
membrane division;
Checking stage: for each of the previous paths, determine whether it visits all the
nodes of G;
Output stage: answer yes or no depending on the results from checking stage.
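Sequentially, the four phases amount to the brute-force search sketched below (a hypothetical Python rendering of ours, not part of the paper; for brevity it prunes revisited vertices while extending, whereas the P system generates length-n paths in parallel across membranes and verifies them in the checking stage):

def has_hamiltonian_path(n, edges):
    # Brute-force HPP search mirroring the phases above: start from every
    # vertex (preparing stage), repeatedly extend each partial path along an
    # outgoing edge (generating stage), and accept once a path holds all n
    # vertices (checking and output stages).
    adj = {v: [j for (i, j) in edges if i == v] for v in range(1, n + 1)}
    paths = [[v] for v in range(1, n + 1)]
    for _ in range(n - 1):
        paths = [p + [j] for p in paths for j in adj[p[-1]] if j not in p]
    return len(paths) > 0

# e.g. has_hamiltonian_path(3, {(1, 2), (2, 3)}) returns True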

3.1 The Uniform Solution to HPP with Membrane Divisions

For any given directed graph G with n vertices, we construct a recognizer P system Π(n) to solve HPP. Therefore the family presented here is

Π = {(Π(n), Σ(n), i_Π) : n ∈ N}

The input alphabet is

Σ(n) = {x_{i,j,k} | 1 ≤ i, j ≤ n, −1 ≤ k ≤ i}

where the object x_{i,j,k} represents (i, j) ∈ E. Over Σ(n), the instance G (whose size is n) can be encoded as w_in, the input multiset of Π(n):

w_in = {x_{i,j,k} | 1 ≤ i, j ≤ n, k = i, (i, j) ∈ E}

The input membrane is i_Π = 2 and the P system is constructed as:

Π(n) = (V(n), H, µ, w_1, w_2, R)
where
V(n) = {x_{i,j,k} | 1 ≤ i, j ≤ n, −1 ≤ k ≤ i} ∪ {v_i | 1 ≤ i ≤ n−1} ∪ {t_i, z_i | 1 ≤ i ≤ n}
      ∪ {c_i | 0 ≤ i ≤ n−1} ∪ {r_i, r'_i | 1 ≤ i ≤ n} ∪ {d_i, d'_i | 0 ≤ i ≤ n}
      ∪ {e_i | 0 ≤ i ≤ n+1} ∪ {f_i | 0 ≤ i ≤ 3n(n+3)/2+5} ∪ {r_0, g, e, s, yes, no}
H = {1, 2}
µ = [[ ]_2^+]_1^0
w_1 = f_0,  w_2 = d_0 v_1
And the set R contains the rules as follows:

[v_i]_2^+ → [t_i]_2^- [v_{i+1}]_2^+  (1 ≤ i ≤ n−2),   [v_{n−1}]_2^+ → [t_{n−1}]_2^- [t_n]_2^-,   [t_j → c_{j−1} r'_j g]_2^-  (1 ≤ j ≤ n)    (1)

At the preparing stage, we have {d_0 v_1} and an input multiset w_in in the membrane labeled 2, encoded according to the instance G. From v_1, the other vertices of G can be obtained by rules (1); this means that we can search for HPs starting from each of the n vertices. The object r'_j represents the starting vertex, and the object c_{j−1} can be regarded as a counter whose subscript is decreased by 1 at each step of the generating stage. These rules are applied at the preparing stage.
[g]_2^- → [ ]_2^0 g    (2)

After the starting vertices appear in their corresponding membranes, we enter the generating stage by rule (2), which is also applied at other computation steps as explained later.
[x_{i,j,k} → x_{i,j,k−1}]_2^0,   [c_i → c_{i−1}]_2^0,   [r'_i → r_i]_2^0   (1 ≤ i, j ≤ n, 0 ≤ k ≤ i)    (3)

In the membranes labeled 2, when their polarization is 0, the third subscript k of x_{i,j,k} and the subscript i of c_i are decreased by 1 simultaneously. For the ith vertex of G that has been added to the current path, the objects r'_i and r_i are two different representations in membranes with different polarizations. x_{i,j,k} cannot evolve any more once k reaches −1. c_0 and x_{i,j,0} will appear after i steps if (i, j) ∈ E.
[c_0]_2^0 → [ ]_2^+ c_0    (4)

Vertex v_i is the last vertex of the current path, and we extend this path by adding v_j upon the appearance of c_0 if (i, j) ∈ E. Since several vertices besides v_j may connect to v_i, new membranes need to be created to contain the new paths. Rule (4) changes the polarization of the membrane to positive in order to trigger the membrane divisions that create them.
[d_i → d'_i]_2^+,   [r_i → r'_i]_2^+,   [x_{i,j,0}]_2^+ → [z_j]_2^- [g]_2^+   (1 ≤ i, j ≤ n)    (5)

The objects d_i and d'_i both express the fact that the length of the current path is i, but they are two different representations in membranes labeled 2 with different polarizations. An object x_{i,j,0} in a membrane with positive polarization causes the membrane division that extends the current path. Since more than one vertex may connect with v_i, there can be several objects x_{i,j,0} whose third subscript k is 0. We non-deterministically choose one of them to trigger the division, and the rest remain in the two new membranes with different polarizations, which indicates that they will be processed by different rules: in the new membrane with positive polarization one of them triggers the division again in order to obtain another new path, while in the one with negative polarization all of them are deleted.
[z_j → r'_j c_{j−1} g]_2^-,   [d'_i → d_i]_2^-,   [x_{i,j,k} → x_{i,j,i}]_2^-  (k ≠ 0),   [x_{i,j,0} → λ]_2^-,   [d_i → d_{i+1}]_2^-   (1 ≤ i, j ≤ n)    (6)
In the new membrane labeled 2 with negative polarization, the new object r'_j is introduced, which indicates that vertex v_j has been added to the current path. At the same time, the new counter c_{j−1} is introduced and the other objects, such as x_{i,j,k}, return to x_{i,j,i}. After the length of the path is increased, the polarization is changed by rule (2); we return to the beginning of the generating stage and resume the procedure for extending the current path.
[g → λ]_2^+    (7)

In the new membrane labeled with 2 with positive polarization, object g does not
work any more and is deleted.
[d_n → s e]_2^0    (8)

When the length of path is n, we know that n vertices are in the path and we can enter
the next stage to check if the path is Hamiltonian or not. Two new objects s and e are
introduced to prepare for the checking stage.
[e → e_0 r_0]_2^+,   [s]_2^+ → [ ]_2^0 s    (9)

With the appearance of dn, the counter ct (t>0) associated with the last vertex is intro-
duced and it still can evolve until c0 is sent out of the membrane. Objects e and s
evolve by rule (9) to trigger the next stage.
[e_i → g e_{i+1}]_2^0,   [r_0]_2^0 → [ ]_2^- r_0   (0 ≤ i ≤ n)    (10)

[r_i → r_{i−1}]_2^-   (1 ≤ i ≤ n)    (11)

In the checking stage, rules (10) and (11) are used in a loop. If the object r_0 is present, we eliminate it and perform a rotation of the remaining objects r_i, decreasing their subscripts by 1. It is clear that a membrane contains an HP if and only if we can run n + 1 steps of the loop. Note that this checking method is not the same as the one in [10], since communication rules sending objects into membranes are not used.
[e_{n+1} → yes]_2^-    (12)

[yes]_2^0 → [ ]_2^+ yes    (13)

The object en+1 introduced by rules (10) means that the path is Hamiltonian, so it
evolves to object yes, which will leave the membrane labeled with 2 and enter
membrane 1.
[g → λ]_1^0,   [c_0 → λ]_1^0,   [r_0 → λ]_1^0,   [s → λ]_1^0,   [f_i → f_{i+1}]_1^0   (0 ≤ i ≤ 3n(n+3)/2+4)    (14)
All of these rules are applied in parallel in membrane 1 with neutral polarization. Since the objects g, c_0, r_0 and s sent out of the membranes labeled 2 are not used any more, it is necessary to delete them to release computing resources. The object f_i evolves at each computation step, counting how many steps have been performed.

[yes]_1^0 → [ ]_1^+ yes,   [f_{time}]_1^0 → [ ]_1^+ no,   time = 3n(n+3)/2+5    (15)

By changing the polarization to positive, the object yes in membrane 1 is sent out to the environment to show that there is an HP among the n vertices. If at step 3n(n+3)/2+5 the polarization of membrane 1 has not changed to positive, the answer must be no, and no is sent out to the environment. This special step is analyzed in Subsection 3.2. Once the polarization changes, no rule in membrane 1 can be applied any more and the computation halts.

3.2 An Overview of the Computation

All of the foregoing rules applied in membranes labeled 2 are illustrated in Fig. 1, in which 0, + and − represent the polarizations of the membranes. Fig. 1 resembles a state machine, and it can be seen that different polarizations and objects indicate distinct stages. Rules (2), (4), (9) and (10) are used to change the polarizations in order to enter another stage of the computation.
At the preparing stage, we have v_1 in the membrane labeled 2. From v_1, the other vertices of G are obtained by application of rules (1). After the n starting vertices are obtained, we enter the next stage to search for HPs from each of them. The generating stage consists of three phases: a vertex connecting with the last vertex of the current path is sought in the search phase; it is then added to the path through the divide phase; and the length of the path is increased and the objects related to the other edges are recovered in the restore phase. Once a vertex is added to the path, all of the edges starting from it are consumed in the extensions of the path. These three phases are looped until the length of the path reaches n. The checking stage then decides whether the path of length n is an HP or not, and the answer yes or no is sent out at the output stage.

Fig. 1. The rules applied in membranes labeled 2



The number of steps in each stage is as follows:


(1) Preparing and transition to generating: at most n+1
(2) Generating stage:
Search: at most n(n+1)/2-n
Divide: at most n(n-1)
Restore: 3(n-1)
(3) Checking: 2+2(n+1)
(4) Output yes: 2
The overall number of steps is

n+1 + (n(n+1)/2 − n) + n(n−1) + 3(n−1) + 2 + 2(n+1) + 2 = 3n(n+3)/2 + 4 = O(n²).

Before step 3n(n+3)/2+4, yes will be sent out if and only if there is an HP in the instance G. If at step 3n(n+3)/2+5 the polarization of membrane 1 has not changed to positive, the answer must be no, and no will be sent out to the environment.
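As a quick arithmetic sanity check of this count (a snippet of ours, not part of the paper):

for n in range(1, 20):
    total = (n + 1) + (n * (n + 1) // 2 - n) + n * (n - 1) + 3 * (n - 1) + 2 + 2 * (n + 1) + 2
    assert total == 3 * n * (n + 3) // 2 + 4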

4 Some Formal Details

In this section we prove that the uniform family Π of recognizer P systems in Section 3 gives a polynomial solution for HPP. According to Definition 3 in Section 2, we have:
Theorem 1. HPP ∈ PMCAM.
Proof: (1) It is obvious that the family Π of recognizer P systems defined above is AM-consistent, because every P system of the family is a recognizer P system with active membranes using 2-division and all of them are confluent.
(2) The family Π of recognizer P systems is polynomially uniform by Turing machine.
For each n, which is the size of an instance G, a P system can be constructed. The
rules of Π (n) are defined in a recursive manner from n. The necessary resources to
construct the system are:
Size of the alphabet: n³ + (7n² + 27n)/2 + 14 = Θ(n³)
Initial number of membranes: 2 = Θ(1)
Initial number of objects: 3 = Θ(1)
Total number of evolution rules: n³ + (9n² + 31n + 38)/2 = Θ(n³)
All these resources are polynomially bounded in n; therefore a Turing machine can build the P system in polynomial time from n.
(3) We have the functions g: I_HPP → ∪_{t∈N} I_{Π(t)} and h: I_HPP → N, defined for an instance u = v_1, ..., v_n as g(u) = {x_{i,j,k} : 1 ≤ i, j ≤ n} and h(u) = n. Both of them are total and polynomially computable. Furthermore, (g, h) conforms an encoding of the set of instances of the HPP problem in the family of recognizer P systems, as for any instance u we have that g(u) is a valid input for the P system.
Π is polynomially bounded with respect to (g, h). A formal description of the computation lets us prove that Π always halts and sends the object yes or no to the environment in the last step. The number of steps of the P system is within 3n(n+3)/2+4 if the output is yes and 3n(n+3)/2+5 if the output is no; therefore there exists a polynomial bound for the number of steps of the computation.
Π is sound with respect to (g, h). As described above, the fact that the object yes is sent out of the system means that there are n different vertices in the path, which is then an actual Hamiltonian path in the instance G according to the definition of HPP.
Π is complete with respect to (g, h). The P system searches for an HP starting from every vertex at the beginning of the computation. We obtain objects r_i (1 ≤ i ≤ n) and delete the objects x_{i,j,k} (1 ≤ i, j ≤ n, −1 ≤ k ≤ i) corresponding to the edges whose starting vertex v_i has been added to the current path. When there is a Hamiltonian path in the instance G with n vertices, the system Π obtains the objects r_1, r_2, ..., r_n in some membrane and outputs the object yes to the environment. □
Since this class is closed under polynomial-time reduction and complement we
have:
Theorem 2. NP ∪ co-NP ⊆ PMC AM.

5 Conclusion
In this paper we presented a uniform solution to HPP in terms of P systems with membrane division, in the framework of complexity classes in membrane computing.
It can be observed that many membranes no longer evolve at some step before the whole system halts. Some kind of rule should be developed to dissolve these membranes and release their computing resources; this would contribute to the construction of efficient simulators of recognizer P systems for solving NP-complete problems.
Acknowledgments. This research is supported by the Major National S&T Program of China (#2008ZX07315-001).

References
1. Gutiérrez-Naranjo, M.A., Pérez-Jiménez, M.J., Romero-Campero, F.J.: A uniform solution
to SAT using membrane creation. Theoretical Computer Science 371(1-2), 54–61 (2007)
2. Pan, L., Alhazov, A.: Solving HPP and SAT by P systems with active membranes and
separation rules. Acta Informatica 43(2), 131–145 (2006)
3. Păun, G.: Computing with Membranes: Attacking NP-Complete Problems. In: Proceedings
of the Second International Conference on Unconventional Models of Computation,
pp. 94–115. Springer, London (2000)
4. Păun, A.: On P Systems with Membrane Division. In: Proceedings of the Second Interna-
tional Conference on Unconventional Models of Computation, pp. 187–201. Springer,
London (2000)
5. Zandron, C., Ferretti, C., Mauri, G.: Solving NP-Complete Problems Using P Systems
with Active Membranes. In: Proceedings of the Second International Conference on
Unconventional Models of Computation, pp. 289–301. Springer, London (2000)

6. Pérez-Jiménez, M.J., Riscos-Núñez, A.: Solving the Subset-Sum problem by active mem-
branes. New Generation Computing 23(4), 367–384 (2005)
7. Jesús Pérez-Jímenez, M., Riscos-Núñez, A.: A Linear-Time Solution to the Knapsack
Problem Using P Systems with Active Membranes. In: Martín-Vide, C., Mauri, G., Păun,
G., Rozenberg, G., Salomaa, A. (eds.) WMC 2003. LNCS, vol. 2933, pp. 250–268.
Springer, Heidelberg (2004)
8. Gutiérrez-Naranjo, M.A., Pérez-Jiménez, M.J., Riscos-Núñez, A.: A fast P system for find-
ing a balanced 2-partition. Soft Computing 9(9), 673–678 (2005)
9. Pérez-Jiménez, M.J., Romero-Campero, F.J.: Solving the BINPACKING problem by rec-
ognizer P systems with active membranes. In: Proceedings of the Second Brainstorming
Week on Membrane Computing, Seville, Spain, pp. 414–430 (2004)
10. Pérez-Jiménez, M.J., Romero-Jiménez, A., Sancho-Caparrini, F.: Computationally Hard
Problems Addressed Through P Systems. In: Ciobanu, G. (ed.) Applications of Membrane
Computing, pp. 315–346. Springer, Berlin (2006)
Self-organizing Quantum Evolutionary Algorithm
Based on Quantum Dynamic Mechanism

Sheng Liu and Xiaoming You

School of Management, Shanghai University of Engineering Science, Shanghai 200065, China
{ls6601,yxm6301}@163.com

Abstract. A novel self-organizing quantum evolutionary algorithm based on a quantum dynamic mechanism for global optimization (DQEA) is proposed. Firstly, the population is divided into subpopulations automatically. Secondly, by using a co-evolution operator each subpopulation can obtain optimal solutions. Because of its intrinsic adaptivity, the quantum evolutionary algorithm maintains population diversity much better than the classical evolutionary algorithm. In addition, co-evolution by the quantum dynamic mechanism helps to accelerate convergence. The searching technique for improving the performance of DQEA is described; the self-organizing algorithm has advantages in adaptability, reliability and learning ability over the traditional organizing algorithm. Simulation results demonstrate the superiority of DQEA.

Keywords: Quantum evolutionary algorithm, Quantum dynamic mechanism,


Self-organizing and co-evolution.

1 Introduction
Many studies have used hybrid evolutionary algorithms to solve global optimization problems. A hybrid evolutionary algorithm takes advantage of a dynamic balance between diversification and intensification; a proper combination of the two is important for global optimization. Quantum computing is a very attractive research area; research on merging evolutionary algorithms with quantum computing [1] has been developing since the end of the 1990s. Han proposed the quantum-inspired evolutionary algorithm (QEA), inspired by the concept of quantum computing, and introduced a Q-gate as a variation operator to drive the optimization of the individuals' Q-bits [2]. Up to now, QEA has developed rapidly and has been applied to the knapsack problem, numerical optimization, and other fields [3]. It has gained attention for its good global search capability and effectiveness. Although quantum evolutionary algorithms are considered powerful in terms of global optimization, they still have several drawbacks such as premature convergence. A number of researchers have experimented with biological immunity and cultural dynamic systems to overcome these drawbacks implicit in evolutionary algorithms [4], [5]; here we experiment with a quantum dynamic mechanism.


This paper presents a self-organizing quantum evolutionary algorithm based on a quantum dynamic mechanism for global optimization (DQEA). Firstly, the population is divided into subpopulations automatically. Due to its intrinsic adaptivity, the quantum evolutionary algorithm enables individuals to draw closer to each optimal solution. The self-organizing operator is adopted to enable an individual to converge to the nearest optimal solution belonging to the same subpopulation. Meanwhile, the subpopulations co-evolve through the quantum dynamic mechanism. The intensification strategies are the self-organizing operator and the co-evolution. The experimental results show that DQEA is very efficient for global optimization; it usually obtains the optimal solution faster. We describe the quantum dynamic mechanism for improving the performance of DQEA; its performance is compared with that of CRIQEA [6] and MIQEA [7]. The rest of this paper is organized as follows: Section 2 presents the quantum evolutionary algorithm and related work. Section 3 gives the structure of DQEA and describes how the algorithm's performance is improved. Section 4 presents an application example with DQEA and the other algorithms [6], [7] on several benchmark problems and summarizes the experimental results. Finally, concluding remarks follow in Section 5.

2 Quantum Evolutionary Algorithm and Related Work


QEA utilizes a new representation, called a Q-bit, for probabilistic representation based on the concept of qubits [2]. A qubit may be in the "1" state, in the "0" state, or in any superposition of the two [8]. The state of a qubit can be represented as $|\varphi\rangle = \alpha|0\rangle + \beta|1\rangle$, where α and β are complex numbers that specify the probability amplitudes of the corresponding states: |α|² gives the probability that the qubit will be found in the "0" state and |β|² gives the probability that the qubit will be found in the "1" state. Normalization of the state to unity always guarantees |α|² + |β|² = 1. One qubit is defined by a pair of complex numbers (α, β), and an m-qubit representation is defined as

$$\begin{bmatrix} \alpha_1 & \alpha_2 & \alpha_3 & \cdots & \alpha_m \\ \beta_1 & \beta_2 & \beta_3 & \cdots & \beta_m \end{bmatrix}$$

where $|\alpha_i|^2 + |\beta_i|^2 = 1$, i = 1, 2, …, m.
The Q-bit representation has the advantage that it can represent a linear superposition of states probabilistically. A system of m qubits can represent 2^m states at the same time. However, in the act of observing a quantum state, it collapses to a single state. The basic structure of QEA is described in the following [2].
QEA()
{ t = 0;
  Initialize Q(t);
  Make P(t) by observing the states of Q(t);
  Evaluate P(t);
  Store the optimal solutions among P(t);
  While (not termination-condition) do
  { t = t + 1;
    Make P(t) by observing the states of Q(t-1);
    Evaluate P(t);
    Update Q(t) using Q-gate U(t);
    Store the optimal solutions among P(t);
  }
}
end.
where Q(t) is a population of qubit chromosomes at generation t, and P(t) is a set of binary solutions at generation t.
1) In the step 'initialize Q(t)', all qubit chromosomes are initialized with the same constant $1/\sqrt{2}$. This means that one qubit chromosome represents the linear superposition of all possible states with the same probability.
2) The next step makes a set of binary solutions, P(t), by observing the states of Q(t). One binary solution is formed by selecting each bit using the probability given by the qubit. Each solution is then evaluated to give some measure of its fitness.
3) The initial best solution is then selected and stored from among the binary solutions P(t).
4) In the while loop, a set of binary solutions P(t) is formed by observing the states of Q(t-1) as in the procedure described before, and each binary solution is evaluated to give its fitness value. It should be noted that P(t) can be formed by multiple observations of Q(t-1).
5) In the next step, 'update Q(t)', the set of qubit chromosomes Q(t) is updated by applying the rotation gate defined below:
$$U(\Delta\theta_i) = \begin{bmatrix} \cos(\Delta\theta_i) & -\sin(\Delta\theta_i) \\ \sin(\Delta\theta_i) & \cos(\Delta\theta_i) \end{bmatrix} \quad (1)$$

where Δθ_i is a rotation angle of each Q-bit towards either the 0 or the 1 state depending on its sign. Fig. 1 shows the polar plot of the rotation gate. Δθ_i should be designed in compliance with the practical problem; Table 1 can be used as an angle-parameter table for the rotation gate.
The magnitude of Δθ_i affects the speed of convergence, but if it is too large, the solutions may diverge or converge prematurely to a local optimum. The sign of Δθ_i determines the direction of convergence.

Fig. 1. Polar plot of the rotation gate: the Q-bit (α_i, β_i) is rotated by Δθ_i to (α_i', β_i') in the plane spanned by |0⟩ and |1⟩



6) The best solution among P(t) is selected, and if it is fitter than the stored best solution, the stored solution is replaced by the new one. The binary solutions P(t) are discarded at the end of the loop.

Table 1. Lookup table of Δθ_i, where f(·) is the fitness function, and b_i and x_i are the i-th bits of the best solution b and the binary solution x, respectively

x_i   b_i   f(x) ≥ f(b)   Δθ_i
0 0 false θ1
0 0 true θ2
0 1 false θ3
0 1 true θ4
1 0 false θ5
1 0 true θ6
1 1 false θ7
1 1 true θ8
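
To make the loop concrete, the following minimal Python sketch mirrors the QEA structure just described. The OneMax fitness, the population size and rotation magnitude, and the simplified sign rule standing in for Table 1 are our own illustrative assumptions, not taken from [2]:

import math
import random

M, POP, GENS = 16, 10, 100           # Q-bits per chromosome, population size, generations
THETA = 0.01 * math.pi               # rotation magnitude (assumed)

def observe(q):
    # Collapse each Q-bit: the bit is 1 with probability |beta|^2.
    return [1 if random.random() < b * b else 0 for (_, b) in q]

def fitness(x):
    return sum(x)                    # OneMax: maximize the number of 1-bits (assumed)

# Initialize all amplitudes to 1/sqrt(2): uniform superposition of all states.
Q = [[(1 / math.sqrt(2), 1 / math.sqrt(2))] * M for _ in range(POP)]
best = max((observe(q) for q in Q), key=fitness)

for _ in range(GENS):
    for i, q in enumerate(Q):
        x = observe(q)
        if fitness(x) > fitness(best):
            best = x                 # store the best binary solution so far
        new_q = []
        for (a, b), xi, bi in zip(q, x, best):
            # Q-gate of Eq. (1): rotate toward the stored best bit; no rotation
            # when the observed bit already agrees (simplified angle table).
            d = 0.0 if xi == bi else (THETA if bi == 1 else -THETA)
            c, s = math.cos(d), math.sin(d)
            new_q.append((c * a - s * b, s * a + c * b))
        Q[i] = new_q

print("best solution:", best, "fitness:", fitness(best))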

3 Quantum Evolutionary Algorithm Based on Quantum Dynamic Mechanism
The conventional quantum evolutionary algorithm (QEA) [2], [3] is efficacious: the probability amplitudes of Q-bits were used for the first time to encode the chromosome, and the quantum rotation gate was used to implement the evolution of the population. The quantum evolutionary algorithm has the advantage of using a small population size and relatively few iterations to reach an acceptable solution, but it still has several drawbacks such as premature convergence and computational cost.
The flowchart of the self-organizing quantum evolutionary algorithm based on a quantum dynamic mechanism for global optimization (DQEA) is as follows:
DQEA()
{ t = 0;
  Initialize Q(0) with random sequences;
  Evaluate Q(0);
  Divide Q(0) into subpopulations Qi(0);
  For all Qi(t)
    While (not termination-condition) do
    { t = t + 1;
      Update Q(t-1) using Q-gates U(t-1);
      Evaluate Qi(t);
      Store the optimal solutions among Qi(t);
      Implement the self-organizing operator for Qi(t):
      { Self-adaptively cross-mutate each cell;
        Select antibodies with higher affinity as memory cells;
        Quantum dynamic equilibrium states;
      }
      Each subpopulation co-evolves by the quantum dynamic mechanism;
    }
}
end.

Denote by S_i the i-th subpopulation in Fig. 2, where the particle S_i in a force-field corresponds to the entry S_i in the matrix S = (U(S_i), ζ_ij); U(S_i) is the utility of particle S_i, and ζ_ij is the intention strength between S_i and S_j. A particle may be driven by several kinds of forces [9]. The gravitational force produced by the force-field tries to drive a particle towards the boundaries of the force-field, which embodies the tendency of a particle to maximize the aggregate benefit of the system. The pushing or pulling forces produced by other particles embody social cooperation or competition. The self-driving force produced by a particle itself represents the autonomy of the individual. Under the resultant of these forces, all particles may move concurrently in the force-field until they reach their equilibrium states.
Given an optimization problem: min f(X) subject to g(X).
Let U_i(t) be the utility of particle S_i at time t, and let J(t) be the aggregate utility of all particles. They are defined by Eq. (2):

$$\mathit{fitness}(t) = \exp(f(X)); \qquad U_i(t) = 1 - \exp(-\mathit{fitness}(t)); \qquad J(t) = \alpha \sum_{i=1}^{n} U_i(t), \quad 0 < \alpha < 1 \quad (2)$$

At time t, the potential energy function Q(t) that is related to the interactive forces among particles is defined by Eq. (3):

$$Q_i(t) = \sum_{j} \int_0^{U_i(t)} \big\{ [1 + \exp(-\zeta_{ij}x)]^{-1} - 0.5 \big\}\, dx; \qquad Q_D(t) = \sum_i Q_i(t) \quad (3)$$

Here ζ_ij is the intention strength, which describes the effect of cooperation (ζ_ij < 0) or competition (ζ_ij > 0).
At time t, the potential energy function P(t) that is related to the gravitational force of the force-field F is defined by Eq. (4):

$$P(t) = \varepsilon^2 \ln\Big( \big( \textstyle\sum_i \exp(-U_i(t)^2/(2\varepsilon^2)) \big) / n \Big), \quad 0 < \varepsilon < 1 \quad (4)$$

Let S_i(t) be the vertical coordinate of particle S_i at time t. The dynamic equation for particle S_i is defined by Eqs. (5) and (6):

Fig. 2. Particles and force-field

$$\frac{dS_i(t)}{dt} = \lambda_1 \frac{dU_i(t)}{dt} + \lambda_2 \frac{dJ_D(t)}{dt} - \lambda_3 \frac{dP_D(t)}{dt} - \lambda_4 \frac{dQ_D(t)}{dt} = \Big( \lambda_1 + \lambda_2 \frac{\partial J_D(t)}{\partial U_i(t)} - \lambda_3 \frac{\partial P_D(t)}{\partial U_i(t)} - \lambda_4 \frac{\partial Q_D(t)}{\partial U_i(t)} \Big) \frac{dU_i(t)}{dt} = g(S_i(t), U_i(t)), \quad 0 \le \lambda_1, \lambda_2, \lambda_3, \lambda_4 \le 1 \quad (5)$$

$$\frac{dU_i(t)}{dt} = \exp(-\mathit{fitness}(t)) \times \mathit{fitness}'(t) = (1 - U_i(t)) \times \mathit{fitness}'(t) = g_1(S_i(t), U_i(t)) \quad (6)$$
Here g and g_1 are functions of S_i(t) and U_i(t). If all particles reach their equilibrium states at time t, the procedure finishes with success.
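
As an illustration, the short Python sketch below numerically evaluates Eqs. (2)-(4) for one snapshot of particle utilities; the utilities and the parameter values (alpha, epsilon, and the intention strengths zeta) are assumptions chosen only for demonstration:

import math

U = [0.2, 0.5, 0.8]                  # utilities U_i(t) of three particles (assumed)
alpha, eps = 0.5, 0.5                # 0 < alpha < 1, 0 < eps < 1
zeta = [[0.0, -0.5, -0.5],           # zeta_ij < 0: cooperation (assumed values)
        [-0.5, 0.0, -0.5],
        [-0.5, -0.5, 0.0]]
n = len(U)

J = alpha * sum(U)                   # aggregate utility, Eq. (2)

def q_term(z, upper, steps=1000):
    # Midpoint Riemann sum for the integral in Eq. (3).
    h = upper / steps
    return sum((1.0 / (1.0 + math.exp(-z * (k + 0.5) * h)) - 0.5) * h
               for k in range(steps))

# Interaction potential Q_D(t) of Eq. (3); the i = j terms vanish since zeta_ii = 0.
Q_D = sum(q_term(zeta[i][j], U[i]) for i in range(n) for j in range(n) if j != i)

# Force-field potential P(t) of Eq. (4).
P = eps ** 2 * math.log(sum(math.exp(-u ** 2 / (2 * eps ** 2)) for u in U) / n)

print("J = %.4f, Q_D = %.4f, P = %.4f" % (J, Q_D, P))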

4 Experimental Study

In this section, DQEA is applied to the optimization of well-known test functions and its performance is compared with that of the CRIQEA [6] and MIQEA [7] algorithms. When testing the algorithm on well-understood problems, there are two measures of performance: (i) CR: convergence rate to the global minimum; (ii) C: the average number of objective function evaluations required to find the global minimum.
The test examples used in this study are listed below:

$$f_1(x_1, x_2) = \frac{x_1^2 + x_2^2}{4000} + 1 - \cos(x_1)\cos\!\Big(\frac{x_2}{\sqrt{2}}\Big), \quad -600 \le x_1, x_2 \le 600 \quad (7)$$

$$f_2(x_1, x_2) = 4x_1^2 - 2.1x_1^4 + x_1^6/3 + x_1 x_2 - 4x_2^2 + 4x_2^4, \quad -5 \le x_i \le 5 \quad (8)$$
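
For reference, both benchmarks can be implemented directly in Python; the printed values act as a sanity check against the minima quoted in this section:

import math

def griewank2(x1, x2):               # Eq. (7): global minimum 0 at the origin
    return (x1**2 + x2**2) / 4000.0 + 1.0 - math.cos(x1) * math.cos(x2 / math.sqrt(2))

def camelback(x1, x2):               # Eq. (8): two global minima of about -1.0316285
    return 4*x1**2 - 2.1*x1**4 + x1**6 / 3.0 + x1*x2 - 4*x2**2 + 4*x2**4

print(griewank2(0.0, 0.0))           # -> 0.0
print(camelback(0.0898, -0.7126))    # -> approximately -1.0316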



f_1 (Griewank function): This is a shallow bowl upon which a rectangular grid of alternating hills and valleys is superimposed. It has a unique global minimum of zero at the origin, surrounded by local minima; f_1 possesses about 500 other local minima. This function is designed to be a minefield for standard optimization techniques.
To compare the performance of the proposed algorithm with CRIQEA [6] and MIQEA [7], we use the same population size (N = 100, 4 subpopulations) for the test problems. The experimental results (C: number of function evaluations, and CR: convergence rate to the global minimum) obtained in 20 independent runs are averaged and summarized in Table 2.
From Table 2 we can see that the new algorithm performs well. For each test problem, DQEA converges to globally optimal solutions more often; meanwhile, the average number of function evaluations of DQEA is lower than that of CRIQEA and MIQEA for every test problem. The results show that our algorithm is efficient for global optimization.
f_2 (Camelback function): It has six local minima, two of which are global minima with f_min = -1.0316285. It was used to evaluate how co-evolution through the quantum dynamic mechanism influences the convergence of DQEA. The results for the Camelback function, averaged over 20 trials, are shown in Fig. 3. Comparison of the results indicates that the quantum dynamic mechanism can keep the optimization parallel in a complex environment and control the convergence speed. For a population of 60, the optimal value was obtained with DQEA2 (2 particles, ξ_12 = ξ_21 = -1/2) after 65 generations, whereas DQEA3 (3 particles, ξ_ij = -1/3, i ≠ j, 1 ≤ i, j ≤ 3) obtained it after 37 generations. For a given problem size, a larger number of subpopulations can explore different regions of the search space via different populations, which helps to obtain the optimal solutions rapidly and converge fast.

Table 2. Comparison of the algorithm DQEA with other algorithms

Function Algorithm CR Worst Best C

f1 DQEA 8/20 0.00423 0 2100

f1 CRIQEA 5/20 0.00371 0 2200

f1 MIQEA 3/20 0.00412 0 3800

f2 DQEA 10/20 -1.033180 -1.031620 550

f2 CRIQEA 6/20 -1.032920 -1.031620 625

f2 MIQEA 3/20 -1.033620 -1.031620 1050

(C: average number of function evaluations; CR: convergence rate to the global minimum)



Fig. 3. Comparison of the convergence speed of DQEA for different particles

The certainty of convergence of DQEA may be attributed to its ability to maintain the diversity of its population. In fact, the superior performance of the self-organizing operator may be attributed to its adaptability and learning ability; a larger pool of feasible solutions enhances the probability of finding the optimal solution. Cooperation of subpopulations through the quantum dynamic mechanism helps to obtain the optimal solution faster. Comparison of the results indicates that the self-organizing operator can maintain individual diversity and control the convergence speed.

5 Conclusions
In this study, a self-organizing quantum evolutionary algorithm based on a quantum dynamic mechanism for global optimization (DQEA) is proposed. The efficiency of the quantum evolutionary algorithm is enhanced by the self-organizing operator. We evaluated the performance of the algorithm, and its efficiency has been illustrated by applying it to several test cases. The results show that integrating the self-organizing operator into the quantum evolutionary algorithm can yield significant improvements in both convergence rate and solution quality. Cooperation of subpopulations through the quantum dynamic mechanism helps the algorithm converge faster. Future work will further exploit the quantum dynamic mechanism for global optimization.
Acknowledgments. The authors gratefully acknowledge the support of the Natural Science Foundation of Shanghai (Grant No. 09ZR1420800), the Development Foundation of SUES (Grant No. 2008XY18), and the Doctor Foundation of SUES.

References
1. Narayanan, A., Moore, M.: Genetic Quantum Algorithm and Its Application to Combinatorial Optimization Problem. In: Proc. IEEE International Conference on Evolutionary Computation (ICEC 1996), pp. 61–66. IEEE Press, Piscataway (1996)

2. Han, K.H., Kim, J.H.: Quantum-inspired Evolutionary Algorithm for a Class of Combinato-
rial Optimization. IEEE Transactions on Evolutionary Computation 6(6), 580–593 (2002)
3. Han, K.H., Kim, J.H.: Quantum-Inspired Evolutionary Algorithms with a New Termination Criterion, Hε Gate, and Two-Phase Scheme. IEEE Transactions on Evolutionary Computation 8(2), 156–169 (2004)
4. Fukuda, T., Mori, K., Tsukiyama, M.: Parallel search for multi-modal function optimization with diversity and learning of immune algorithm. In: Artificial Immune Systems and Their Applications, pp. 210–220. Springer, Berlin (1999)
5. Saleem, S.: Knowledge-Based Solutions to Dynamic Problems Using Cultural Algorithms.
PhD thesis, Wayne State University, Detroit Michigan (2001)
6. You, X.M., Liu, S., Sun, X.K.: Immune Quantum Evolutionary Algorithm based on Chaotic
Searching Technique for Global Optimization. In: Proc. The First International Conference
on Intelligent Networks and Intelligent Systems, pp. 99–102. IEEE Press, New York (2008)
7. You, X.M., Liu, S., Shuai, D.X.: Quantum Evolutionary Algorithm Based on Immune Theory for Multi-Modal Function Optimization. Journal of Petrochemical Universities 20, 45–49 (2007)
8. Hey, T.: Quantum Computing: An Introduction. Computing & Control Engineering Jour-
nal 10, 105–112 (1999)
9. Shuai, D., Shuai, Q., Liu, Y., Huang, L.: Particle Model to Optimize Enterprise Computing.
In: Research and Practical Issues of Enterprise Information Systems. International Federa-
tion for Information Processing, vol. 205, pp. 109–118. Springer, Boston (2006)
Can Moral Hazard Be Resolved by
Common-Knowledge in S4n-Knowledge?

Takashi Matsuhisa

Department of Natural Sciences, Ibaraki National College of Technology


Nakane 866, Hitachinaka-shi, Ibaraki 312-8508, Japan
[email protected]

Abstract. This article investigates the relationship between common-knowledge and agreement in multi-agent systems, and applies the agreement result based on common-knowledge to the principal-agents model under non-partition information. We treat two problems: (1) how to capture, from an epistemic point of view, the fact that the agents agree on an event or reach consensus on it, and (2) how the agreement theorem can make progress towards settling a moral hazard problem in the principal-agents model under non-partition information. We propose a solution program, based on common-knowledge, for the moral hazard in the principal-agents model under non-partition information. We start from the assumption that the agents have the knowledge structure induced from a reflexive and transitive relation associated with the multi-modal logic S4n. Each agent obtains the membership value of an event under his/her private information, so he/she considers the event as a fuzzy set. Specifically, we consider the situation in which the agents commonly know all membership values of the other agents. In this circumstance we show the agreement theorem that consensus on the membership values among all agents can still be guaranteed. Furthermore, under certain assumptions we show that the moral hazard can be resolved in the principal-agents model when all the expected marginal costs are common-knowledge among the principal and the agents.

Keywords: Moral hazard, Principal-agents model under uncertainty,


Common-Knowledge, Agreeing to disagree, Modal logic S4n.

AMS 2000 Mathematics Subject Classification: Primary 91A35,


Secondary 03B45.

Journal of Economic Literature Classification: C62, C78.

1 Introduction

This article considers the relationship between common-knowledge and agreement in multi-agent systems. How do we capture the fact that the agents agree on an event or reach consensus on it? We treat the problem with a fuzzy set-theoretical flavour. The purposes are, first, to introduce a knowledge revision system


through communication on a multi-agent system, by which we show that all agents can agree on an event, and, second, to apply the result to solving the moral hazard in a principal-agents model under asymmetric information. We consider more than two agents, and the agents have the structure of the Kripke semantics for the multi-modal logic S4n.
Assume that all agents have a common probability measure. By the membership value of an event under agent i's private information we mean the conditional probability value of the event under the agent's private information. We say that consensus on the event can be guaranteed among all agents (or that they agree on it) if all the membership values are equal.
R.J. Aumann [1] considered the situation in which the agents have common-knowledge of the membership values; that is, simultaneously everyone knows the membership values, everyone knows that everyone knows the values, everyone knows that everyone knows that everyone knows the values, and so on. He showed the famous agreement theorem for a partition information structure, equivalent to the multi-modal logic S5n, and D. Samet [5] and T. Matsuhisa and K. Kamiyama [4] extended the theorem to several models corresponding to the weaker logic S4n.
Theorem 1 (Aumann [1], Samet [5], Matsuhisa et al. [4]). The agents can agree on an event if all membership values of the event under private information are common-knowledge among them.
We shift our attention to the principal-agents model as follows: an owner (principal) of a firm hires managers (agents), and the owner cannot observe how much effort the managers put into their jobs. In this setting, the problem known as the moral hazard can arise: there is no optimal contract generating the same effort choices for the owner and the agents. We apply Theorem 1 to solve the problem. The aim is to establish:
Theorem 2. The owner and the managers can reach consensus on their expected marginal costs for their jobs if the expected marginal costs are common-knowledge.
This article is organized as follows. In Section 2 we describe the moral hazard in our principal-agents model. Sections 3 and 4 introduce the notion of common-knowledge associated with a reflexive and transitive information structure and the notion of a decision function. Section 5 gives the formal statement of Theorem 1. In Section 6 we introduce the formal description of a principal-agents model under asymmetric information. We propose the program to solve the moral hazard in the model: first the formal statement of Theorem 2 is given, and secondly we investigate under what further assumptions Theorem 2 is true. In the final section we conclude with remarks.

2 Moral Hazard
Let us consider the principal-agents model as follows: there are the principal P and n agents {1, 2, · · · , k, · · · , n} (n ≥ 1) in a firm. The principal makes a profit by selling the products made by the agents. He/she makes a contract with each agent k under which the total amount of all profits is refunded to each agent k in proportion to the agent's contribution to the firm.
Let e_k denote the measure of managerial effort for k's productive activities. The set of possible efforts for k is denoted by E_k with E_k ⊆ R. Let I_k(·) be a real-valued continuously differentiable function on E_k, interpreted as the profit from selling the products made by agent k at cost c(e_k). Here we assume I_k(·) ≥ 0, and the cost function c(·) is a real-valued continuously differentiable function on E = ∪_{k=1}^{n} E_k. Let I_P be the total amount of all the profits:

$$I_P = \sum_{k=1}^{n} I_k(e_k).$$

The principal P cannot observe the efforts e_k, and views them as random variables on a probability space (Ω, µ). The optimal plan for the principal then solves the following problem:

$$\max_{e=(e_1, e_2, \cdots, e_k, \cdots, e_n)} \Big\{ \mathrm{Exp}[I_P(e)] - \sum_{k=1}^{n} c(e_k) \Big\}.$$

Let W_k(e_k) be the total amount of the refund to agent k:

$$W_k(e_k) = r_k I_P(e), \qquad \text{with} \quad \sum_{k=1}^{n} r_k = 1, \quad 0 \le r_k \le 1,$$

where r_k denotes the proportional rate representing k's contribution to the firm. The optimal plan for each agent also solves the problem: for every k = 1, 2, · · · , n,

$$\max_{e_k} \{ \mathrm{Exp}[W_k(e_k)] - c(e_k) \} \quad \text{subject to} \quad \sum_{k=1}^{n} r_k = 1, \quad 0 \le r_k \le 1.$$

We assume that r_k is independent of e_k; the necessary conditions for critical points are then as follows: for each agent k = 1, 2, · · · , n, we obtain

$$\frac{\partial}{\partial e_k} \mathrm{Exp}[I_k(e_k)] - c'(e_k) = 0, \qquad r_k \frac{\partial}{\partial e_k} \mathrm{Exp}[I_k(e_k)] - c'(e_k) = 0,$$

which contradict each other whenever r_k ≠ 1. This contradictory situation is called the moral hazard in the principal-agents model.
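
A toy numerical illustration in Python makes this contradiction concrete. The functional forms below, I_k(e) = a*e (deterministic, so Exp[I_k] = a*e) and c(e) = e^2/2, are our own choices for demonstration and are not part of the model above:

a, r = 2.0, 0.25                     # marginal profit and agent k's refund share (assumed)

e_principal = a                      # principal's condition: a - c'(e) = a - e = 0
e_agent = r * a                      # agent's condition: r*a - c'(e) = r*a - e = 0

print("principal-optimal effort:", e_principal)   # 2.0
print("agent-optimal effort:    ", e_agent)       # 0.5
# Unless r = 1, the two first-order conditions select different efforts:
# this mismatch is exactly the moral hazard.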

3 Common-Knowledge
Let N be a set of finitely many agents, and let i denote an agent. The specification is that N = {P, 1, 2, · · · , k, · · · , n} consists of the principal P and the agents {1, 2, · · · , k, · · · , n} in the firm. A state-space Ω is a non-empty set whose members are called states. An event is a subset of the state-space. If Ω is a state-space, we denote by 2^Ω the field of all its subsets. An event E is said to occur in a state ω if ω ∈ E.

3.1 Information and Knowledge

By an RT-information structure¹ we mean ⟨Ω, (Π_i)_{i∈N}⟩ in which Π_i : Ω → 2^Ω satisfies the two postulates: for each i ∈ N and for any ω ∈ Ω,

Ref  ω ∈ Π_i(ω);
Trn  ξ ∈ Π_i(ω) implies Π_i(ξ) ⊆ Π_i(ω).

This structure is equivalent to the Kripke semantics for the multi-modal logic S4n. The set Π_i(ω) is interpreted as the set of all states of nature that i knows to be possible at ω, or as the set of states that i cannot distinguish from ω. We call Π_i(ω) i's information set at ω.
A partition information structure is an RT-information structure ⟨Ω, (Π_i)_{i∈N}⟩ satisfying the additional postulate: for each i ∈ N and for any ω ∈ Ω,

Sym  If ξ ∈ Π_i(ω) then ω ∈ Π_i(ξ).
We will give the formal model of knowledge as follows:²

Definition 1. An S4n-knowledge structure (or simply, knowledge structure) is a tuple ⟨Ω, (Π_i)_{i∈N}, (K_i)_{i∈N}⟩ that consists of an RT-information structure ⟨Ω, (Π_i)_{i∈N}⟩ and a class of i's knowledge operators K_i on 2^Ω defined by

K_i E = {ω | Π_i(ω) ⊆ E}.

The event K_i E is interpreted as the set of states of nature in which i knows E to be possible.
We record the properties of i's knowledge operator³: for every E, F of 2^Ω,

N  K_i Ω = Ω;
K  K_i(E ∩ F) = K_i E ∩ K_i F;
T  K_i E ⊆ E;
4  K_i E ⊆ K_i(K_i E).

Remark 1. If (Π_i)_{i∈N} is a partition information structure, the operator K_i satisfies the additional property:

5  Ω \ K_i E ⊆ K_i(Ω \ K_i E).

¹ This stands for a reflexive and transitive information structure.
² Cf. Fagin et al. [2].
³ According to these properties we can say the structure ⟨Ω, (K_i)_{i∈N}⟩ is a model for the multi-modal logic S4n.

3.2 Common-Knowledge and Communal Information

The mutual knowledge operator K_E : 2^Ω → 2^Ω is the intersection of all individual knowledge operators: K_E F = ∩_{i∈N} K_i F; its interpretation is that everyone knows F.

Definition 2. The common-knowledge operator K_C : 2^Ω → 2^Ω is defined by

K_C F = ∩_{n≥1} (K_E)^n F.

The intended interpretation is as follows: K_C E is the event that everyone knows E, everyone knows that everyone knows E, everyone knows that everyone knows that everyone knows E, and so on. An event E is common-knowledge at ω ∈ Ω if ω ∈ K_C E.
Let M : 2^Ω → 2^Ω be the dual of the common-knowledge operator K_C:

M E := Ω \ K_C(Ω \ E).

By the communal information function we mean the function M : Ω → 2^Ω defined by M(ω) = M({ω}). It can be plainly observed that the communal information function has the following property:

Proposition 1. With the notation above, ω ∈ K_C E if and only if M(ω) ⊆ E.

Proof. See Matsuhisa and Kamiyama [4].
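
On a finite state-space these operators can be computed directly. The following Python sketch uses a toy RT-information structure of our own choosing; since K_i F ⊆ F (property T), the sequence (K_E)^n F is decreasing, so K_C F can be obtained by iterating the mutual knowledge operator to a fixed point:

OMEGA = frozenset({"w1", "w2", "w3"})

PI = {  # Pi_i(w): the states agent i cannot distinguish from w (reflexive, transitive)
    "i": {"w1": {"w1"}, "w2": {"w2", "w3"}, "w3": {"w2", "w3"}},
    "j": {"w1": {"w1", "w2"}, "w2": {"w1", "w2"}, "w3": {"w3"}},
}

def K(agent, E):
    # K_i E = { w | Pi_i(w) is a subset of E }
    return {w for w in OMEGA if PI[agent][w] <= set(E)}

def K_everyone(E):
    # Mutual knowledge K_E: intersection of all individual knowledge operators.
    sets = [K(agent, E) for agent in PI]
    return set.intersection(*sets)

def K_common(E):
    # Common knowledge K_C: iterate K_E until the decreasing sequence stabilizes.
    F = set(E)
    while True:
        G = K_everyone(F)
        if G == F:
            return F
        F = G

E = {"w1", "w2"}
print("K_i E =", K("i", E))
print("K_E E =", K_everyone(E))
print("K_C E =", K_common(E))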

4 Decision Function and Membership Values

Let Z be a set of decisions, common to all agents. By a decision function we mean a mapping f of 2^Ω × 2^Ω into the set of decisions Z. We consider the following properties of the function f, for an event X:
DUC (Disjoint Union Consistency): for every pair of disjoint events S and T, if f(X; S) = f(X; T) = d then f(X; S ∪ T) = d;
PUD (Preserving Under Difference): for all events S and T such that S ⊆ T, if f(X; S) = f(X; T) = d then f(X; T \ S) = d.
By the membership function associated with f under agent i's private information we mean the function d_i from 2^Ω × Ω into Z defined by d_i(X; ω) = f(X; Π_i(ω)), and we call d_i(X; ω) the membership value of X associated with f under agent i's private information at ω.
Definition 3. We say that consensus on X can be guaranteed among all agents (or that they agree on it) if d_i(X; ω) = d_j(X; ω) for any agents i, j ∈ N and all ω ∈ Ω.

Remark 2. If f is intended to be a posterior probability, we assume given a probability measure µ on the state-space Ω which is common to all agents; precisely, for an event X of Ω, f(X; ·) is given by f(X; ·) = µ(X|·). Then the membership value of X is the conditional probability value d_i(X; ω) = µ(X | Π_i(ω)). The pair (X, d_i) can be considered as a fuzzy set X associated with agent i's membership function d_i. Consensus on X being guaranteed among all agents can be interpreted as the fuzzy sets (X, d_i) and (X, d_j) being equal for any i, j ∈ N.

5 Agreeing to Disagree Theorem

We can now state Theorem 1 explicitly. Let D be the event of the membership degrees of an event X for all agents at ω, defined by

D = ∩_{i∈N} {ξ ∈ Ω | d_i(X; ξ) = d_i(X; ω)}.

Theorem 3. Assume that the agents have an S4n-knowledge structure and a decision function f satisfying the two properties (DUC) and (PUD). If ω ∈ K_C D, then d_i(X; ω) = d_j(X; ω) for any agents i, j ∈ N and all ω ∈ Ω.

Proof. The proof follows the same lines as in Matsuhisa and Kamiyama [4].

6 Moral Hazard Revisited

This section investigates the moral hazard problem from the common-knowledge viewpoint. Let us reconsider the principal-agents model, with the notation and assumptions of Section 2. We establish Theorem 2 under the additional assumptions A1–A2 below. This gives a possible solution to our moral hazard problem.

A1 The principal P has an RT-information structure {Π_P(ω) | ω ∈ Ω} on Ω, and each agent k also has his/her RT-information structure {Π_k(ω) | ω ∈ Ω};
A2 For each ω, ξ ∈ Ω there exists a decision function f : 2^Ω × 2^Ω → R satisfying the Disjoint Union Consistency together with
(a) f({ξ}; Π_P(ω)) = ∂/∂e_k(ξ) Exp[I_P(e) | Π_P(ω)];
(b) f({ξ}; Π_k(ω)) = ∂/∂e_k(ξ) Exp[W_k(e) | Π_k(ω)].

We have now set up the principal-agents model under asymmetric information. The optimal plans for principal P and agent k are then to solve

PE: max over e = (e_k)_{k=1,2,···,n} of { Exp[I_P(e) | Π_P(ω)] − Σ_{k=1}^{n} c(e_k) };
AE: max over e_k of { Exp[W_k(e_k) | Π_k(ω)] − c(e_k) } subject to Σ_{k=1}^{n} r_k = 1 with 0 ≤ r_k ≤ 1.

From the necessary conditions for critical points together with A2, it can be seen that the principal's expected marginal cost for agent k is given by

c′_P(e_k(ξ); ω) = f({ξ}; Π_P(ω)),

and agent k's expected marginal cost is given by

c′_k(e_k(ξ); ω) = f({ξ}; Π_k(ω)).

To establish this solution program we have to solve the following problem: construct the information structure together with a decision function such that the above conditions A1 and A2 are true. Under these circumstances, the resolution of the moral hazard given by Theorem 2 can be restated as follows. We denote

[c′(e(·); ω)] = ∩_{i∈N} ∩_{ξ∈Ω} {ζ ∈ Ω | f({ξ}; Π_i(ζ)) = f({ξ}; Π_i(ω))},

interpreted as the event of all the expected marginal costs.

Theorem 4. Under the conditions A1 and A2 we obtain that, for each ξ ∈ Ω, if ω ∈ K_C([c′(e(ξ))]) then c′_P(e_k(ξ)) = c′_k(e_k(ξ)) for every k = 1, 2, · · · , n.
Proof. Follows immediately from Theorem 3.
Remark 3. To establish Theorem 4 we have to solve the problem of constructing the information structure (Π_i)_{i∈N} together with a decision function f such that the above conditions A1 and A2 are true.

7 Concluding Remarks
We end this article by posing additional problems for further progress:
1. If the proportional rate r_k representing k's contribution to the firm depends only on his/her effort in the principal-agents model, what solution can we have for the moral hazard problem?
2. Can we construct a communication system for the principal-agents model, in which the agents, including the principal, communicate with each other about their expected marginal costs as messages? The recipient of a message revises his/her information structure and recalculates the expected marginal cost under the revised information structure. The agent sends the revised expected marginal cost to another agent according to a communication graph, and so on. In this circumstance, do the limiting expected marginal costs actually coincide? Matsuhisa [3] introduces a fuzzy communication system and extends Theorem 3 to the communication model. By using this model, Theorem 4 can be extended to the communication framework; the details will be reported in the near future.

References
1. Aumann, R.J.: Agreeing to disagree. Annals of Statistics 4, 1236–1239 (1976)
2. Fagin, R., Halpern, J.Y., Moses, Y., Vardi, M.Y.: Reasoning about Knowledge. MIT
Press, Cambridge (1995)

3. Matsuhisa, T.: Fuzzy communication reaching consensus under acyclic condition. In:
Ho, T.-B., Zhou, Z.-H. (eds.) PRICAI 2008. LNCS (LNAI), vol. 5351, pp. 760–767.
Springer, Heidelberg (2008)
4. Matsuhisa, T., Kamiyama, K.: Lattice structure of knowledge and agreeing to
disagree. Journal of Mathematical Economics 27, 389–410 (1997)
5. Samet, D.: Ignoring ignorance and agreeing to disagree. Journal of Economic
Theory 52, 190–207 (1990)
Casuist BDI-Agent: A New Extended BDI Architecture
with the Capability of Ethical Reasoning

Ali Reza Honarvar1 and Nasser Ghasem-Aghaee2


1 Department of Computer Engineering, Sheikh Bahaei University, Isfahan, Iran
2 Department of Computer Engineering, University of Isfahan, Isfahan, Iran
[email protected], [email protected]

Abstract. As intelligent agents are developed to be cleverer, more complex, and yet potentially uncontrollable, a number of problems have been recognized. The capability of agents to make moral decisions has become an important question as intelligent agents have become more autonomous and human-like. We propose the Casuist BDI-Agent architecture, which extends the power of the BDI architecture. The Casuist BDI-Agent architecture combines the CBR method from AI with the bottom-up casuist approach in ethics in order to add the capability of ethical reasoning to BDI agents.

Keywords: ethical artificial agent, explicit ethical agent, BDI agent, ethical rea-
soning, CBR-BDI agent.

1 Introduction

At the present moment, many researchers work on projects like expert systems that can assist physicians in diagnosing patients' maladies, warplanes that can be operated without human operators in war, autonomous driverless vehicles that can be used for urban transportation, and the like. The common goal of these projects is the augmentation of autonomy in the behavior of such machines. As a consequence of this autonomy, they can act on our behalf without any interference or guidance, relieving humans of these duties. But if we do not consider and control the autonomy of these entities, we will face serious problems, because we would be placing confidence in the intelligence of autonomous systems without any control of, or restriction on, their operations.
The new interdisciplinary research area of "Machine Ethics" is concerned with solving this problem [1, 2, 3]. Anderson proposed machine ethics as a new issue that considers the consequences of a machine's behavior for humans. The ideal and ultimate goal is the implementation of ethics in machines, so that machines can autonomously detect the ethical effects of their behavior and follow an ideal ethical principle or set of principles; that is to say, a machine is guided by this principle or these principles in the decisions it makes about possible courses of action it could take [3]. So, by simulating ethics in autonomous machines, we can avoid the problems of autonomy in autonomous machines.


James Moor, a professor of philosophy at Dartmouth College and one of the founding figures in the field of computer ethics, has proposed a hierarchical schema for categorizing artificial ethical agents [4, 5]. At the lowest level is what he calls "ethical impact agents": basically any machine that can be evaluated for its ethical consequences. Moor's own rather nice example is the replacement of young boys with robots in the dangerous occupation of camel jockey in Qatar. In fact, it seems to us that all robots have ethical impacts, although in some cases they may be harder to discern than in others.
At the next level are what Moor calls "implicit ethical agents": machines whose designers have made an effort to design them so that they don't have negative ethical effects, by addressing safety and critical reliability concerns during the design process. Arguably, all robots should be engineered to be implicit ethical agents, insofar as designers are negligent if they fail to build in processes that assure safety and reliability. In [6] we introduced an example of an implicit ethical traffic controller agent which controls urban transportation ethically at intersections.
Next come "explicit ethical agents": machines that reason about ethics using ethical categories as part of their internal programming, perhaps using various forms of deontic logic that have been developed for representing duties and obligations, or a variety of other techniques.
Beyond all these lie full ethical agents: those that can make explicit moral judgments and are generally quite competent in justifying such decisions. This level of performance is often presumed to require a capacity for consciousness, intentionality, and free will.
In order to control the autonomous behavior of agents and to avert possible harmful behavior from increasingly autonomous machines, in this paper we prepare the necessary elements for adding ethical decision-making capability to BDI agents and propose an architecture for implementing explicit ethical agents based on case-based reasoning (CBR), BDI agents, casuistry, and consequentialist theories of ethics.
The main objective of this research is to show that the integration of case-based reasoning (CBR) with BDI (Belief-Desire-Intention) agent models can implement the combination of the bottom-up casuistry approach with a top-down consequentialist theory in ethics. With this proposal we can add the necessary elements to a novel BDI-agent architecture in order to give agents ethical decision-making capability. In Section 2 we introduce the basic preliminaries of CBR, BDI agents, casuistry, and consequentialist theories in ethics. The remaining sections introduce the details of the Casuist BDI-Agent architecture.

2 Preliminaries

2.1 BDI Agents

BDI agents have been widely used in relatively complex and dynamically changing environments. BDI agents are based on the following core data structures: beliefs, desires, intentions, and plans [7]. These data structures represent, respectively, information gathered from the environment, a set of tasks or goals contextual to the environment, the set of sub-goals to which the agent is currently committed, and specifications of how sub-goals may be achieved via primitive actions. The BDI architecture comes

with the specification of how these four entities interact, and provides a powerful
basis for modeling, specifying, implementing, and verifying agent-based systems.

2.2 Case-Based Reasoning

Case-based reasoning (CBR) has emerged in the recent past as a popular approach to learning from experience. CBR is a reasoning method based on the reuse of past experiences, which are called cases [8]. Cases are descriptions of situations in which agents with goals interact with the world around them. Cases in CBR are represented by a triple (p, s, o), where p is a problem, s is the solution of the problem, and o is the outcome (the resulting state of the world when the solution is carried out). The basic philosophy of CBR is that the solutions of successful cases should be reused as a basis for future problems that present a certain similarity [9]. Cases with unsuccessful outcomes, or negative cases, may provide additional knowledge to the system by preventing the agent from repeating similar actions that lead to unsuccessful results or states.

2.3 Casuistry

The term "casuistry" refers descriptively to a method of reasoning for resolving perplexities about difficult cases that arise in moral and legal contexts [10]. Casuistry is a broad term that refers to a variety of forms of case-based reasoning used in discussions of law and ethics. Casuistry is often understood as a critique of a strict principle-based approach to reasoning [11]. For example, while a principle-based approach may conclude that lying is always morally wrong, the casuist would argue that lying may or may not be wrong, depending on the details surrounding the case. For the casuist, the circumstances surrounding a particular case are essential for evaluating the proper response to it. Casuistic reasoning typically begins with a clear-cut paradigm case ("paradigm" meaning "pattern" or "example"). From this model case, the casuist asks how closely the particular case currently under consideration matches the paradigm case. Cases similar to the paradigm case ought to be treated in a similar manner; cases unlike the paradigm case ought to be treated differently. The less a particular case resembles the paradigm case, the weaker the justification for treating it like the paradigm case.

2.4 Consequentialist

Consequentialism refers to those moral theories which hold that the consequences of a particular action form the basis for any valid moral judgment about that action. Thus, from a consequentialist standpoint, a morally right action is one that produces a good outcome, or consequence [12].

3 Architecture of Casuist BDI-Agent


Value is the only basis for determining what it is moral to do [10]. If we can propose a mechanism for determining the value of each action from an ethical perspective, then we can claim that it is possible for humans/non-humans to behave ethically by using the evaluation of each action, without knowing codes of ethics explicitly. When a human does an action, he will see its result in the future. If he implicitly considers the result of his action from an ethical aspect, he can use this experience in future situations. When he finds himself in a situation similar to a previous case, he will use his experience and behave similarly if his previous action was successful and implicitly ethical (we assume humans do not know codes of ethics). This idea is called "casuistry" in ethics. To implement ethical decision-making capability in artificial agents we use this idea, which is related to the bottom-up casuistry approach in ethics. For considering and evaluating situations from an ethical aspect (without knowing codes of ethics), we use and adapt previous work that tries to make ethics computable.
In the BDI architecture, agent behavior is composed of beliefs, desires, and intentions; these mental attitudes determine the agent's behavior. In the Casuist BDI-Agent architecture, the BDI agent's behavior is adapted so that it behaves ethically. In this architecture, the agent senses the environment <E> and forms a triple showing the current situation, consisting of the agent's current beliefs, desires, and the details of the sensed environment. The current situation is denoted by the triple <E, B, D>. The current situation is delivered to the Case-Retriever of the Casuist BDI architecture, which is responsible for retrieving previous cases similar to the current situation. Since each case in case memory consists of a solution part that shows how the agent should act given the situation part of the case, if any case is retrieved, the agent adopts the solution part, adapts it, and behaves according to the solution part of the retrieved case. If no case is retrieved, the agent behaves like a normal BDI agent. The evaluator part of the Casuist BDI-Agent architecture then comes into play: it evaluates the result of the agent's behavior, and this evaluation is denoted by <EV>. The evaluation is sent to the Case-Updater, which creates a new case, saving the current situation in the situation part, the behavior of the agent in the solution part, and the evaluation in the outcome part (if the case does not already exist in case memory; otherwise it updates the previous case). Fig. 1 shows the general structure of the Casuist BDI-Agent architecture.
The algorithm of the Casuist BDI-Agent is described below.

Algorithm of Casuist BDI-Agent:

Repeat until End
  E = Agent.Sense ( Environment )
  B = Agent.Beliefs ( )
  D = Agent.Desires ( )
  // <E,B,D> denotes the current situation
  Prv-cases = Case-Retriever ( <E,B,D> )
  If notEmpty ( Prv-cases ) then
    I = Agent.ReuseAndAdapt ( Prv-cases )
    // "I" denotes the agent's intention
  Else
    I = Agent.MakeDecision ( <E,B,D> )
  EV = Case-Evaluator ( <E,B,D> )
  CM = Case-Updater ( <EV,E,B,D,I> )
End Repeat
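
The following runnable Python sketch traces this loop. The exact-match retrieval, the acceptability threshold (outcome > 0), and the stand-in deliberation and evaluation functions are our own simplifications: a real system would use a CBR similarity measure and the Case-Evaluator of Section 5.

case_memory = []   # each case: {"situation": (E, B, D), "solution": I, "outcome": TNP}

def retrieve(situation):
    # Naive Case-Retriever: exact match, keeping only ethically acceptable precedents.
    return [c for c in case_memory
            if c["situation"] == situation and c["outcome"] > 0]

def evaluate(situation, intention):
    # Stand-in for the Case-Evaluator of Section 5 (assumed scoring).
    return 1.0 if intention == "help" else -1.0

def make_decision(situation):
    # Stand-in for ordinary BDI deliberation over beliefs and desires.
    return "help" if situation[2] == "altruistic" else "harm"

def casuist_step(E, B, D):
    situation = (E, B, D)
    prev = retrieve(situation)
    if prev:
        intention = prev[0]["solution"]        # reuse/adapt the precedent case
    else:
        intention = make_decision(situation)   # behave like a normal BDI agent
    tnp = evaluate(situation, intention)       # Case-Evaluator
    case_memory.append({"situation": situation, "solution": intention,
                        "outcome": tnp})       # Case-Updater stores the experience
    return intention

print(casuist_step("crossing", "road-clear", "altruistic"))  # no precedent: deliberate
print(casuist_step("crossing", "road-clear", "altruistic"))  # reuses the stored ethical case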
Fig. 1. Structure of the Casuist BDI-Agent architecture (components: BDI-Agent, Environment, Case-Retriever, Case-Evaluator, Case-Updater, Case Memory)

4 Structure of Cases in Casuist BDI-Agent


CBR is one form of human reasoning that can be used within agent-based systems in order to add ethical decision-making capability to agents. Generally, each case has two parts [13]: a description of a problem or a set of problems, and a description of the solution of this problem. Additional components can be added if CBR is used in different domains.
In the Casuist BDI-Agent, experiences are stored in a Case Memory (CM) in the form of cases rather than rules. When a new situation is encountered, the agent reviews the cases to try to find a match for this particular situation. If an ethical match is found, then that case can be used to solve the new problem ethically; otherwise the agent makes an autonomous decision, i.e., knowing its goals and beliefs, it decides for itself what action to perform next.
To construct the structure of cases in the Casuist BDI-Agent architecture we use previous works [14, 15], which tried to combine CBR with the BDI architecture, and add the necessary details to prepare them for the Casuist BDI architecture.
Each case has three main parts:
1. Problem: stores the environment situation that the agent should act upon, the current beliefs (the agent's knowledge base on the application domain), and the current desires of the agent at a specific moment.
2. Solution: stores the decisions or intentions of the agent for solving the problem.
3. Outcome: stores the result and the ethical evaluation of performing the solution in the environment at a specific moment.
As the focus of this paper is on the ethical decision-making capability of agents, the outcome part of cases in the Casuist BDI architecture is discussed in more detail. The duty of the outcome part of a case is to store the ethical evaluation of the result of performing the solution part in the situation described by the problem part of the case. Reference [16] introduced a conceptual framework for ethical situations that involve artificial

agents such as robots. It typifies ethical situations in order to facilitate ethical evaluations. This typification involves classifications of ethical agents and patients according to whether they are human beings, human-based organizations, or non-human beings. The classification is based on the notions of moral agent and moral patient mentioned in [17]. Moral patients are entities that can be acted upon, for good or evil. Moral agents are entities that can perform actions, again for good or evil. In this framework, according to whether the moral agent or moral patient is a human being, a human-based organization, or a non-human being, nine categories are created. Fig. 2 shows this classification.

Fig. 2. Basic agent-patient model [16]: a moral agent (human, organization, or artificial entity) acts on a moral patient (human, organization, or artificial entity)

Since we propose a general evaluation method, we use the part of this classification where the moral agent is an artificial agent (a non-human being) and the moral patients are human beings, human-based organizations, or non-human beings. This part is shown separately in Fig. 3.

Patient \ Agent     Human                    Organization                Artificial Agent
Human               A human acts on          An organization acts        An artificial agent
                    a human                  on a human                  acts on a human
Organization        A human acts on          An organization acts        An artificial agent
                    an organization          on an organization          acts on an organization
Artificial Agent    A human acts on          An organization acts        An artificial agent acts
                    an artificial agent      on an artificial agent      on an artificial agent

Fig. 3. Categories according to agents and patients (modified version of [16])



With this simplification, the outcome part of a case is divided into three sections. These sections contain the ethical evaluation values of the performed agent's intentions on humans, organizations, and artificial agents (non-humans). The ethical evaluation values are calculated by the Case-Evaluator of the Casuist BDI architecture; this component is described in the next section. The main structure of each case in the Casuist BDI architecture is illustrated in Fig. 4. According to the specific application domain of the artificial ethical agent, more details can be added to this structure.

Problem:  (environment situation, beliefs, desires)
Solution: (intentions)
Outcome:  on humans: x;  on organizations: y;  on artificial agents: z

Fig. 4. Basic structure of cases in Case Memory
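
As an illustration, this case structure can be rendered directly as Python dataclasses; the field names are ours, and a concrete application domain would extend the Problem part with further detail:

from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Problem:
    environment: Dict[str, Any]        # sensed environment situation E
    beliefs: Dict[str, Any]            # agent's beliefs B
    desires: List[str]                 # agent's desires D

@dataclass
class Outcome:
    on_humans: float = 0.0             # x: ethical evaluation with respect to humans
    on_organizations: float = 0.0      # y: with respect to organizations
    on_artificial_agents: float = 0.0  # z: with respect to artificial agents

@dataclass
class Case:
    problem: Problem
    solution: List[str]                # intentions chosen in this situation
    outcome: Outcome = field(default_factory=Outcome)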

5 Case-Evaluator of Casuist BDI-Agent


We add the capability of ethical reasoning to the BDI agent by evaluating the agent's experiences from an ethical aspect, and by using these experiences plus their evaluations for ethical decision making in the future. In this section we introduce the Case-Evaluator component of the Casuist BDI architecture. This evaluator is based on previous attempts to make ethics computable [1, 12, 16]. According to the classification of ethical situations introduced in the previous section, the result of performing the agent's decisions or intentions can affect three kinds of entities: humans, non-humans, and organizations. The Case-Evaluator uses this notion to evaluate the ethical aspects of the affected environment. It takes the agent's beliefs, the agent's desires, and the environment status before and after the agent's behavior as inputs to determine and compute the kinds, numbers, duration, and intensity of pleasure/displeasure of the entities affected by the agent's behavior. These elements are necessary to evaluate a situation ethically from a consequentialist viewpoint [1, 12]. To propose a case evaluator, we adopt and extend previous works which tried to make consequentialist theories of ethics computable. Gips [12] proposes an equation for consequentialist evaluation schemes of the following form:

$$\sum_i w_i \cdot p_i \quad (1)$$

where w_i is the weight assigned to each person and p_i is the measure of pleasure or happiness or goodness for each person. Anderson and Armen [1] propose a modified version of this equation. For each person, the algorithm simply computes the product of the intensity, the duration, and the probability to obtain the net pleasure for that person, and then it adds the individual net pleasures to obtain the Total Net Pleasure:

$$\mathit{Total\ Net\ Pleasure} = \sum (\mathit{Intensity} \times \mathit{Duration} \times \mathit{Probability}) \;\text{for each affected individual} \quad (2)$$

This computation would be performed for each alternative action; the action with the highest Total Net Pleasure is the right action. In other words, their proposed equation equals Gips's equation extended with a time parameter. In the notation of Gips's equation, this new equation has the following form:

$$\sum_i w_i \cdot p_i \cdot T_i \quad (3)$$
where w_i is the weight assigned to each person, p_i is the measure of pleasure or happiness or goodness for each person, and T_i is the duration of the action's pleasure for each person. Our Case-Evaluator considers the ethical classification of situations introduced above: it determines the kind of entity affected by each of the agent's intentions or behaviors, and computes the duration, intensity, and probability of the effects of the intentions on each entity. These evaluations are sent to the Case-Updater component of the Casuist BDI agent for updating the agent's experiences (the Case Memory). The evaluation method of this evaluator is described by equation (7), which uses equations (4)–(6):
$$\mathit{TNPH} = \sum_{i=1}^{n} Wh_i \cdot Ph_i \cdot Th_i \quad (4)$$

TNPH is the total net pleasure of humans after the agent's behavior. Wh_i is the weight assigned to humans, which shows the importance of each person in the specific situation and application domain. Ph_i is the probability that a person is affected. Th_i is the duration of pleasure/displeasure of each person after the agent's behavior. n is the number of people in that situation.
$$\mathit{TNPO} = \sum_{i=1}^{n} Wo_i \cdot Po_i \cdot To_i \quad (5)$$
TNPO is the total net pleasure of organizations after the agent's behavior. Wo_i is the weight assigned to organizations, which shows the importance of each organization in the specific situation and application domain. Po_i is the probability that an organization is affected. To_i is the duration of pleasure/displeasure of each organization after the agent's behavior. n is the number of organizations in that situation.
$$\mathit{TNPA} = \sum_{i=1}^{n} Wa_i \cdot Pa_i \cdot Ta_i \quad (6)$$
TNPA is the total net pleasure of artificial agents after the agent's behavior. Wa_i is the weight assigned to artificial agents, which shows the importance of each artificial agent in the specific situation and application domain. Pa_i is the probability that an artificial agent is affected. Ta_i is the duration of pleasure/displeasure of each artificial agent after the agent's behavior. n is the number of artificial agents in that situation.
$$\mathit{TNP} = \mathit{TNPH} \cdot W_h + \mathit{TNPO} \cdot W_o + \mathit{TNPA} \cdot W_a \quad (7)$$

TNP is the total net pleasure of all three kinds of entities that participate in the application domain of the agent's behavior. W_h, W_o, and W_a denote the participation degrees of humans, organizations, and artificial agents, respectively; their sum equals one.
TNPH, TNPO, TNPA, and TNP are stored in the outcome part of a case in the Case Memory. These values can be used by the Case-Retriever for retrieving cases when the agent encounters a problem.
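
A direct Python implementation of equations (4)–(7) is straightforward. In the sketch below, each affected entity is represented as a (weight, probability, duration) triple; representing displeasure as a negative duration is our own convention, and all numbers are illustrative:

def net_pleasure(entities):
    # Eqs. (4)-(6): sum of weight * probability * duration over the affected entities.
    return sum(w * p * t for (w, p, t) in entities)

def total_net_pleasure(humans, orgs, agents, Wh, Wo, Wa):
    assert abs(Wh + Wo + Wa - 1.0) < 1e-9, "participation degrees must sum to 1"
    TNPH = net_pleasure(humans)        # Eq. (4)
    TNPO = net_pleasure(orgs)          # Eq. (5)
    TNPA = net_pleasure(agents)        # Eq. (6)
    return TNPH * Wh + TNPO * Wo + TNPA * Wa, (TNPH, TNPO, TNPA)   # Eq. (7)

tnp, parts = total_net_pleasure(
    humans=[(1.0, 0.9, 5.0), (0.5, 0.4, -2.0)],   # second person suffers displeasure
    orgs=[(1.0, 0.3, 1.0)],
    agents=[(0.2, 0.8, 0.5)],
    Wh=0.6, Wo=0.3, Wa=0.1)
print(parts, tnp)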

6 Conclusion
In this paper a new extension of the BDI architecture, the Casuist BDI-Agent, is proposed, which has the capabilities of the BDI-agent architecture plus the capability of ethical decision making. This architecture can be used for designing BDI agents in domains where BDI agents are applicable and ethical considerations are an important issue. With the aid of this new architecture, agents can consider the ethical effects of their behaviors when they make decisions. The main idea of this architecture is a method of ethical reasoning which uses previous experiences and does not use any codes of ethics. The main advantage of this method for implementing ethical decision-making capability in agents is the elimination of the need to convert a set of ethical rules in the agent's application domain into an algorithm, which would require conflict management among the rules. In this architecture agents can adapt ethically to their application domain and augment their implicit ethical knowledge, and so behave more ethically.

References
1. Anderson, M., Anderson, S., Armen, C.: Toward Machine Ethics: Implementing Two
Action-Based Ethical Theories. In: AAAI 2005 Fall Symp. Machine Ethics, pp. 1–16.
AAAI Press, Menlo Park (2005)
2. Allen, C., Wallach, W., Smith, I.: Why Machine Ethics? IEEE Intelligent Systems Special
Issue on Machine Ethics 21(4), 12–17 (2006)
3. Anderson, M., Anderson, S.: Ethical Healthcare Agents. Studies in Computational Intelli-
gence, vol. 107, pp. 233–257. Springer, Heidelberg (2008)
4. Wallach, W., Allen, C.: Moral Machines: Teaching Robot Right from Wrong. Oxford
University Press, Oxford (2009)
5. Moor, J.H.: The nature, importance, and difficulty of Machine Ethics. IEEE Intelligent
Systems Special Issue on Machine Ethics 21(4), 18–21 (2006)
6. Honarvar, A.R., Ghasem-Aghaee, N.: Simulation of Ethical Behavior in Urban Transporta-
tion. Proceedings of World Academy of Science, Engineering and Technology 53,
1171–1174 (2009)
7. Rao, A.S., Georgeff, M.P.: BDI Agents: From Theory to Practice. In: Proceedings of the First International Conference on Multi-Agent Systems (ICMAS 1995), San Francisco, USA (1995)
8. Pal, S.K., Shiu, S.C.K.: Foundations of Soft Case-Based Reasoning. Wiley-Interscience, Hoboken (2004)
9. Kolodner, J.: Case-Based Reasoning. Morgan Kaufmann, San Mateo (1993)

10. Keefer, M.: Moral reasoning and case-based approaches to ethical instruction in science.
In: The Role of Moral Reasoning on Socioscientific Issues and Discourse in Science
Education. Springer, Heidelberg (2003)
11. http://wiki.lawguru.com/index.php?title=Casuistry
12. Gips, J.: Towards the Ethical Robot. In: Android Epistemology, pp. 243–252. MIT Press, Cambridge (1995)
13. Kendal, S.L., Creen, M.: An Introduction to Knowledge Engineering. Springer, Heidelberg
(2007)
14. Olivia, C., Chang, C.F., Enguix, C.F., Ghose, A.K.: Case-Based BDI Agents: An Effective Approach for Intelligent Search on the Web. In: Proceedings of the AAAI 1999 Spring Symposium on Intelligent Agents in Cyberspace, Stanford University, USA (1999)
15. Corchado, J.M., Pavón, J., Corchado, E., Castillo, L.F.: Development of CBR-BDI Agents:
A Tourist Guide Application. In: Funk, P., González Calero, P.A. (eds.) ECCBR 2004.
LNCS (LNAI), vol. 3155, pp. 547–559. Springer, Heidelberg (2004)
16. Al-Fedaghi, S.S.: Typification-Based Ethics for Artificial Agents. In: Proceedings of Sec-
ond IEEE International Conference on Digital Ecosystems and Technologies (2008)
17. Floridi, L., Sanders, J.W.: On the Morality of Artificial Agents. Minds and
Machines 14(3), 349–379 (2004)
Research of Trust Chain of Operating System

Hongjiao Li and Xiuxia Tian

School of Computer and Information Engineering, Shanghai University of Electric Power,


Shanghai, P.R.C.
[email protected]

Abstract. Trust chain is one of the key technologies in designing a secure operating system based on TC technology. Constructions of trust chains and trust models are analyzed, and future work in these directions is discussed.

Keywords: Trusted Computing, Secure Operating System, Trust Chain, Trust Model.

1 Introduction
It seems likely that computing platforms compliant with Trusted Computing Group (TCG) specifications will become widespread over the next few years [1, 2]. The advent of TC (Trusted Computing) technology brings new opportunities to the research and development of secure operating systems [3, 4]. On the one hand, a secure operating system has the ability to protect legal information flow and to protect information from unauthorized access. On the other hand, TC technology can protect the operating system's own security. Therefore, developing secure operating systems with the help of TC technology is an effective way to solve the security problem of terminals. Future operating systems are expected to be open-source, trusted and secure [5].
Trust chain is one of the key technologies of TC and plays a vital role in the construction of high-assurance secure operating systems. In this paper we summarize in detail two important problems concerning the trust chain of secure operating systems, focusing mainly on Linux.
The paper starts with a description of transitive trust and the notion of the trust chain of an operating system in Section 2. Section 3 details the research on the construction of the operating system trust chain and analyzes future work on Linux. Trust models and their future work are discussed in Section 4. We conclude the paper in Section 5.

2 Trust Chain of Operating System


Transitive trust, also known as "inductive trust", is a process in which the root of trust gives a trustworthy description of a second group of functions. Based on this description, an interested entity can determine the trust it is to place in this second group of functions. If the interested entity determines that the trust level of the second group of functions is acceptable, the trust boundary is extended from the root of trust to include the second group of functions. In this case, the process can be iterated. The second


group of functions can give a trustworthy description of the third group of functions,
etc. Transitive trust is used to provide a trustworthy description of platform character-
istics, and also to prove that non-migratable keys are non-migratable.
Transitive trust enables the establishment of a chain of trust by ensuring that the trust on each layer of the system is based on, and only on, the trust of the layer(s) underneath it, all the way down to the hardware security component, which serves as the root of trust. If verification fails at any given stage, the system may be put into a suspended mode to block possible attacks. The resulting integrity chain inductively guarantees system integrity. Figure 1 illustrates the process of transitive trust of an operating system. The trust chain starts from the root of trust: the unmodifiable part of the BIOS measures the integrity of the BIOS, and the BIOS then measures the MBR (Master Boot Record) of the bootstrap device. The MBR measures the OS loader, the OS loader measures the OS kernel, and finally the OS measures applications.

Fig. 1. Transition of Trust
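To make the measure-then-load chain in Figure 1 concrete, the following is a minimal Python sketch of TCG-style PCR extension, where each stage hashes the next component into a register before handing over control. The component strings are illustrative placeholders; real platforms measure the actual binary images.

import hashlib

def extend(pcr: bytes, component: bytes) -> bytes:
    # TCG-style extend: new_pcr = SHA1(old_pcr || SHA1(component))
    return hashlib.sha1(pcr + hashlib.sha1(component).digest()).digest()

pcr = b"\x00" * 20  # the PCR starts at zero after platform reset
boot_chain = [b"BIOS image", b"MBR", b"OS loader", b"OS kernel", b"application"]
for component in boot_chain:
    # Each stage measures the next one *before* transferring control to it.
    pcr = extend(pcr, component)
print(pcr.hex())  # the final value attests the whole chain

Because the extend operation is not invertible, a tampered component anywhere in the chain yields a final PCR value that no longer matches the expected one.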

3 Construction of Operating System Trust Chain

3.1 Related Work

Many researchers have devoted considerable effort to operating system work based on TCPA. AEGIS is a system architecture for integrity examination developed at the University of Pennsylvania [6]. The AEGIS architecture establishes a chain of trust, driving trust to the lower levels of the system, and, based on those elements, achieves secure boot. It validates integrity at each layer transition in the bootstrap process. AEGIS also includes a recovery process for integrity check failures.
IBM T. J. Watson Research Center developed an integrity mechanism on Linux [7]. Their system is the first to extend the TCG trust concepts to dynamic executable content, from the BIOS all the way up into the application layer. They present the design and implementation of a secure integrity measurement system for Linux. All executable content loaded onto the Linux system is measured before execution, and these measurements are protected by the Trusted Platform Module (TPM) that is part of the TCG standards. Their measurement architecture is applied to a web server application to show how undesirable invocations, such as rootkit programs, can be detected, and that measurement is practical in terms of the number of measurements taken and the performance impact of making them.
Huang Tao also designed a trusted boot method for server systems [8]. The proposed solution performs integrity verification on every transfer of control between one layer and the next; control is transferred only if verification succeeds. The boot of the system can therefore proceed according to the transition of the trusted chain. In this way, system boot stays in a trusted state, which improves the system's security. The solution is also easy and flexible to implement.

Dynamic multi-path trust chain (DMPTC) is a software-type and character-based mechanism to assure system trustworthiness on WINDOWS systems [9]. DMPTC differentiates static system software from dynamic application software and takes different ways and policies to control the loading and running of various executable codes. The goal of DMPTC is to build a trusted computing platform by making the computing platform load and run only trustworthy executables. DMPTC also gives great consideration to its impact on system performance: based on the attributes of various executables and by taking advantage of WINDOWS interior security mechanisms, DMPTC greatly reduces the time cost of the executable verification process and ultimately assures its flexibility and utility.
Maruyama et al. studied in depth the transitive mechanisms from the GRUB boot loader to the operating system, described an end-to-end TCPA integrity mechanism, and implemented it on the Linux kernel [10]. Integrity measurement of the kernel is done through a chaining of PCR (Platform Configuration Register) updates in the kernel loader during the bootstrap process. The measurement results, which involve digitally-signed PCR values, are reported to a remote server using a Java application. Hadi Nahari describes a mechanism to construct effective containment (that is, a mechanism to stop an exploited application from attacking another application), which is applied to embedded systems [11]. MAC (Mandatory Access Control) is provided by SELinux. The focus is on practical aspects of hardware integration as well as porting SELinux to resource-constrained devices. The method provides a high-level, structured overall infrastructure offering the basic and necessary functions to establish trust in operating system services (via connecting to a hardware-based root of trust).

3.2 Further Work

For the Linux kernel, changes may take place during execution. Due to the multiple aspects and ordering of applications, the single-chained boot mechanism of system boot cannot be applied directly to the trust transition from the OS to applications. Once the kernel is booted, user-level services and applications may be run. In Linux, a program execution starts by loading an appropriate interpreter (i.e., a dynamic loader, such as ld.so) based on the format of the executable file. The target executable's code and supporting libraries are loaded by the dynamic loader. In a Linux operating system, the kernel and its modules, binary executable files, shared libraries, configuration files and scripts run loosely serially and can change the system's state during execution, which is the key difficulty in the research of trusted Linux. In such situations, even if the OS loader has verified that the kernel is trusted, the trusted state can still be destroyed because kernel modules can be inserted or removed, with the consequence that the PCR values no longer represent the current execution condition. Whether a module affects the kernel state therefore needs to be verified whenever kernel modules are inserted or removed.
One must note that the former research, though ensuring the integrity of the operating environment when a hard boot occurs, does not guarantee its integrity during runtime; that is, in case of any malicious modification to the operating environment during runtime, this architecture will not detect it until the next hard boot happens. Trust of applications embodies the integrity of the complete trust chain. Though in reference [7] the trust is extended to dynamic executable content, mandatory access control policies are not measured; future architecture extensions should include such measurements.
On the other hand, a MAC (Mandatory Access Control) mechanism can address one of the fundamental shortcomings: providing a level of protection at runtime. Deploying MAC mechanisms while balancing performance and control is a particularly challenging task.
The TPM emulator provides good technical support for practicing and verifying trust chain research [12, 13]. There is a tremendous opportunity for enhanced security through enabling projects to use the TPM.

4 Trust Model

4.1 Related Work

Trust models center on how to compute trust in the information world, that is, how to transfer the trust relations among humans into computing environments to achieve the goal of trust.
Patel et al. proposed a probabilistic trust model [14]. Their research aims to develop a model of trust and reputation that will ensure good interactions among software agents in large-scale open systems. Key drivers for their model are: (1) agents may be self-interested and may provide false accounts of experiences with other agents if it is beneficial for them to do so; (2) agents will need to interact with other agents with which they have no past experience. Specifically, trust is calculated using probability theory, taking account of past interactions between agents. When there is a lack of personal experience between agents, the model draws upon reputation information gathered from third parties.
A trust management model based on fuzzy set theory deals with authenticity in open networks [15]. The authors showed that authentication cannot be based on public key certificates alone, but also needs to include the binding between the key used for certification and its owner, as well as the trust relationships between users. They develop a simple algebra around these elements and describe how it can be used to compute measures of authenticity.
In the algebra for assessing trust in certification chains, the fuzzy nature of subjective trust is considered, and the concepts of linguistic variables and fuzzy logic are introduced into subjective trust management [16]. A formal trust metric is given; fuzzy IF-THEN rules are applied to map the knowledge and experience of trust reasoning that people use in everyday life into the formal model of trust management; the reasoning mechanisms of trust vectors are given; and a subjective trust management model is provided. The proposed formal model provides a valuable new way of studying subjective trust management in open networks.
Because the open network is dynamic, distributed and non-deterministic, a trust evaluation model based on D-S (Dempster-Shafer) evidence theory, drawing on the trust relations among humans, has been proposed [17]. The model gives a formal description of direct trust based on the history of trade records in the grid and constructs a trust recommendation mechanism for grid nodes. Combining D-S theory with evidence, indirect trust can be obtained; after that, direct trust and indirect trust are combined effectively. The results show that the trust model is viable and usable.
Reference [18] proposed a trust management model based on software behavior. An agent-based software service coordination model is presented to deal with the trust problem in open, dynamic and changeable application environments. A trust valuation model is given to evaluate trust relationships between software services. Trust is abstracted as a function of subjective expectation and objective experience, and a reasonable method is provided to combine direct experience with indirect experience from others. In comparison with other work, a complete trust valuation model is designed, and its reasonability and operability are emphasized. This model can be used in coordination and security decisions between software services.
Non-interference theory has been introduced into the domain of trusted computing to construct a theoretic model of the trusted chain [19]. The basic theory of trusted computing is proposed and a non-interference based trusted chain model is built from a dynamic point of view; the model is then formalized and verified. Finally, the start-up process based on the Linux operating system kernel is implemented. The implementation provides a good reference for the development and application of trusted computing theory as well.

4.2 Further Work

TCG introduces the idea of trust into the computing environment, but there is still no formalized uniform description. Trusted computing is still a technology rather than a theory, and its basic theoretical model has not yet been established. The trust models above focus on sociological human relations and do not accord with the definition of TCG. Present models should also be further optimized toward objectivity, simplicity and usability.
Linux is a multi-task operating system in which multiple programs run simultaneously; therefore, verifying one program and then transiting to the next is not viable. To maintain the trusted state of the system, each file and its associated execution parameters need to be verified as trusted or not. At the same time, objects that may change the system state also need verification before they are loaded for execution.
Due to the properties of applications, formal descriptions should focus on the separation of processes, etc. The theory supporting trust transition should pay more attention to dynamic execution. Security principles, such as least privilege and the need-to-know policy, should also be taken into account. The efficiency of integrity measurement and security policy enforcement should be further improved.
In essence, non-interference theory is based on information flow theory, and it can detect covert channels of the system, which meets the requirements of designing high-level secure operating systems. Future research on trust models should pay more attention to non-interference theory, which supports the construction and extension of the trusted chain.

5 Conclusions
In this paper, we have summarized the construction and formalization of the trust chain of operating systems. We conclude that future research on the trust chain of secure operating systems should focus on the dynamic trusted chain: making full use of the trusted mechanisms supplied by the TPM and extending the TCG concept of trust to the application layer to study the trusted chain and its formal description.

Acknowledgments. We are grateful to the Shanghai Municipal Education Commission's Young Teacher Research Foundation, under grant No. SDL08024.

References
1. Shen, C.-x., Zhang, H.-g., Feng, D.-g.: Survey of Information Security. China Sci-
ence 37(2), 129–150 (2007) (in Chinese)
2. Trusted Computing Group. TCG Specification Architecture Overview [EB/OL]
[2005-03-01], http://www.trustedcomputinggroup.org/
3. Changxiang, S.: Trust Computing Platform and Secure Operating System. Network Security
Technology and Application (4), 8–9 (2005) (in Chinese)
4. Shieh, A.D.: Nexus: A New Operating System for Trustworthy Computing. In: Proceedings
of the SOSP, Brighton, UK. ACM, New York (2005)
5. Zhong, C., Shen, Q.-w.: Development of Modern Operating System. Communications of
CCF 9, 15–22 (2008) (in Chinese)
6. Arbaugh, W.A., Farber, D.J., Smith, J.M.: A Secure and Reliable Bootstrap Architecture. In: IEEE Security and Privacy Conference, USA, pp. 65–71 (1997)
7. Sailer, R., Zhang, X., Jaeger, T., et al.: Design and Implementation of a TCG -based Integ-
rity Measurement Architecture. In: The 13th Usenix Security Symposium, San Diego (2004)
8. Tao, H., Changxiang, S.: A Trusted Bootstrap Scenario based Trusted Server. Journal of
Wuhan University (Nature Science) 50(S1), 12–14 (2004) (in Chinese)
9. Xiaoyong, L., Zhen, H., Changxiang, S.: Transitive Trust and Performance Analysis in
Windows Environment. Journal of Computer Research and Development 44(11),
1889–1895 (2007) (in Chinese)
10. Maruyama, H., Nakamura, T., Munetoh, S., et al.: Linux with TCPA Integrity Measurement.
IBM, Tech. Rep.: RT0575 (2003)
11. Nahari, H.: Trusted Embedded Secure Linux. In: Proceedings of the Linux Symposium,
Ottawa, Ontario Canada, June 27-30, pp. 79–85 (2007)
12. Hall, K., Lendacky, T., Ratliff, E.: Trusted Computing and Linux,
http://domino.research.ibm.com/comm/research_projects.nsf/
pages/gsal.TCG.html/FILE/TCFL-TPM_intro.pdf
13. Strasser, M.: A Software-based TPM Emulator for Linux. Swiss Federal Institute of
Technology (2004),
http://www.infsec.ethz.ch/people/psevinc/TPMEmulatorReport.pdf
14. Patel, J., Teacy, W.T.L., Jennings, N.R., Luck, M.: A Probabilistic Trust Model for Han-
dling Inaccurate Reputation Sources. In: Herrmann, P., Issarny, V., Shiu, S.C.K. (eds.)
iTrust 2005. LNCS, vol. 3477, pp. 193–209. Springer, Heidelberg (2005)
15. Wen, T., Zhong, C.: Research of Subjective Trust Management Model based on the Fuzzy
Set Theory. Journal of Software 14(8), 1401–1408 (2003) (in Chinese)
16. Audun, J.: An Algebra for Assessing Trust in Certification Chains. In: The Proceedings of
NDSS 1999, Network and Distributed System Security Symposium, The Internet Society,
San Diego (1999)
17. Lulai, Y., Guosun, Z., Wei, W.: Trust evaluation model based on Dempster-Shafer evidence theory. Journal of Wuhan University (Natural Science) 52(5), 627–630 (2006) (in Chinese)

18. Feng, X., Jian, L., Wei, Z., Chun, C.: Design of a Trust Valuation Model in Software Service
Coordination. Journal of Software 14(6), 1043–1051 (2003) (in Chinese)
19. Jia, Z., Changxiang, S., Jiqiang, L., Zhen, H.: A Noninterference Based Trusted Chain
Model. Journal of Computer Research and Development 45(6), 974–980 (2008)
(in Chinese)
A Novel Application for
Text Watermarking in Digital Reading

Jin Zhang, Qing-cheng Li, Cong Wang, and Ji Fang

Department of Computer Science, Nankai University,


Tianjin 300071, China
{nkzhangjin,liqch}@nankai.edu.cn,
{wangcong6222351,shuiqi}@mail.nankai.edu.cn

Abstract. Although watermarking research has made great strides in theoretical aspects, its lack of business application cannot be ignored. This is because few people pay attention to how the information carried by the watermark is used. This paper proposes a new watermarking application method. After the digital document is reorganized together with advertisements, the watermark is designed to carry the structure of the new document, so that advertisements are released as interference information under attack. On the one hand, reducing the quality of digital works can inhibit unauthorized distribution; on the other hand, the advertisements can benefit copyright holders as compensation. Implementation details, attack evaluation and watermarking algorithm correlation are also discussed through an experiment based on a txt file.

1 Introduction
Several decades have passed since digital watermarking technology emerged, and both technological diversity and theoretical research have made major strides in this field. But business application has always been a big problem that perplexes the development of watermarking. Many watermarking companies have gone out of business or suspended their watermarking efforts, except some that focus on tracking and digital fingerprinting [1].
This problem is due, in our opinion, to a misunderstanding of the nature of digital watermarking. The real value of digital watermarking is the information bound into the watermark. In other words, watermarking content and application methods are the key issues in digital watermarking research. Unfortunately, current research in this area is scarce.
As an important branch of digital watermarking, text watermarking encounters the same problem as other kinds of watermarking technology. While most people agree that text watermarking should be widely used in the field of digital reading, this is not yet the case. On the one hand, the lack of appropriate application methods makes text watermarking of little use. On the other hand, text watermarking is characterized by its sensitivity to attacks (discussed in Section 2), which easily leads to the loss of watermark information.
In this paper, an application method is proposed for text watermarking in digital reading. We combine the original digital content and advertisements into a whole digital document. The rules of combination are embedded in this digital document as the watermarking information. Once the document is attacked, advertisements are released from it; the attacker will have to read them together with the real content, which serves as compensation for the copyright holder.
This paper is organized as follows: in Section 2, we review other kinds of text watermarking. The procedure of our solution is presented in general in Section 3, and we expound the solution details and experimental results in Section 4. Finally we draw conclusions in Section 5.

2 Related Works

In general, the research object of text watermarking is the digital document, and a document is always made up of two parts: content and format.
Many researchers choose to embed watermarks by treating content as a manifestation of language; this is semantics-based watermarking. When watermarking information is embedded, the content is modified according to linguistic rules. Atallah's TMR is representative of this kind of watermarking [2]. Dai and Yang also proposed their solutions [3, 4]. This kind of watermarking technology is always time-consuming, it is very sensitive to tampering and deletion attacks [5], and limited capacity is another of its shortcomings.
Other researchers treat content as a still picture. In this way, they can build additional redundant space to embed the watermark so as to enhance capacity. Brassil and Low's line-shift/word-shift coding and word feature coding are representative of such solutions [6, 7]. In their solution, content is a binary image, and tiny shifts of lines or words in different directions are used to embed information. Huang complemented their work [8]. Liu and Chen's work is based on connected domains; they regard Chinese characters as special images [9]. It is worth mentioning that Li and Zhang proposed a method that integrates the ideas of Liu and of semantics-based watermarking [10], achieving a significant reduction in overall algorithm consumption. But OCR technology is the natural enemy of all these methods: no matter how sophisticated they are, the majority of the watermarking information is removed after an OCR or retyping attack. Tampering and deletion attacks can also reduce their performance.
Some researchers hope to hide the watermark in the document's format. Fu and Wang's design is a prime example [11]. Zou and Sun use special characters in RTF files to embed the watermark [12], and some solutions are based on redundant coding space, such as Zhou's work [13]. Other techniques, such as adding blanks and other special characters, have also been proposed. Although these methods remove the impact of tampering and deletion attacks, any change to the document has a fatal impact on them, let alone OCR and retyping attacks.
To sum up, all kinds of watermarking are sensitive to specific attacks, and little information can be maintained under the matching niche attack.

3 Solution Design

As mentioned above, no text watermarking technology can resist all kinds of attacks, and these attacks are also very cheap to mount. Rather than spending a lot of energy designing a more robust watermark, it is better to find a method that makes attackers unwilling to burn their house down to rid it of a mouse.

3.1 Design Principle

The general watermarking embedding procedure is described by (1) [14]:

Cwn = W ⊕ C0 ⊕ n (1)

After the watermarking information W is embedded, the original content C0 with additive channel noise n becomes the final digital work or digital document Cwn. Because digital documents always have strict coding requirements or format specifications, such as check code mechanisms, we do not consider the channel noise issue.
As noted in Section 1, the watermark is used to transmit the combination rules once the combination of content and advertisements is finished. In this case, we modify formula (1) as follows:

Cwn = F(W, C0, CA) ⊕ nA
G(W, Cwn) = C0 if nA = 0, and C0 ⊕ CA otherwise (2)

where CA is the advertisement and nA is the attack noise. We denote by CW the digital document that contains the watermark without being attacked. F( ) is the combination function, while G( ) is the stripping function.
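The following is a minimal Python sketch of F and G under simplifying assumptions of our own: blocks are plain strings, the rules (block sizes) stand in directly for the watermark W, and the actual embedding of W into the text is omitted.

def F(rules, content_blocks, ad_blocks):
    # Interleave content and advertisement blocks; in the real design the
    # rules would be carried by the embedded watermark W.
    doc = []
    for c, a in zip(content_blocks, ad_blocks):
        doc.append(c)
        doc.append(a)
    return "".join(doc)

def G(rules, document):
    # Strip the advertisements using the recovered rules (block sizes).
    out, pos = [], 0
    for c_len, a_len in rules:
        out.append(document[pos:pos + c_len])
        pos += c_len + a_len  # skip over the advertisement block
    return "".join(out)

content = ["Chapter one. ", "Chapter two."]
ads = ["<AD:buy tea>", "<AD:buy ink>"]
rules = [(len(c), len(a)) for c, a in zip(content, ads)]
assert G(rules, F(rules, content, ads)) == "".join(content)

If an attack destroys the rules (i.e., the watermark), G can no longer locate the advertisement blocks, which is exactly the behavior the design relies on.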

3.2 Operation Procedure

In this section, we discuss the operation procedure shown in figure 1.

Fig. 1. Design processing flow chart: C0 and CA are combined with the watermark W into CW; attack noise nA turns CW into CWn



Step 1: C0 and CA are combined in accordance with the rules. These rules are encoded into a special format, which is the watermarking information W. In other words, both C0 and CA are watermarked.
Step 2: After W is embedded, the digital document is distributed and possibly attacked; in this way attack noise nA is introduced. Once an attack happens during transmission, nA is added to the document, and CW turns into Cwn, as formula (2) shows. When no attack occurs, nA is given the value 0, and CW equals Cwn in this condition.
Step 3: When the user's device or client software receives Cwn, it extracts W and restores C0 according to W. If nA equals 0, the user can view the original content. But if the watermark is erased, the user has to view CA while browsing C0. In the worst case, where W is totally distorted, only garbled characters are left.

4 Implementation and Evaluation

We implemented our design on a kind of hand-held digital reading device and applied it in the related digital works release platform. The algorithm in [10] is taken as the default algorithm. We use a txt document containing 10,869 Chinese characters as the experimental object.
Figure 2(a) shows the document, which contains the watermark and advertisements, opened directly without stripping. Figure 2(b) shows the result of opening it after stripping: both the advertisements and the watermarking signs are removed by the content stripper.

Fig. 2. Device operation screenshots: (a) the document opened directly without stripping; (b) the document opened after stripping

Before discussing the implementation details, we give a definition. When algorithms such as [10] work, the content is changed selectively, and every change can carry 1 bit of information. We name a changeable part of the content a watermarking tag and use WT to denote it.

4.1 Implementation Detail

4.1.1 Watermarking Embedding Processing


In this part the most important question is the organization of the digital document. Out of consideration for distributing the watermark and advertisements, the document content structure is designed as figure 3 shows.
Fig. 3. Document content structure diagram

The original content C0 is split into N blocks, whose sizes are not equal and are denoted by the sequence {Sai}. The sequence {Sbi} is used to denote the sizes of the CA blocks. These combination rules are described by W, which is made up of the sequence {Wi}. We define Wi as follows:

Wi = {li, {Ci, ai, bi}R, S} (3)

li: total length of Wi.
Ci: check code of ai and bi.
{}R: redundancy construction function; it could be replaced by cryptographic functions.
S: barrier control code, used to isolate different Wi. It is also helpful when a deletion attack appears, and different code words can also play the role of transmitting control information.
ai: relative length of C0 block i; it equals the distance from the first WT in this block to the last character of this C0 block.
bi: length of CA block i; it usually equals Sbi.

The description above covers only the content structure. After this processing, the content can be sealed in other document formats, such as DOC or HTML. Because of its large system consumption, this work should be arranged on content servers or distribution servers.
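Purely as an illustration, the following Python sketch encodes one Wi. The field widths, the XOR check code Ci, the duplication-based redundancy for {}R, and the barrier byte value for S are all our assumptions rather than the paper's concrete on-disk format.

import struct

BARRIER = 0x7E  # assumed value of the barrier control code S

def encode_wi(a_i: int, b_i: int) -> bytes:
    body = struct.pack(">II", a_i, b_i)      # a_i and b_i as 4-byte integers
    check = 0
    for byte in body:                        # C_i: a simple XOR check code
        check ^= byte
    payload = bytes([check]) + body + body   # {..}_R: naive duplication redundancy
    l_i = 1 + len(payload) + 1               # total length l_i, including S
    return bytes([l_i]) + payload + bytes([BARRIER])

print(encode_wi(a_i=120, b_i=32).hex())

In a production format, the duplication would be replaced by an error-correcting code or a cryptographic construction, as the definition of {}R suggests.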

4.1.2 Content Stripping Processing


Generally, the main work is extracting the watermarking information and restoring the original content.

Fig. 4. Table T structure diagram

After scanning CWn, the watermarking information sequence is obtained, and the individual Wi are separated according to the barrier control code S. Before restoring the original content, the plain txt stream, which includes C0 and CA without the watermark, should be restored first. A table T containing each WT's offset in the stream should also be set up, as figure 4 shows. This is due to locating requirements, because every block's address is presented as a relative offset, as formula (3) shows. According to T and {Wi}, the original content can then be restored.
Actually, we can use a cache mechanism to optimize the design performance on the device. In short, the user can browse the content before it is totally restored. When the user wants to skip to some page, the device can scan for the nearest barrier control code S and build a temporary table T' to show the content quickly, and the complete table can be built during the user's browsing intervals.

4.2 Application in Dynamic DRM

Most mainstream DRMs use asymmetric cryptographic algorithms and special document formats to ensure security. Our design focuses on content, which is independent of the document format, and so it can be applied in most DRMs smoothly.
Dynamic DRM (DDRM) is a special DRM that supports superdistribution and distribution among users [15]. Its technical basis is its license mechanism, and all digital works or digital documents on this platform must be divided into blocks and reorganized. In particular, dynamic advertisement blocks are supported in DDRM, which means the advertisements in digital works can be changed as distribution goes on.
As mentioned in Section 3, the watermark is embedded into the total content: both C0 and CA contain the watermark. In this case, the device would have to reorganize the digital document to update CA after each distribution, which is obviously very hard for today's embedded devices.
Therefore we amend formula (2) into another form, as (4) shows:

CW = F(W, C0) ⊕ CA (4)

In this condition, only C0 is watermarked, and CA can be replaced dynamically. But two rules must hold, as the sketch after this list illustrates:

• The new CA block must have the same size as the old one; this is the reason why we usually give CA blocks a larger size.
• The validity of CA should also be checked, and the barrier control code S's value is used to indicate whether dynamic CA is supported.
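A minimal Python sketch of these two checks follows; the flag value chosen for S is an assumption of ours.

DYNAMIC_AD_SUPPORTED = 0x7F  # assumed barrier-code value signalling dynamic CA

def replace_ad(old_ad: bytes, new_ad: bytes, barrier_code: int) -> bytes:
    if barrier_code != DYNAMIC_AD_SUPPORTED:
        raise ValueError("document does not support dynamic advertisements")
    if len(new_ad) != len(old_ad):
        # Sizes must match so the offsets carried by W stay valid.
        raise ValueError("replacement ad must have the same size as the old one")
    return new_ad

fresh = replace_ad(b"<AD:old.....>", b"<AD:new.....>", DYNAMIC_AD_SUPPORTED)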

4.3 Evaluation under Attack

As discussed in Section 2, most text watermarking algorithms, including our design, are attack sensitive. Once an attack matches a watermarking algorithm's weakness, little information may be left.
Rather than paying attention to more robust algorithms, we focus on risk management after an attack happens. Figure 5 shows the result of watermarking information loss in our design.

Fig. 5. Content left after being attacked

Obviously, advertisements emerge when the watermarking information is lost. On the one hand, this reduces the quality of the digital work and inhibits illegal propagation; on the other hand, it compensates the content provider and service provider for the loss.
Technically, the robustness of our design mainly depends on the robustness of the selected watermarking algorithm.

4.4 Watermarking Algorithm Correlation

Our research mainly focuses on finding a new method to apply digital watermarking; therefore the correlation with the selected algorithm needs to be discussed. Though most algorithms can be applied smoothly in our design, we need different ways to split the content according to different algorithms.
When an algorithm hiding information in the format is selected, the splitting method is very easy. If the algorithm regards the text as a picture, the descriptors about size or offset should be replaced by descriptors about coordinates.

Fig. 6. Capacity distribution: (a) blocks of equal size; (b) blocks of equal watermarking capacity

When the selected algorithm involves semantics-based watermarking, we must evaluate the size of Wi in advance; the block size can then be set after checking the distribution of watermarking tags. Otherwise, it could happen that the size of Wi is larger than the capacity of the watermarking space in the block.
Figure 6 shows the capacity distribution in our experiment. First we divide the content into 20 blocks of the same size; (a) shows the result, in which the smallest capacity is only 60 bits. (b) shows the situation in which every block has the same watermarking capacity. Obviously, the capacity distribution is not even.

5 Conclusion
As an application technology, digital watermarking research should not only indulge in finding new algorithms or blindly pursue technical targets; searching for efficient application methods may be more meaningful.
From the viewpoint above, this paper proposes a novel application method for text watermarking in the digital reading field. It releases advertisements as interference information under attack. On the one hand, reducing the quality of digital works can inhibit unauthorized distribution; on the other hand, it points out a new way for watermarking applications.

References
1. Delp, E.J.: Multimedia security: the 22nd century approach. Multimedia Systems 11,
95–97 (2005)
2. Atallah, M.J., Raskin, V., Crogan, M., Hempelmann, C., Kerschbaum, F., Mohamed, D.,
Naik, S.: Natural language watermarking: Design, analysis, and a proof-of-concept imple-
mentation. In: Moskowitz, I.S. (ed.) IH 2001. LNCS, vol. 2137, pp. 185–200. Springer,
Heidelberg (2001)
3. Dai, Z., Hong, F., Cui, G., et al.: Watermarking text document based on statistic property
of part of speech string. Journal on Communications 28(4), 108–113 (2007)
4. Yang, J., Wang, J., et al.: A Novel Scheme for Watermarking Natural Language Text:
Intelligent Information Hiding and Multimedia Signal Processing. IEEE Intelligent Soci-
ety 2, 481–488 (2007)
5. Li, Q., Zhang, J., et al.: A Chinese Text Watermarking Based on Statistic of Phrase
Frequency: Intelligent Information Hiding and Multimedia Signal Processing. IEEE Intel-
ligent Society 2, 335–338 (2008)
6. Brassil, J., Low, S., Maxemchuk, N., O'Gorman, L.: Electronic marking and identification techniques to discourage document copying. IEEE J. Select. Areas Commun. 13, 1495–1504 (1995)
7. Low, S., Maxemchuk, N.: Capacity of Text Marking Channel. IEEE Signal Processing
Letter 7(12), 345–347 (2000)
8. Huang, H., Qi, C., Li, J.: A New Watermarking Scheme and Centroid Detecting Method
for Text Documents. Journal of Xi’an Jiaotong University 36(2), 165–168 (2002)
9. Liu, D., Chen, S., Zhou, M.: Text Digital Watermarking Technology Based on Topology
of Alphabetic Symbol. Journal of Chinese Computer Systems 27(25), 812–815 (2007)
10. Li, Q., Zhang, Z., et al.: Natural text watermarking algorithm based on Chinese characters
structure. Application Research of Computers 26(4), 1520–1527 (2009)
11. Fu, Y., Wang, B.: Extra space coding for embedding watermark into text documents and its performance. Journal of Xi'an Highway University 22(3), 85–87 (2002)
12. Zou, X., Sun, S.: Fragile Watermark Algorithm in RTF Format Text. Computer Engineer-
ing 33(4), 131–133 (2007)
13. Zhou, H., Hu, F., Chen, C.: English text digital watermarking algorithm based on idea of
virus. Computer Engineering and Applications 43(7), 78–80 (2007)
14. Cox, J., Miller, M., et al.: Digital Watermarking and Steganography, 2nd edn. Morgan
Kaufmann, San Francisco (2007)
15. Li, Q., Zhang, J., et al.: A Novel License Distribution Mechanism in DRM System. In:
Proceedings of 22nd Advanced Information Networking and Applications - Workshops,
pp. 1329–1334. IEEE Computer Society, Los Alamitos (2008)
Optimization of Real-Valued Self Set for Anomaly
Detection Using Gaussian Distribution

Liang Xi, Fengbin Zhang, and Dawei Wang

College of Computer Science and Technology, Harbin University of Science and Technology,
150080 Harbin, China
[email protected], [email protected],
[email protected]

Abstract. The real-valued negative selection algorithm (RNS) has become a key algorithm for anomaly detection. However, the self set used to train detectors has some problems, such as wrong samples, boundary invasion and overlapping among the self samples. Since the probability distribution of most real-valued self vectors is close to a Gaussian distribution, this paper proposes a new method that uses Gaussian distribution theory to optimize the self set before the training stage. The method was tested on 2-dimensional synthetic data and real network data. Experimental results show that the new method effectively solves the problems mentioned above.

Keywords: anomaly detection, artificial immune system, real-valued self set, Gaussian distribution.

1 Introduction
The anomaly detection problem can be stated as a two-class problem: given an element of the space, classify it as normal or abnormal [1]. There exist many approaches to anomaly detection, including statistical, machine learning, data mining and immunologically inspired techniques [2, 3, 4]. The task of anomaly detection may be considered analogous to the immunity of natural systems, as both aim to detect abnormal behaviors of the system that violate the established policy [5, 6]. The negative selection algorithm (NSA) [7] has potential applications in various areas, in particular anomaly detection. NSA trains the detector set by eliminating any candidate that matches elements from a collection of self samples (the self set). These detectors subsequently recognize non-self data using the same matching rule. In this way, it is used as an anomaly detection algorithm that requires only normal data for training.
In the past, most works in anomaly detection reduced the problem to a binary representation. However, many applications are naturally described in real-valued space; furthermore, such problems can hardly be processed properly using the NSA in binary representation. Gonzalez et al. [8] proposed the real-valued representation for the self/non-self space, which differs from the original binary representation. A real-valued self sample or detector is a hypersphere in n-dimensional real space; it can be represented by an n-dimensional point and a radius.


As we know, detection performance mainly relies on the quality of the detector set: the detector coverage of the non-self space. The detectors are trained by the self set, so the self set plays an important role in the whole detection process, and the health of the self set deserves sufficient attention. It should be noted that the real-valued self set indeed has some problems that the binary representation does not, such as wrong self samples, overlapping of self samples and boundary invasion. To solve these problems, this paper proposes a novel optimization method for the self set. Starting from the observation that self samples approach a Gaussian distribution when the number of samples is large enough, this method employs Gaussian distribution theory to deal with the problems mentioned above.
2 Problems with Real-Valued Self Set

2.1 The Wrong Self Samples

Before training detectors, the self set must be secure. However, the real situation may not always meet this requirement, and the self set is likely to collect some wrong samples. Figure 1(a) illustrates the problem of wrong self samples in 2-dimensional space: two samples are so far from the self region that we can consider them wrong self samples. Wrong self samples may result in holes, so it is important to discard them before RNS.

Fig. 1. The sketch of real-valued self set problems: (a) the wrong self samples; (b) the overlapping; (c) the boundary invasion

2.2 The Overlapping

The overlapping self samples: in a working situation, the self samples are densely distributed. As shown in figure 1(b), a large number of self samples are intensively distributed in the self region, the overlapping rate is very high, and there are many unnecessary samples in the self region. We now discuss the relationship between the number of self samples and the number of candidates. Some variables are defined below:
f: The probability that a candidate fails to match any self sample
pf : The probability of an attack that is not detected
pm : The matching probability
Ns : The number of self samples

Nd: The number of detectors
Nd0: The number of candidates

When pm is small enough and Ns is large enough,

f = (1 − pm)^Ns ≈ e^(−Ns pm).

When Nd is large enough,

pf = (1 − pm)^Nd ≈ e^(−Nd pm).

Because

Nd = Nd0 ∗ f = −ln(pf) / pm,

we have

Nd0 = −ln(pf) / (pm (1 − pm)^Ns) (1)

From equation 1, we can see that Nd0 has an exponential relationship with Ns: the more self samples, the more candidates and the higher the cost of detector training.
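The following short Python computation illustrates equation 1; the parameter values are made up for illustration.

import math

def candidates_needed(p_m: float, p_f: float, n_s: int) -> float:
    # Nd0 = -ln(pf) / (pm * (1 - pm)^Ns), i.e. equation (1).
    return -math.log(p_f) / (p_m * (1.0 - p_m) ** n_s)

for n_s in (100, 200, 400):
    print(n_s, round(candidates_needed(p_m=0.01, p_f=0.05, n_s=n_s)))
# The candidate count grows exponentially with the number of self samples,
# which is why redundant samples should be discarded before training.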
The following equation defines an approximate measure of overlapping between two self samples:

Overlapping(si, sj) = e^(−||si − sj||² / rs²) (2)

The maximum value, 1, is reached when the distance between the two samples is 0; on the contrary, when the distance equals 2rs, the value of this function is almost 0. Based on equation 2, the amount of overlapping of a self set is defined as

Overlapping(S) = ∑_{i≠j} e^(−||si − sj||² / rs²), i, j = 1, 2, ..., n. (3)
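A small Python sketch of equations (2) and (3); the sample points and radius are made-up values.

import math

def overlap(si, sj, r_s):
    # Equation (2): Gaussian-kernel overlap of two self samples.
    d2 = sum((a - b) ** 2 for a, b in zip(si, sj))
    return math.exp(-d2 / r_s ** 2)

def total_overlap(samples, r_s):
    # Equation (3): sum over all ordered pairs with i != j.
    n = len(samples)
    return sum(overlap(samples[i], samples[j], r_s)
               for i in range(n) for j in range(n) if i != j)

selfs = [(0.50, 0.50), (0.52, 0.51), (0.90, 0.10)]
print(total_overlap(selfs, r_s=0.05))  # the two close samples dominate the sum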

2.3 The Boundary Invasion

As a result of the self radius, on the border of the self region the covering area may invade the non-self region across the boundary. We can see this phenomenon clearly in figure 1(c). So when detectors are trained by such a self set, the vicinity of the non-self boundary may not be covered completely.

3 Optimization Method
In view of the analysis above, optimizing the self set is significant. The aim of the optimization is to use the fewest self samples to cover the self region but not the non-self region. The optimization problem can be stated as follows:
minimize:

V(S) = Volume{x ∈ U | ∃s ∈ S, ||x − s|| ≤ rs} (4)

restricted to:

{∀s ∈ S | ∃d ∈ D, ||s − d|| ≤ rs} = ∅ (5)

and

{∀si, sj ∈ S, ||si − sj|| ≤ rsi or ||si − sj|| ≤ rsj} = ∅ (6)

The function defined in equation 4 represents the amount of self space covered by the self set S, which corresponds to the volume covered by the union of hyperspheres associated with each self sample. The restrictions: equation 5 states that no self sample should invade the detector set; equation 6 states that no self sample is covered by another sample in the self set.
As we said before, if the number of self samples is large enough, the probability distribution of the self samples approaches a Gaussian distribution. Furthermore, the central limit theorem clarifies that whatever probability distribution the samples follow, the distribution of the samples' mean is close to a Gaussian distribution, and the mean of this Gaussian distribution equals the mean of all samples.
From the Gaussian distribution, we can also see that the smaller the distance between a sample and the mean point, the larger the value of the sample's probability density. As discussed before, the nearer a sample is to the center of the self region, the heavier the overlapping rate. So we can use this relationship to deal with the problem of unnecessary self samples. The method this paper proposes is to adjust every sample's radius according to its probability density to deal with the boundary invasion, and then discard each unnecessary sample according to the radii.
There is an important and famous proposition about the Gaussian distribution: the "3σ" criterion, which shows that almost every normal sample lies in the "3σ" district, although the range is from −∞ to ∞. So we can use the "3σ" criterion to deal with the problem of wrong self samples.
As analyzed above, we may conclude that optimization based on probability theory is reasonable. To solve the three problems of the self set mentioned above, the optimization is composed of three steps as follows:
Step 1: Discard the wrong self samples by the "3σ" criterion;
Step 2: Adjust each self radius by the sample's probability density;
Step 3: Discard the unnecessary self samples according to the self radii.
Before describing the optimization, some variables frequently used in the following
sections are defined here:

s: Self (Eigenvector: value, Radius: r)


S: Self set (Initial self set: S0)
N0: The number of self samples in S0
N: The number of self samples in S (initial value: N0)
k: Unit self radius
num: The number of probability density intervals
Now the algorithm pseudocode is described as follows:

BEGIN
Collect the self samples: S0 ← s;
// step 1: discard the wrong self samples
Regularize S0, then compute µ and σ;
n = 0;
while (n < N0){
    if (sn is outside the "3σ" district){
        discard sn; N0--;
    }
    n++;
}
// step 2: adjust every sample's radius
Compute maxpdf and minpdf over S0;
L = (maxpdf - minpdf) / num;
n = 0;
while (n < N0){
    l = (pdf(sn) - minpdf) / L;
    sn.r = l × k;
    n++;
}
// step 3: discard the unnecessary self samples
// si denotes the i-th sample already accepted into S
S ← s0; N = 1; n = 1;
while (n < N0){
    i = 0; flag = 0;
    while (i < N){
        d = ||sn − si||;
        if (d < sn.r || d < si.r){
            flag = 1; break;
        }
        i++;
    }
    if (flag == 0){
        S ← sn; N++;
    }
    n++;
}
END

In step 1, a self sample is discarded if its probability density places it outside the "3σ" district. The time complexity of this step is O(n).

In step 2, the nearer self samples are to the center of the self region, the larger their radii. The radii of self samples near the boundary of the self region are adjusted to a relatively rational level so that boundary invasion is avoided, while around the central region each sample's radius is adjusted to a relatively larger level than before, so that the overlapping there becomes heavier than before. The time complexity of this step is O(n).
In step 3, by discarding the unnecessary samples, the increased overlapping introduced in step 2 is resolved. The samples with larger radii can cover more space than the former fixed-radius samples, so the self region can be covered by fewer samples than before. The time complexity of this step is O(n²).
Via this optimization, the self set becomes more rational than before. The total time complexity of the optimization is O(n²).
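For readers who prefer runnable code, the following is a compact Python translation of the pseudocode, under simplifying assumptions of our own: the Gaussian is fitted per dimension (diagonal covariance), the "3σ" test is applied per coordinate, and the probability density is taken from the fitted Gaussian.

import math

def optimize_self_set(samples, k=0.005, num=10):
    dims, n = len(samples[0]), len(samples)
    mu = [sum(s[d] for s in samples) / n for d in range(dims)]
    sigma = [math.sqrt(sum((s[d] - mu[d]) ** 2 for s in samples) / n)
             for d in range(dims)]

    # Step 1: discard wrong samples outside the 3-sigma district.
    kept = [s for s in samples
            if all(abs(s[d] - mu[d]) <= 3 * sigma[d] for d in range(dims))]

    # Step 2: radius proportional to the sample's probability density level.
    def pdf(s):
        return math.prod(
            math.exp(-(s[d] - mu[d]) ** 2 / (2 * sigma[d] ** 2)) /
            (sigma[d] * math.sqrt(2 * math.pi)) for d in range(dims))
    densities = [pdf(s) for s in kept]
    lo, hi = min(densities), max(densities)
    step = (hi - lo) / num or 1.0
    radii = [((p - lo) // step) * k for p in densities]

    # Step 3: drop samples already covered by an accepted sample.
    result = []
    for s, r in zip(kept, radii):
        if not any(math.dist(s, t) < max(r, rt) for t, rt in result):
            result.append((s, r))
    return result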

4 Experiments and Results


To test the algorithm described in the previous section, experiments were carried out
using 2-dimensional synthetic data and real network data.

4.1 2-Dimensional Synthetic Data

First, some self samples chosen from a fixed number of random points that fit a normal distribution are normalized to compose the self set (198 samples, with the radius fixed at 0.05). This set therefore has no wrong samples, and the step 1 optimization is illustrated in the real network experiment instead. The results of each optimization step are shown in figure 2.

Fig. 2. Experiment results on 2-dimensional synthetic data: (a) the initial self set; (b) the self set optimized by step 2; (c) the self set optimized by step 3; (d) detectors trained by the initial self set; (e) detectors trained by the self set optimized by step 2; (f) detectors trained by the self set optimized by step 3



Figure 2(a) shows the initial self set, in which the overlapping and the boundary invasion are extremely severe. The detectors trained by the initial self set are shown in figure 2(d), where we can clearly see that the boundary of the non-self region is not covered completely, a consequence of the self samples' boundary invasion. Likewise, the optimized self sets and their detector sets are shown in the other panels of figure 2 (the unit radius is set to 0.005). After all the optimization steps, the number of self samples drops to 67, and Overlapping(S) drops to 2.71% (by equation 3). As the figures show, adjusting each sample's radius avoids the boundary invasion so that the non-self region's boundary is covered well, while the coverage of the self region by the most reasonable self samples remains almost the same as before.

4.2 Real Network Data

In the real network experiment, the self samples come from the network security lab of HUST (Harbin University of Science and Technology). Each value is 2-dimensional: pps (packets per second) and bps (bytes per second). We collected 90 samples and drew the joint probability distribution in figure 3, from which we can see that the joint distribution is quite similar to the standardized Gaussian distribution.

Fig. 3. The two-dimensional joint probability distribution: (a) the joint probability distribution of pps and bps; (b) the two-dimensional standardized Gaussian

The results of the optimization are shown in figure 4. The optimized self sets are in figure 4(a–d), and the corresponding detector sets are in the remaining panels of figure 4.
As shown in figure 4(a), the self set had 3 wrong self samples which were far away from the self region. Therefore, the detector set generated by the initial self set has holes, shown in figure 4(e). After step 1, the wrong samples were discarded (shown in figure 4(b)). The results of the other optimization steps are just like those in the simulation experiment above. The final self set has 19 samples.
To compare detector set generation between the initial self set and the optimized one, we used the two data sets to generate 3000 detectors. We can clearly see from figure 5 that the detector generation speed using the optimized self set is faster than using the initial self set.

Fig. 4. Real network experiment results: (a)–(d) the optimized self sets; (e)–(h) the corresponding detector sets

Fig. 5. The detector generation speed comparison between the initial and the optimized self set

To compare the detection effect of the two detector sets, we used SYN flooding to attack two hosts with the different detector sets simultaneously. All results shown in table 1 are the averages of 100 experiments. From the table we can clearly see that the detection rate with detectors using the optimized self set is much higher than the other (98.9% vs. 83.7%) and the false alarm rate is remarkably lower than the other (2.3% vs. 8.3%).

Table 1. The detection comparison between the two detector sets

Detector Set    Connection    Invasion    Detecting    Detection Rate    False Alarm Rate
By initial      13581653.3    100         92.5         83.7%             8.3%
By optimized    13581653.3    100         101.2        98.9%             2.3%

5 Conclusion
In this paper we have shown how to optimize the real-valued self set for anomaly detection using probability theory. Experimental results demonstrated that this optimization is necessary and that the method is clearly effective. Moreover, the optimized self set provides a favorable foundation for generating detectors. The detector set trained by the optimized self set is more reliable because of the advantages of the optimization algorithm:
1. Holes can be better covered by the generated detectors because the wrong self samples are discarded.
2. Time to generate detectors is saved by using a smaller number of self samples, which also requires less space to store them.
3. The radii of the border self samples are adjusted to a rational range so that the boundary invasion is solved and the false alarm rate is obviously reduced.
The influence of real-valued self set optimization needs further study, including more real network experiments and strict analysis. The implication of the self radius, or how to interpret each self sample, is also an important topic to be explored.

Acknowledgment
This work was sponsored by the National Natural Science Foundation of China (Grant No. 60671049), the Subject Chief Foundation of Harbin (Grant No. 2003AFXXJ013), and the Education Department Research Foundation of Heilongjiang Province (Grant Nos. 10541044, 1151G012).

References
1. Patcha, A., Park, J.M.: An overview of anomaly detection techniques: existing solutions and
latest technological trends. Computer Networks 51, 3448–3470 (2007)
2. Hofmeyr, S., Forrest, S.: Architecture for an artificial immune system. IEEE Transactions on
Evolutionary Computation 8, 443–473 (2000)
3. Simon, P.T., Jun, H.: A hybrid artificial immune system and self organising map for network intrusion detection. Information Sciences 178, 3024–3042 (2008)
4. Forrest, S., Perelson, A.S., Allen, L.: Self-non-self discrimination in a computer. In: Proc. of
the IEEE Symposium on Research in Security and Privacy, pp. 202–212 (1994)
5. Dasgupta, D., Gonzalez, F.: An immunity based technique to characterize intrusions in com-
puter network. IEEE Transactions on Evolutionary Computation 69, 281–291 (2002)
6. Boukerche, A., Machado, R.B., Juca, R.L.: An agent based and biological inspired real-time
intrusion detection and security model for computer network operations. Computer Com-
munications 30, 2649–2660 (2007)
7. Ji, Z., Dasgupta, D.: Real-valued negative selection algorithm with variable-sized detectors.
In: Deb, K., et al. (eds.) GECCO 2004, Part I. LNCS, vol. 3102, pp. 287–298. Springer,
Heidelberg (2004)
8. Gonzalez, F., Dasgupta, D., Kozema, D.: Combining negative and classification techniques
for anomaly detection. In: Proc. of the 2002 Congress on Evolutionary Computation,
pp. 705–710 (2002)
A GP Process Mining Approach from a Structural
Perspective

Anhua Wang, Weidong Zhao∗, Chongchen Chen, and Haifeng Wu

Software School, Fudan University, 220 Handan Rd.


200433 Shanghai, China
{072053026,wdzhao,07302010048,082053025}@fudan.edu.cn

Abstract. Process mining is the automated acquisition of process models from
workflow event logs, and a model's structural complexity directly impacts its
readability and quality. Although many mining techniques have been developed,
most of them ignore the structural perspective. In this paper we therefore
propose an improved genetic programming approach with a partial fitness
measure, extended from the structuredness complexity metric, so as to mine
process models that are not structurally complex. Additionally, the innovative
combination of a complexity metric with a tree-based individual representation
overcomes shortcomings of previous genetic process mining approaches (e.g.,
the previous GA approach underperforms when dealing with process models with
short parallel and OR structures). Finally, experiments have been conducted to
evaluate our approach.

1 Introduction
Today, information about business processes is mostly recorded by enterprise infor-
mation systems such as ERP and workflow management systems in the form of
so-called "event logs" [1]. As processes in many domains evolve and become
more and more complex, there is a need to understand the actual processes based on
these logs. Process mining addresses this need by automatically analyzing the logs to
extract explicit process models. Currently, most process mining techniques try to
mine models from the behavioral aspect only (i.e., to reflect the exact behavior expressed in
logs) while ignoring the complexity of the mined models. However, as processes nowadays
are becoming more complex, process designers and analysts also want to mine
structurally simple models, since complex processes may result in errors, poor under-
standability, defects and exceptions [2]. Therefore, "good" process models are required
to conform to the logs and at the same time have a simple structure that clearly reflects the desired
behavior [3]. To account for both the behavioral and the structural perspective, we propose a genetic
process mining approach coupled with a process complexity metric.
In fact, utilizing evolutionary computation in process mining research is not a new
concept. In 2006, Alves de Medeiros introduced the genetic algorithm (GA) to mine process
models, due to its resilience to noisy logs and its ability to produce novel sub-process


∗ Corresponding author.


combinations [4]. In that approach, an individual is a candidate model and the fitness
evaluates how well an individual is able to reproduce the behavior in the log. How-
ever, since individuals were abstracted to the level of a binary string, the approach had prob-
lems when mining certain processes, especially those exhibiting a high level of parallel
execution. Thus, a genetic programming (GP) approach coupled with a graph-based
representation was proposed in [5]. Its abstraction as a directed graph structure also
allows greater efficiency in evaluating individual fitness.
However, there are two main problems with these approaches: first, some relations
between process tasks are not considered in the individual representation; second,
they neglect the complexity of the mined models. We therefore improve the GP approach
by extending the individual representation of [4] and define a new partial fitness
measure that favors individuals with simpler structure. The new structural fitness is
based on the structuredness metric (SM) proposed in [6], a process complexity
metric. Our paper is organized as follows: Section 2 defines the individual representa-
tion and Section 3 extends the SM. The GP approach is presented in Section 4, with
experiments discussed in Section 5. Finally, conclusions are drawn in Section 6.

2 Extended Causal Matrix


In GP, an individual is an effectively executable program and its representation defines
the search space. The main requirement for the representation is to express the de-
pendencies between the process tasks in logs. In our improved GP, programs are expressed
as an extended causal matrix (ECM), which extends the causal matrix (CM) repre-
sentation in [4]. The CM shows which tasks directly enable the execution of other tasks
via the matching of input (I) and output (O) condition functions. The relationships be-
tween tasks it supports are restricted to 'AND' and 'XOR' only.

Table 1. Log of the application example

Id    numSimilar    Process instances
1     30            A, E, F
2     45            A, B, C, D, F
3     52            A, C, B, D, F

As we can see, without support for other business logic, previous GA and GP ap-
proaches may produce unexpected errors. For instance, consider the event log shown in
Table 1. It shows three different process instances for job applicants. When
HR screens a resume (A), he/she will either decline the resume (E) or, when the
resume is passed, arrange examinations including an interview (B) and a written test (C)
and score the applicant (D). Finally, the result is sent to the applicant (F).
The resulting model is expected to be Fig. 1(a), where the black rectangle is an implicit
task. However, previous approaches will wrongly generate Fig. 1(b), since the indi-
vidual representation in these approaches cannot express the relation of exclusive
disjunction of conjunctions. Therefore, we give an improved version of the causal
matrix that supports other relations, i.e., the extended CM (ECM for short).

Fig. 1. Expected model (a) and mined result using previous GA and GP (b)

ECM can be denoted as a quadruple (T, C, I, O), where T is a finite set of activities
(tasks) appearing in workflow logs, C ⊆ T × T is the causality relation, and I and O are
the input and output condition functions respectively.

Fig. 2. An example of the tree representation

Note that in previous GA approaches, condition functions in the CM are expressed as
lines of code. Instead, our GP approach utilizes a syntax tree structure with maximum
depth 2. Given a task t, its input and output condition functions are called its input tree and
output tree respectively. The tasks in I(t)/O(t) are the leaves of the corresponding tree (if
there is only one task, it is the root), while operations (XOR, AND and OR) are
internal nodes. The expression can also be represented in prefix form. For example,
Fig. 2 shows the tree representation of the prefix-notation program XOR(B, AND(C,
D)); we also call it an XOR tree since the root operation is XOR. Apparently, ECM is a
generalized version of CM, since more task relations can be expressed by the ECM. The ECM for
our example is shown in Table 2 below.

Table 2. An example of ECM

TASK       I (input tree)    O (output tree)
A          ()                XOR(B, AND(C, D))
B, C, D    (A)               (F)
E          AND(C, D)         (F)
F          XOR(B, E)         ()

We chose the tree form for the ECM because it makes it feasible to express the OR
relation. Moreover, it facilitates operations such as crossover and mutation, since
entire sub-tree structures can be swapped with ease. Note that there may be duplicate
tasks in subtrees. Like the original CM, our extended CM can also handle loop structures.
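To make the tree representation concrete, the following is a minimal Python sketch (our illustration, not the authors' implementation; class and helper names are ours) that builds the output tree O(A) = XOR(B, AND(C, D)) from Table 2 and renders it in the prefix notation used above.
• A minimal condition-tree sketch in Python
class Node:
    """A condition-function tree node: an internal operator (XOR/AND/OR)
    or a task leaf."""
    def __init__(self, op=None, children=None, task=None):
        self.op = op                    # 'XOR', 'AND', 'OR', or None for a leaf
        self.children = children or []  # sub-trees for internal nodes
        self.task = task                # task label for leaves, e.g. 'B'

    def prefix(self):
        """Render the tree in prefix notation, e.g. XOR(B, AND(C, D))."""
        if self.op is None:
            return self.task
        return "%s(%s)" % (self.op, ", ".join(c.prefix() for c in self.children))

leaf = lambda t: Node(task=t)
output_A = Node("XOR", [leaf("B"), Node("AND", [leaf("C"), leaf("D")])])
print(output_A.prefix())  # -> XOR(B, AND(C, D))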

3 Complexity Metric Based on ECM

The GP approach in this paper tries to mine process models not only from the behavioral
perspective but also from the structural one. This is done by defining a new partial
structural fitness. We first need to introduce a process complexity metric based on the
ECM. Many complexity metrics for process models have been proposed; in this paper
we choose the structuredness metric (SM) [6], which focuses on the model structure. Addition-
ally, both the GA mining algorithm and the SM have been implemented in the context of
ProM, which is characterized by a pluggable framework for process analysis, so some
modifications suffice to implement the improved GP.
In this section, we extend the Petri net version of the SM in [6] to propose an ECM-based
SM. The idea behind this metric stems from the observation that process
models are often structured as combinations of basic patterns such as
sequence, choice, parallelism and iteration [7]. To calculate the complexity, we de-
fine a component as a sub-ECM, identify the different kinds of structures in the ECM,
and score each structure by assigning it a "penalty" value. Finally, the sum of
these values measures the complexity of the process model, i.e., the complexity
of the individual. To explain the idea, some definitions are given below.
First, some structural properties based on the ECM are discussed.

Definition 1. Free choice. An ECM is a free-choice ECM iff for every two tasks t1 and
t2, I(t1) ∩ I(t2) ≠ null implies I(t1) = I(t2).
Definition 2. State machine. An ECM is a state machine iff for every task t, there is
no AND/OR in I(t) or O(t).
Definition 3. Marked graph. An ECM is a marked graph iff for every task t, there is
no XOR in I(t) or O(t).
A component corresponds to a behavioral pattern and is basically an ECM itself.

Definition 4. Component. Let ECM(T, I, O) be an ECM representation of an individual
process model. C is a component iff C is a sub-closure of the ECM.
The definition of a component is used to iteratively identify patterns. To define
different patterns, the following notation is needed.

Definition 5. Component notations. Let ECM(T, I, O) be a process model.

• ECM||C is a function that maps a component to an ECM.
• [ECM] is the set of non-trivial components of ECM.
To find components, we give the following definitions.
Definition 6. Annotated ECM. ECM = (T, I, O, τ) is an annotated ECM iff (T, I, O) is
an ECM and τ: T → ℜ+ is a function that assigns a penalty to each task in the ECM.
Initially, τ(t) is 1 for each task t in the ECM.

Definition 7. Component types. Let (T, I, O) be an ECM, C ∈ [ECM] a component in
the ECM, and ECM||C = (TC, IC, OC). C is a:

• SEQUENCE component, iff ECM||C is a state machine and a marked graph.
• MAXIMAL SEQUENCE component, iff for every C' ∈ [ECM] that is a SEQUENCE
component: C ≠ C' ⇒ C ⊄ C'.
• CHOICE component, iff every task shares the same I and O.
• WHILE component, iff T is a triple T(ti, t, to) and O(ti) = XOR(t, to), I(t) = ti,
O(t) = to, I(to) = XOR(ti, t).
• MARKED GRAPH component, iff ECM||C is a marked-graph ECM.
• MAXIMAL MARKED GRAPH component, iff for every C' ∈ [ECM] that is a
MARKED GRAPH component: C ≠ C' ⇒ C ⊄ C'.
• STATE MACHINE component, iff ECM||C is a state-machine ECM.
• MAXIMAL STATE MACHINE component, iff for every C' ∈ [ECM] that is a
STATE MACHINE component: C ≠ C' ⇒ C ⊄ C'.
• WELL-STRUCTURED component, iff for every task t in the ECM: if O(t) is an
XOR tree, every task in O(t) shares the same input; if I(t) is an
AND tree or OR tree, there is no XOR in its branches; and there are no cycles.
• MAXIMAL WELL-STRUCTURED component, iff for every C' ∈ [ECM] that is a
WELL-STRUCTURED component: C ≠ C' ⇒ C ⊄ C'.
• UNSTRUCTURED component, if none of the above definitions applies to C.
• MINIMAL UNSTRUCTURED component, iff for every C' ∈ [ECM] that is an
UNSTRUCTURED component: C ≠ C' ⇒ C ⊄ C'.
To calculate the ECM-based SM, we identify atomic patterns in the form of the
component types defined above and then iteratively replace them by tasks. Each time
a component is removed, its penalty value is calculated and associated with a new
task. In this way the ECM is reduced step by step down to the trivial ECM with just
one task; the SM is then the value associated with this task.
Next, we introduce the priority of each component type and its corresponding penalty
value, and finally give the algorithm that calculates the complexity. The order in
which components are reduced to tasks is: MAXIMAL SEQUENCE com-
ponent, CHOICE component, WHILE component, MAXIMAL MARKED GRAPH
component, MAXIMAL STATE MACHINE component, MAXIMAL WELL-
STRUCTURED component, and other components. Thus we first select a MAXIMAL
SEQUENCE component, which has the highest priority.
Definition 8. Component penalty function. Let ECM = (T', I', O', τ) be an annotated
ECM. We define the component penalty ρ_ECM: [ECM] → ℜ+ as follows: for
any C ∈ [ECM], the penalty value associated with C is given in Table 3.
In Table 3, diff(P) is |totalNum of XOR in O(T) − totalNum of XOR in I(T)|, and
diff(T) is |totalNum of AND/OR in O(T) − totalNum of AND/OR in I(T)|. For details
on the settings of the different component weights, refer to [6].
The algorithm that calculates the ECM-based SM is shown below.
• The program that calculates the SM of an individual's ECM

Table 3. The penalty value of different component types

C (Component)                ρ_ECM(C)
MAXIMAL SEQUENCE             Σ_{t∈T} τ(t)
CHOICE                       1.5 · Σ_{t∈T} τ(t)
WHILE                        Σ_{t∈{I(C),O(C)}} τ(t) + 2 · Σ_{t∈T} τ(t)
MAXIMAL MARKED GRAPH         2 · Σ_{t∈T} τ(t) · diff(T)
MAXIMAL STATE MACHINE        2 · Σ_{t∈T} τ(t) · diff(P)
MAXIMAL WELL STRUCTURED      2 · Σ_{t∈T} τ(t) · diff(P) · diff(T)
otherwise                    5 · Σ_{t∈T} τ(t) · diff(P) · diff(T)

(i) X := (ECM, τ), where τ(t) = 1 for all t ∈ T;
(ii) while [X] ≠ ∅ (i.e., X is not reduced to an ECM with just a single task):
(iii) pick C such that C has the highest priority among all components C' ∈ [X];
(iv) yield ECM' by removing C from ECM and replacing it with a new task tc;
(v) τ'(tc) = ρ_X(C) and τ'(t) = τ(t) for all other t;
(vi) X := (ECM', τ');
(vii) output SM(ECM) = τ(t) (T = {t} after the ECM is reduced).
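The following Python sketch (ours) outlines steps (i)–(vii) together with the Table 3 penalties. It is only an illustration under stated assumptions: component detection per Definition 7 is passed in as find_components, and a component object is assumed to expose its type, its task set, the name of its replacement task, and its diff(P)/diff(T) values; the WHILE penalty is omitted for brevity.
• A sketch of the SM reduction loop in Python
PRIORITY = ["MAX_SEQUENCE", "CHOICE", "WHILE", "MAX_MARKED_GRAPH",
            "MAX_STATE_MACHINE", "MAX_WELL_STRUCTURED", "UNSTRUCTURED"]

def table3_penalty(c, tau):
    # Penalty values from Table 3; c.diff_p / c.diff_t play the roles of
    # diff(P) / diff(T). The WHILE case is omitted here for brevity.
    s = sum(tau[t] for t in c.tasks)
    return {"MAX_SEQUENCE": s,
            "CHOICE": 1.5 * s,
            "MAX_MARKED_GRAPH": 2 * s * c.diff_t,
            "MAX_STATE_MACHINE": 2 * s * c.diff_p,
            "MAX_WELL_STRUCTURED": 2 * s * c.diff_p * c.diff_t,
            }.get(c.type, 5 * s * c.diff_p * c.diff_t)

def structuredness(ecm, find_components):
    tau = {t: 1.0 for t in ecm.tasks}         # (i) every task starts at 1
    while True:
        components = find_components(ecm)      # the non-trivial components [X]
        if not components:                     # (ii) reduced to a single task
            break
        # (iii) pick the component whose type has the highest priority
        c = min(components, key=lambda c: PRIORITY.index(c.type))
        rho = table3_penalty(c, tau)           # penalty of the chosen component
        ecm = ecm.replace(c)                   # (iv) collapse C into a task tc
        tau[c.name] = rho                      # (v) tc carries the penalty
    (t,) = ecm.tasks                           # (vii) single remaining task
    return tau[t]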

4 Genetic Process Mining Algorithm


In this section, we explain how our GP-based process mining works and show its
ability to support more combined constructs and to mine "uncomplex" process
models. There are six main stages in our GP process mining algorithm: 1) read the work-
flow log; 2) calculate dependency relations among tasks; 3) build the initial population; 4)
calculate the individuals' fitness; 5) keep the fittest individuals; 6) if the generation limit is
reached or an optimal individual is found (i.e., fitness = 1.0), stop and return; otherwise create
the next population (using elitism, crossover and mutation) and go to 4).

4.1 Initial Population

We build the initial causal matrices in a heuristic way that determines the
causality relation using a dependency measure, which ascertains the
strength of the relationship between tasks by counting the number of times one task is
directly preceded by another. The measure is also able to determine which tasks are in a
loop.
Once the causality relation is determined, the condition functions I and O are ran-
domly built: every task t1 that causally precedes a task t2 is randomly inserted as a
leaf in the tree structure corresponding to the input condition function of t2. Op-
erators (AND, OR and XOR) are also inserted randomly to construct the tree. A similar
process is used to set the output condition function of a task.
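The excerpt does not spell out the dependency measure itself. The sketch below therefore uses a common heuristics-mining formula, (|a>b| − |b>a|) / (|a>b| + |b>a| + 1), purely as an illustrative assumption, applied to the log of Table 1.
• A dependency-measure sketch in Python (formula assumed for illustration)
from collections import Counter

def dependency_measure(log):
    """Count direct successions a>b per trace and return a scoring function
    in [-1, 1]; high values suggest a causal relation a -> b."""
    follows = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] += 1
    def dep(a, b):
        ab, ba = follows[(a, b)], follows[(b, a)]
        return (ab - ba) / (ab + ba + 1)
    return dep

# The event log of Table 1 (30/45/52 similar instances)
log = ([["A", "E", "F"]] * 30 + [["A", "B", "C", "D", "F"]] * 45
       + [["A", "C", "B", "D", "F"]] * 52)
dep = dependency_measure(log)
print(round(dep("A", "B"), 2))  # ~0.98: strong dependency A -> B
print(round(dep("B", "C"), 2))  # ~-0.07: near zero, B and C run in parallel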

4.2 Fitness Function

The fitness function is used to assess the quality of an individual. Quality is assessed by:
1) rewarding individuals that can parse more event traces in the logs (the "com-
pleteness" requirement);
2) punishing individuals that allow for more extra behavior than expressed in the
logs (the "preciseness" requirement);
3) punishing individuals that are complex, i.e., individuals with a high SM
value (the "uncomplexity" requirement).
Equation (1) gives the fitness function of our GP-based process mining algorithm,
in which L is the event log and ECM is an individual. The notation ECM[] represents
a generation of process models. The three partial fitness measures are detailed below.

Fitness(L, ECM, ECM[]) = PFcomplete(L, ECM) − κ · PFprecise(L, ECM, ECM[]) − χ · PFuncomplex(ECM, ECM[])   (1)

In equation (1), the functions PFcomplete and PFprecise, derived from the previous GA
approach, measure completeness and preciseness from the behavioral perspective.
Since the ECM is a generalized version of the CM, these functions are nearly the same as those
in the previous GA approach [8].
The function PFcomplete, shown in equation (2), is based on the continuous parsing
of all traces in the event log against an individual. In this equation, all missing tokens
and extra tokens left behind act as a penalty in the fitness calculation; more details can
be found in [8]. Obviously, 'OR' construct parsing must also be considered.
PFcomplete(L, ECM) = (allParseTasks(L, ECM) − punishment(L, ECM)) / numTasksLog(L)   (2)

In terms of preciseness, an individual should exhibit "less extra behavior", i.e., it
should tend to have fewer enabled activities. PFprecise in equation (3) measures the
amount of extra behavior an individual allows for, in comparison to the other
individuals in the generation [8].
PFprecise(L, ECM, ECM[]) = allEnabledTasks(L, ECM) / max(allEnabledTasks(L, ECM[]))   (3)

As mentioned before, the GA approach and most other process mining techniques
focus on the behavior in the logs but ignore the structure of the individual itself. Thus we
define the uncomplexity fitness in equation (4). It transforms the
structuredness metric values to the interval [0, 1], where max and min are the maximum
and minimum SM values of the entire generation of individuals respectively.

PFuncomplex(ECM, ECM[]) = (SM(ECM) − min(SM(ECM[]))) / (max(SM(ECM[])) − min(SM(ECM[])))   (4)
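Putting equations (1) and (4) together, a minimal sketch of the fitness combination (our illustration; the partial measures are assumed to be computed elsewhere, and the guard against a generation with identical SM values is our own addition) could look as follows, with κ and χ defaulting to the Table 4 values.
• A fitness-combination sketch in Python
def fitness(pf_complete, pf_precise, sm, sm_generation,
            kappa=0.025, chi=0.025):
    """Equation (1): combine completeness, preciseness and uncomplexity.
    sm is SM(ECM); sm_generation holds the SM values of the whole
    generation ECM[]."""
    lo, hi = min(sm_generation), max(sm_generation)
    # Equation (4), guarded when all individuals share the same SM value
    pf_uncomplex = (sm - lo) / (hi - lo) if hi > lo else 0.0
    return pf_complete - kappa * pf_precise - chi * pf_uncomplex

print(fitness(0.9, 0.4, sm=12.0, sm_generation=[8.0, 12.0, 20.0]))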

4.3 GP Operators

Genetic operators such as selection, crossover and mutation are used to generate the indi-
viduals of the next generation; the operators in [8] have been adapted for our GP
approach. In our case, the next population consists of the best individuals plus others
generated via crossover and mutation, and parents are selected from a generation by
five-individual tournament selection.
In crossover, a task that exists in both parents is selected randomly as
the crossover point. The input and output trees of the task are then split at a randomly
chosen swap point (a set of tree nodes) for each parent. Our crossover algorithm covers
the complete search space defined by the ECM. The following illustrates the
crossover algorithm, showing the input trees of two parents for task K.
• Crossover algorithm in our GP process mining approach
parentTask1 = AND(XOR(A,B,C),OR(B,D),E)
parentTask2 = OR(AND(A,B),XOR(C,D),F)
swap1 = XOR(A,B,C),OR(B,D); remainder1 = E
swap2 = AND(A,B),F; remainder2 = XOR(C,D)
If crossover, for each branch in swap1:
  If the selected branch shares the same operation with one of the
  branches of remainder2 (e.g., XOR(A,B,C) and XOR(C,D) are both XOR),
  select one of the following three crossover methods with equal
  probability:
    i) Add it as a new branch in remainder2
       (e.g., XOR(A,B,C),XOR(C,D))
    ii) Join it with an existing branch in remainder2
       (e.g., XOR(A,B,C,D))
    iii) Add it as a new branch, then select the branch with the same
       operation in remainder2 and remove the tasks they have in common
       (e.g., XOR(A,B,C),D)
  Else select one crossover method conditionally:
    If the operation of the selected branch is the same as the root of
    parentTask2:
      Add each task in the selected branch as a new leaf of the root in
      remainder2
      (e.g., XOR(C,D),B,D)
    Else add it as a new branch in remainder2
Repeat for the combination swap2 and remainder1
Examples are embedded in the listing above for clarity. After execution, the
resulting child for input2(K) may be one of:
OR(XOR(A,B,C),XOR(C,D),B,D); OR(XOR(A,B,C,D),B,D); OR(XOR(A,B,C),B,D).
The cycle is repeated for both the input and the output trees of the selected task.

In mutation, with probability equal to the mutation rate, every task in the individual
has its input and output trees mutated. The following program depicts the mutation.
• Mutation algorithm in our GP process mining approach
If mutation, for each task t in the individual (assuming I(t) =
AND(XOR(A,B,C),OR(B,D),E)), one of the following operations is
performed:
  i) Choose a branch and add a task to it (randomly chosen from the
     complete set of available tasks in the individual); if the branch
     is a leaf (i.e., one task), randomly add an operation on the new
     branch.
     (e.g., XOR(A,B,C),OR(B,D,F),E)
  ii) Remove a task from a chosen branch
     (e.g., XOR(A,B,C),OR(B,D))
  iii) Change the operation of the chosen branch
     (e.g., XOR(A,B,C),XOR(B,D),E)
  iv) Redistribute the elements in I(t)
     (e.g., XOR(B,C),OR(A,B),D,E)
Repeat for the output tree of each task
Both the crossover and mutation operators utilize a repair routine, executed after an in-
put/output tree has been changed, to ensure that only viable changes are made.
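As an illustration of how the mutation options map onto the tree structure, here is a minimal Python sketch (ours) of option iii), changing a branch operation, using the Node class from the earlier sketch; the repair routine is omitted.
• A mutation-operator sketch in Python (option iii)
import random

OPS = ["XOR", "AND", "OR"]

def walk(node):
    """Yield every node of a condition tree."""
    yield node
    for c in node.children:
        yield from walk(c)

def mutate_change_operation(tree):
    """Option iii): pick a random internal node (branch) and change its
    operation; the paper's repair routine would run afterwards and is
    omitted here."""
    internal = [n for n in walk(tree) if n.op is not None]
    if internal:
        node = random.choice(internal)
        node.op = random.choice([op for op in OPS if op != node.op])
    return tree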

5 Experiments
We have implemented our GP process mining approach in Java and plugged it into the
ProM framework. The experiments in this section validate the approach.
Table 4 shows the parameters used in the experiments.

Table 4. Parameters for the experiment

Population size                         100
Maximum generations                     1000
Elitism rate                            0.02
Crossover rate                          0.8
Mutation rate                           0.02
κ (extra behavior fitness reduction)    0.025
χ (complex fitness reduction)           0.025

Fig. 3. Two mined models (a) using previous GA/GP approach; (b) using our GP approach

Fig. 4. Another two mined models: (a) using the previous GA/GP approach; (b) using our GP approach

To compare with the previous GA/GP approaches, we have tested various workflow
logs. For brevity, we give two typical comparisons. Fig. 3(b) is the expected model;
our GP approach mined it successfully, while previous approaches produced a
confusing model, because our GP approach supports the task relation of disjunction of
conjunctions. Fig. 4 shows the two mined results for a process with a short parallel
construct; it illustrates that the structural uncomplexity fitness helps our GP method
outperform previous ones when dealing with process models containing short parallel constructs.

6 Conclusions
The advantage of our GP process mining approach is that it can mine processes both
behaviorally and structurally. This is due to the novel idea of combining genetic
programming with the structuredness metric. Additionally, the extended CM used for rep-
resenting individuals provides benefits in crossover and mutation. As a result, our GP
approach outperforms other process mining approaches in several respects.

References
[1] van der Aalst, W.M.P., van Dongen, B.F., Herbst, J., Maruster, L., Schimm, G., Weijters,
A.J.M.M.: Workflow mining: a survey of issues and approaches. Data & Knowledge En-
gineering 47(2), 237–267 (2003)
[2] Cardoso, J.: Control-flow complexity measurement of processes and Weyuker’s properties.
Transactions on Enformatika, Systems Sciences and Engineering 8, 213–218 (2005)
[3] Rozinat, A., van der Aalst, W.M.P.: Conformance Testing: Measuring the Fit and Appro-
priateness of Event Logs and Process Models. In: Bussler, C.J., Haller, A. (eds.) BPM 2005.
LNCS, vol. 3812, pp. 163–176. Springer, Heidelberg (2006)
[4] Alves de Medeiros, A.K., Weijters, A.J.M.M.: Genetic Process Mining. Ph.D Thesis,
Eindhoven Technical University, Eindhoven, The Netherlands (2006)
[5] Turner, C.J., Tiwari, A., Mehnen, J.: A Genetic Programming Approach to Business
Process Mining. In: GECCO 2008, pp. 1307–1314 (2008)
[6] Lassen, K.B., van der Aalst, W.M.P.: Complexity metrics for Workflow nets. Information
and Software Technology 51, 610–626 (2009)
[7] van der Aalst, W.M.P., ter Hofstede, A.H.M., Kiepuszewski, B., Barros, A.P.: Workflow
patterns. Distributed and Parallel Databases 14(1), 5–51 (2003)
[8] Alves de Medeiros, A.K., Weijters, A.J.M.M., van der Aalst, W.M.P.: Genetic process min-
ing: an experimental Evaluation. Journal of Data Mining and Knowledge Discovery 14(2),
245–304 (2007)
Effects of Diversity on Optimality in GA

Glen MacDonald and Gu Fang

School of Engineering, University of Western Sydney,


Locked Bag 1797, Penrith South DC, NSW 1797, Australia
[email protected], [email protected]

Abstract. The Genetic Algorithm (GA) is an evolution-inspired heuristic search
algorithm. Like all heuristic search methods, its probability of locating the op-
timal solution is not unity. This reduces the GA's usefulness in areas that
require reliable and accurate optimal solutions, such as system modeling and
control gain setting. In this paper an alteration to the GA is
presented. The method is designed to create a specific type of diversity in order
to obtain more optimal results; in particular, it mutates bits that are
not constant within the population. The resulting diversity and final optimality
of this method are compared with standard mutation at various probabilities.
Simulation results show that the method improves search optimality for certain
types of problems.

Keywords: Genetic Algorithms, diversity, mutation, optimality.

1 Introduction
Genetic Algorithm (GA) is an Evolutionary Computation (EC) method proposed by
Holland [1] which searches the binary space for solutions that satisfy pre-defined
criteria. GA is a heuristic search method that contains three operators: Selection,
Combination and Mutation. It operates on a collection of individuals (populations). A
population contains individuals which have the potential to differ from each other.
The GA works by identifying individuals which most closely satisfy pre-defined
criteria, called selection. These selected individuals (the fittest) are combined. Each
combination cycle is called a generation. To create new individuals, the combined
individuals must differ. The amount of difference in the population is called its diver-
sity. Combination occurs by swapping parts between the individuals. The combination
methods can vary significantly [2-6]. Goldberg [3] suggested Single Point Crossover
(SPC) and Multi-Point Crossover (MPC) as pattern identifying crossover methods.
Punctuated Crossover [4] is a deterministic crossover method which alters the prob-
ability based on the success of previous crossover points. Random Respectful Cross-
over [5] produces offspring by copying the parts at positions where the parents are
identical and filling the remaining positions with random members from the bit set.
Disrespectful Crossover (DC) [6] is a variant on SPC, where the bits in the section
below the crossover point are inverted if they are different or have a 50% chance of
being in either state if they are the same. DC is essentially a localized high probability
mutation algorithm incorporated into a crossover function. Its purpose is in encourag-
ing search divergence, not search convergence. Reduced Surrogate [4] randomly


selects bits which are different as the crossover points for single or double point
crossover. This has the advantage of not performing redundant crossovers and is a
more efficient crossover as search diversity decreases. Hoshi [2] showed that encour-
aging similarly ranked individuals to combine resulted in superior solutions.
If the combination of two different individuals can lead to all individuals in the
search space then it is possible that the globally optimal solution will be found. How-
ever, in GA, like in biology, Combination and Selection are what makes the search
converge, as traits in individuals become more common after successive generations.
It is believed that these traits are what make the individuals ‘fit’. However these
‘traits’ could be sub-optimal. If this is the case, a population will fail to continually
improve and will have reached a locally optimal solution. This means that the GA
may not produce an optimal solution for the problem, i.e., the global optima.
To encourage the population to avoid local optima, individuals are altered in a
probabilistic manner, called mutation [3]. This is modeled on biological mutation,
which is the alteration of individual genes within an organisms DNA. Commonly,
mutation is implemented in GA by probabilistically changing bit states, whereby
every single bit in an individual has a specific probability of changing state. This type
of mutation is also known as De-Jong mutation [7].
The effectiveness of De-Jong mutation has proven to be dependent on the mutation
probability and the effectiveness of a mutation probability has been shown to be prob-
lem dependent [8-13]. The mutation probability affects the probability of an individ-
ual being mutated as well as the number of bits mutated within that individual. A high
mutation probability means that an individual has a high chance of having a large
number of bits altered; this results in a highly random search which mostly ignores
convergence through crossover. A low mutation probability means that an individual
has a low chance of having a small number of bits altered; this creates a search which
is highly reliant on combination.
Searches which ignore combination do not converge. Searches which are entirely
reliant on combination are prone to premature convergence. Mutation alone amounts
to a random search and forever diverges. Mutation when combined with selection and
combination is a parallel, noise tolerant, hill-climbing algorithm [8]. Hill-climbing
algorithms guarantee the location of a local optimum, but will be deceived by a prob-
lem with many local optima. Methods of determining the mutation probability based
on generation [9], population size [10], length [11], crossover success and the rate of
change in fitness have been investigated [12]. A method which prolongs the GA
search time by systematically varying the mutation rate based on population diversity
has significantly improved the final solution optimality [13]. These mutation strate-
gies have improved the GA in a problem independent fashion. However they do not
present any significant improvement to the ideology behind mutation. They do not
address the search algorithm; they primarily present methods of altering one of the
algorithm's parameters in order to prolong the search or more reliably disrupt conver-
gence. This is unlikely to make the search more efficient or more robust, but rather
just more exhaustive. They also do not allow convergence and divergence to
be complementary.
In this paper a new mutation method is introduced to improve the diversity of the
population during convergence. The method is also intended to ensure that
convergence is not disrupted by the mutation process.

The remainder of the paper is organized as follows: Section 2 details the methodol-
ogy used to maintain diversity while ensuring convergence. Numerical
results are shown in Section 3, and conclusions are given in Section 4.

2 The Proposed Mutation Method

2.1 Related Work

The difference between individuals in GA is primarily measured by the Euclidean
distance or the Hamming distance [14]. The Hamming distance [15] is the number of bits
which differ between two individuals. The diversity of a population is generally the aver-
age of the distance measure over all individuals.
The Hamming distance was used as the diversity measure in this paper because it is problem-
independent. It also takes into account the unique nature of the binary space, which is
lost in the Euclidean measurement; the implemented GA is binary coded. In this paper
the diversity measurement was primarily used as an analysis tool and is the average
Hamming distance between all unique individuals in the population.
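For reference, a minimal Python sketch (ours) of this diversity measure over bit-string individuals:
• An average-Hamming-distance sketch in Python
def population_diversity(population):
    """Average Hamming distance over all pairs of unique individuals
    (equal-length bit strings)."""
    unique = list(set(population))
    if len(unique) < 2:
        return 0.0
    total, pairs = 0, 0
    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            total += sum(a != b for a, b in zip(unique[i], unique[j]))
            pairs += 1
    return total / pairs

print(population_diversity(["0000", "1111", "0011"]))  # (4 + 2 + 2) / 3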
When a bit has a constant state within the population it is said to have converged,
as in all future populations this bit state will never alter without mutation. When a
search is converging, its population diversity declines; when a search has con-
verged, the population has no useful diversity, as combination will never create differ-
ing individuals. Traditional mutation alters every bit with a pre-set probability, and will
thus probabilistically interrupt convergence at some point during the search. Dynamic
alteration of this convergence interruption has been shown to improve the search result
[12], [13].

2.2 Proposed Method

The number of unique individuals which can be created, in a string of length n at a
distance d = [0 : n], through the combination of two dissimilar individuals can be
determined by the combinations equation:

N_d = C(n, D) = n! / (D! (n − D)!)   (1)

where N_d is the number of individuals at distance d and D is the Hamming distance
between the individuals. D can be calculated as:

D = n/2 − |n/2 − d|   (2)

In general, mutation will always be a hill-climbing mechanism, and a dynamic De-Jong
mutation rate can only compromise between hill-climbing and unguided, convergence-
disturbing divergence.
The proposed mutation method is named Coherent Diversity Maintenance (CDM).
This method maintains population diversity by the following means:

1. It deliberately targets non-unique individuals to be replaced with 'Random Immi-
grants' (RI), i.e., randomly generated individuals.
2. It preserves bit convergence by ensuring the new RI contain the bit states con-
verged within the rest of the population.
Fundamentally, CDM provides the crossover with more samples from the current
search space, as the search space for each generation is reduced through the conver-
gence of the previous generation. CDM differs from using RI alone, since plain RI ignores
convergence and provides crossover with new samples from the entire search space.
The effectiveness of RI-style mutation depends on the correlation between popu-
lation size and search-space size: if the search space is significantly larger than the
population, a diverse population gives combination an incoherent and
disconnected sample. As the search space shrinks, CDM becomes more beneficial
than RI alone, because the CDM samples are more representative of the converged
search space.
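A minimal sketch of the CDM replacement step (our illustration over bit-string individuals; helper names are ours): duplicates are replaced by immigrants that keep every converged bit state and randomize only the remaining positions.
• A CDM replacement sketch in Python
import random

def converged_bits(population):
    """Positions whose state is constant across the whole population."""
    return {i: population[0][i]
            for i in range(len(population[0]))
            if all(ind[i] == population[0][i] for ind in population)}

def cdm_replace_duplicates(population):
    """Replace repeated individuals with random immigrants that preserve
    the converged bit states."""
    fixed = converged_bits(population)
    seen, result = set(), []
    for ind in population:
        if ind in seen:  # duplicate: replace with a coherent immigrant
            ind = "".join(fixed.get(i, random.choice("01"))
                          for i in range(len(ind)))
        seen.add(ind)
        result.append(ind)
    return result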
To evaluate the effectiveness of CDM, it was compared with Random Immigrants,
standard De-Jong mutation of varying probabilities, and a GA with no mutation,
using several types of benchmark functions in the following section.

2.3 Mutation Rate

Mutation traditionally requires specifying and tuning the rate at which mutation
affects the population. The mutation rate for CDM and for the implemented version
of RI was determined by the number of repeated individuals in the population; this
requires the comparison of all individuals in the population.
The computational cost of this comparison is:

(N²/2 − N) × L   (3)

where L is the length of an individual and N is the number of individuals in the population.
The mutation rate is self-regulated by the amount of diversity in the population. The
repeated combination of similar individuals has a higher probability of creating identical
individuals than the combination of relatively dissimilar individuals; thus, as the
population diversity decreases and combination produces more identical individuals,
the mutation rate increases. Replacing only repeated individuals has the benefit of
not interrupting convergence, because the product of combination is not ignored or
diverted unless combination creates redundant individuals. This ensures that conver-
gence and divergence do not compete within the search.
Even when the mutation rate becomes quite high, it will not remove individuals
which are useful for selection. The main disadvantage is that it does not guarantee
escape from a local optimum once one is found.

3 Results
The GA testbed used ranking selection without elitism and was tested on two popula-
tion sizes, 50 and 100, over 20 generations. The tests were conducted using both sin-
gle-point and dual-point crossover. The mutation methods were tested on six
benchmark functions of various shapes: two multi-modal functions, two single-
optimum functions and two flat functions. The functions were tested over 3 to 5 dimen-
sions with 6- to 12-bit resolution per dimension. The mutation methods were com-
pared with respect to diversity and final fitness. The results were averaged over
100 tests.

3.1 Benchmark Functions

The following six benchmark functions [16] are used to evaluate the
effectiveness of the proposed CDM.
Two single-optimum functions:

1. The Ackley function:

f(x) = −20 exp(−0.2 √((1/n) Σ_{i=1}^{n} x_i²)) − exp((1/n) Σ_{i=1}^{n} cos(2πx_i)) + 20 + e   (4)

−32 ≤ x_i ≤ 32, x_min = (0, …, 0), and f(x_min) = 0.


2. The Sphere function:

f(x) = Σ_{i=1}^{n} x_i²   (5)

−50 ≤ x_i ≤ 50, x_min = (0, …, 0), and f(x_min) = 0.

Two multi-optima functions:


3. Rastrigin's function:

f(x) = Σ_{i=1}^{n} (x_i² − 10 cos(2πx_i)) + 10n   (6)

−5.12 ≤ x_i ≤ 5.12, x_min = (0, …, 0), and f(x_min) = 0.

4. Griewank's function:

f(x) = (1/4000) Σ_{i=1}^{n} x_i² − Π_{i=1}^{n} cos(x_i / √i) + 1   (7)

−100 ≤ x_i ≤ 100, x_min = (0, …, 0), and f(x_min) = 0.

And two flat-optimum functions:

5. Zakharov's function:

f(x) = Σ_{i=1}^{n} x_i² + (Σ_{i=1}^{n} 0.5·i·x_i)² + (Σ_{i=1}^{n} 0.5·i·x_i)⁴   (8)

−10 ≤ x_i ≤ 10, x_min = (0, …, 0), and f(x_min) = 0.

6. Rosenbrock's function:

f(x) = Σ_{i=1}^{n−1} (100(x_{i+1} − x_i²)² + (x_i − 1)²)   (9)

−10 ≤ x_i ≤ 10, x_min = (1, …, 1), and f(x_min) = 0.
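For reference, two of these benchmarks are easy to state directly in code (a sketch in Python, ours; the GA in the paper evaluates them on binary-coded decodings of x):
• Benchmark-function sketch in Python
import math

def rastrigin(x):                                  # equation (6)
    return sum(xi**2 - 10 * math.cos(2 * math.pi * xi) for xi in x) + 10 * len(x)

def ackley(x):                                     # equation (4)
    n = len(x)
    s1 = sum(xi**2 for xi in x) / n
    s2 = sum(math.cos(2 * math.pi * xi) for xi in x) / n
    return -20 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20 + math.e

print(rastrigin([0.0, 0.0, 0.0]), ackley([0.0, 0.0, 0.0]))  # both ~0 at the optimum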

3.2 Numerical Results of Fitness

Tables 1–6 show the GA results, in terms of mean final fitness, for the six
benchmark functions above. The results were obtained using different mutation
methods: the proposed CDM method, the completely random RI method, and the De-
Jong method with mutation rates of 0.001, 0.002, 0.005 and 0.

Table 1. Ackley function final fitness comparison

Cross over points/ no. of bits CDM RI 0.001 0.002 0.005 None
Single point/6-Bit 0.38 0.86 1.21 0.62 0.16 1.89
Dual point/ 6-Bit 1.56 3.24 2.86 2.14 0.92 4.16
Single point/ 12Bit 0.3 0.39 0.54 0.36 0.07 0.9
Dual point/ 12Bit 1.58 2.11 1.79 1.26 0.5 2.56

Table 2. Sphere function final fitness comparison

Cross over points/ no. of bits CDM RI 0.001 0.002 0.005 None
Single point/ 6-Bit 0.1 0.18 0.24 0.12 0.01 0.62
Dual point/ 6-Bit 0.07 0.08 0.07 0.04 0.01 0.2
Single point/ 12Bit 0.81 1.77 1.31 0.77 0.1 3.22
Dual point/ 12Bit 0.66 0.9 0.65 0.27 0.03 1.43

Table 3. Rastrigin’s function final fitness comparison

Cross over points/ no. of bits CDM RI 0.001 0.002 0.005 None
Single point/ 6-Bit 1.66 2.34 4.56 4.15 4.04 4.89
Dual point/ 6-Bit 1.59 1.89 2.98 3.04 2.69 3.3
Single point/ 12Bit 6.55 9.76 10.56 9.74 7.38 15.81
Dual point/ 12Bit 5.34 6.06 5.91 5.76 4.72 7.27

Table 4. Griewank function final fitness comparison

Cross over points/ no. of bits CDM RI 0.001 0.002 0.005 None
Single point/ 6-Bit 0.04 0.05 0.07 0.05 0.03 0.11
Dual point/ 6-Bit 0.04 0.03 0.05 0.04 0.03 0.07
Single point/ 12Bit 0.08 0.09 0.09 0.07 0.05 0.16
Dual point/ 12Bit 0.06 0.06 0.07 0.05 0.04 0.09

Table 5. Zakharov’s function final fitness comparison

Cross over points/ no. of bits CDM RI 0.001 0.002 0.005 None
Single point/6-Bit 8.9 29.6 54.3 35.4 19.4 115
Dual point/6-Bit 11.1 17.9 22.7 16.4 10.6 42
Single point/12Bit 294 448 430 136 55.1 1419
Dual point/12Bit 165 257 149 84.2 33.8 305

Table 6. Rosenbrock's function final fitness comparison

Cross over points/ no. of bits CDM RI 0.001 0.002 0.005 None
Single point/6-Bit 0.01 0.01 0.07 0.06 0.05 0.09
Dual point/6-Bit 0.01 0.01 0.05 0.05 0.04 0.05
Single point/12Bit 0.01 0.03 0.1 0.06 0.03 0.17
Dual point/12Bit 0.03 0.02 0.05 0.03 0.04 0.11

These results show that RI produces worse results than the CDM method. This is
expected, as CDM is designed to encourage diversity whilst still maintaining
population convergence.
The more interesting aspects of the results are the fitness values for the two
single-optimum functions (Tables 1-2), comparing CDM with the De-Jong
methods: CDM does not generate better results than the normal probabilistic
mutation methods. This is believed to be because single-optimum problems can be
solved by hill-climbing methods, a category into which De-Jong mutation falls;
the normal mutation method therefore generates acceptable results.
CDM in general generates better or similar results for the multi-optimum func-
tions (Tables 3-4). In these cases, as CDM is designed to encourage diversity, it is
likely to generate better, or at least no worse, results than the other mutation methods.
For the flatter-optimum functions (Tables 5-6), CDM's final results are comparable
to those of the normal mutation methods. This is expected, as on a flatter surface
diversity does not have a significant impact on search outcomes.
It was also noted that CDM generally performs better in simpler cases, i.e., with a
single crossover point and with shorter bit lengths. CDM performed relatively worse
with dual-point crossover (DPC) because DPC is better than single-point crossover
(SPC) at maintaining population diversity. It is believed that CDM performed
relatively worse when tested with longer individuals because the search space became
too large for the given population size to be effective.

3.3 Numerical Results on Diversity

During the computation, it was noted that diversity varies significantly with different
mutation rates and mutation methods, and also between bench-
mark functions. However, the most significant effect on diversity came from the search-
space size, which is a combination of the number of parameters and the parameter resolution.
Fig. 1 shows the average population Hamming distances for all tests at 6-bit
and 12-bit parameter resolution.
There are marked differences in search-space size between the 6-bit and 12-bit
parameter resolutions: the average distance between all solutions at the beginning
of a 6-bit 5-dimensional case is 12 bits, while it is 25 bits for the 12-bit case.
In all tests the diversity measurement was made after crossover. This means
that only the useful diversity created by the mutation method is recorded, i.e.,
when the mutation method creates individuals with above-average fitness and a larger
bit difference, it improves the diversity of the next generation. If mutation
does not create fitter individuals with a larger distance from the other relatively fit
individuals, it will not improve the next generation's diversity.

Fig. 1. Diversity Summary

Figure 1 shows the generational diversity summaries for the 6-bit and 12-bit tests.
In the figure, CDM refers to Coherent Diversity Maintenance mutation, RI to Ran-
dom Immigrants mutation, and DJ .001 – DJ .005 to De-Jong mutation with
the stated probability.
It can be seen from the figure that the diversity levels of CDM and RI are similar
for both parameter resolutions. However, CDM consistently creates higher population
diversity during the middle generations, whilst tapering off in the final generations. The
lower diversity relative to RI in the final generations is to be expected, as CDM only
creates individuals which are coherent with the converging search space, whereas RI
disregards the convergence of the search space.
RI and CDM encourage roughly equivalent diversity levels in all tests but pro-
duce significantly different final fitness results; one can therefore conclude that the
link between diversity and superior convergence is inconsistent. As RI draws new
individuals from the entire search space and CDM draws them from the converging
search space, one can conclude that specific diversity gives better results than
unspecific diversity. This occurs because, as convergence progresses, CDM is able to
provide crossover with a more representative sample of the converging space,
enabling better selection of the next generation of individuals.

4 Conclusions

In this paper a new mutation scheme, named Coherent Diversity Maintenance (CDM),
has been introduced. The method is designed to encourage diversity in the GA population
while maintaining the convergence trend; it is believed that it explores the
converging parameter space more fully, resulting in better convergence re-
sults. Numerical results have shown that CDM tends to generate better, or at least
no worse, results for multi-optimum or flat-optimum problems than the normal prob-
abilistic mutation method and the completely random mutation method.

References
1. Holland, J.H.: Adaptation in Natural and Artificial Systems: An Introductory Analysis with
Applications to Biology, Control and Artificial Intelligence. MIT Press, Cambridge (1992)
2. Chakraborty, G., Hoshi, K.: Rank Based Crossover – A new technique to improve the
speed and quality of convergence in GA. In: Proceedings of the 1999 Congress on Evolu-
tionary Computation, CEC 1999, vol. 2, p. 1602 (1999)
3. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley Publishing Company, Reading (1989)
4. Dumitrescu, D., Lazzerini, B., Jain, L.C., Dumitrescu, A.: Evolutionary Computation.
CRC Press, Boca Raton (2000)
5. Radcliffe, N.J.: Forma Analysis and Random Respectful Recombination. In: Proceedings of
the Fourth International Conference on Genetic Algorithms (1991)
6. Watson, R.A., Pollack, J.B.: Recombination Without Respect: Schema Combination and
Disruption in Genetic Algorithm Crossover. In: Proceedings of the 2000 Genetic and Evo-
lutionary Computation Conference (2000)
7. De Jong, K.A.: Analysis of the Behaviour of a class of Genetic Adaptive Systems. Techni-
cal Report, The University of Michigan (1975)
8. Hoffmann, J.P.: Simultaneous Inductive and Deductive Modeling of Ecological Systems
via Evolutionary Computation and Information Theory. Simulation 82(7), 429–450 (2006)
9. Fogarty, T.C.: Varying the Probability of mutation in the Genetic Algorithm. In: The Pro-
ceedings of the Third International Conference on Genetic Algorithms 1989, pp. 104–109
(1989)
10. Hesser, J., Manner, R.: Towards an optimal mutation probability for Genetic Algorithms.
In: Schwefel, H.-P., Männer, R. (eds.) PPSN 1990. LNCS, vol. 496, pp. 23–32. Springer,
Heidelberg (1991)
11. Back, T.: Optimal Mutation Rates in Genetic Search. In: Proceedings of the Fifth Interna-
tional Conference on Genetic Algorithms, June 1993, pp. 2–8 (1993)
12. Davis, L.: Adapting Operator Probabilities in Genetic Algorithms. In: Proceedings of the
Third International Conference on Genetic Algorithms, pp. 61–69 (1989)
13. Ursem, R.K.: Diversity-Guided Evolutionary Algorithms. In: Guervós, J.J.M., Adamidis,
P.A., Beyer, H.-G., Fernández-Villacañas, J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS,
vol. 2439, pp. 462–471. Springer, Heidelberg (2002)
14. Janikow, C.Z., Michalewicz, Z.: An experimental comparison of binary and floating point
representations in genetic algorithms. In: Belew, R.K., Booker, J.B. (eds.) Proceedings of
the Fourth International Conference Genetic Algorithms, pp. 31–36. Morgan Kaufmann,
San Mateo (1991)

15. Gasieniec, L., Jansson, J., Lingas, A.: Efficient Approximation Algorithms for the Ham-
ming Centre Problem. Journal of Discrete Algorithms 2(2), 289–301 (2004)
16. Kwok, N.M., Ha, Q., Liu, D.K., Fang, G., Tan, K.C.: Efficient Particle Swarm Optimiza-
tion: A Termination Condition Based on the Decision-making Approach. In: 2007 IEEE
Congress on Evolutionary Computation, Singapore (2007)
Dynamic Crossover and Mutation Genetic Algorithm
Based on Expansion Sampling

Min Dong and Yan Wu

Department of Computer Science and Technology, Tongji University,


1239 Si Ping Road, Shanghai, 200092, China
[email protected]

Abstract. The traditional genetic algorithm falls into local optima easily, and its
convergence rate is not satisfactory. This paper therefore proposes an improvement
that uses dynamic crossover and mutation rates in cooperation with expansion sampling to
solve these two problems. Expansion sampling means that when a new generation is
created, the new individuals must compete with the old generation, and the
better half are selected into the next generation. Several experiments were then
performed to compare the proposed method with other improvements; the results
show that the proposed method is better than the other improvements in both
precision and convergence rate.

Keywords: genetic algorithm, expansion sampling, dynamic crossover rate,
dynamic mutation rate.

1 Introduction

Practical applications of the genetic algorithm suffer from premature convergence,
low robustness in parameter selection, and slow convergence in the later stages of the
search. These deficiencies seriously limit the scope of application of genetic
algorithms. How to improve the performance of the genetic algorithm so that it can
be better applied to practical problems has become a major research subject. In
recent years many improved genetic algorithms have appeared, for example, the self-adaptive
simulated binary crossover genetic algorithm [1], the fuzzy adaptive genetic algorithm [2],
the multi-agent genetic algorithm [3], the fast and elitist multi-objective genetic algorithm [4],
the fully adaptive strategy genetic algorithm [5], the quantum clone genetic algorithm [6], the
improved mutation operator genetic algorithm [7], and so on. These new algorithms improve
the performance of the genetic algorithm, but are still far from achieving the de-
sired state. In order to maintain population diversity and prevent high-fitness
individuals from being selected so often that they occupy the entire population, this paper
presents a dynamic crossover probability and mutation probability. In order to im-
prove the algorithm's convergence rate, this paper pools the new individuals with the
previous generation and selects the best half into the next generation.
The effectiveness of the improved algorithm is then demonstrated
through several experiments.


2 Dynamic Crossover Probability

The traditional genetic algorithm uses a fixed crossover-rate operator. Because the
crossover rate is the same for all individuals, every individual of the current generation
is retained with the same probability during crossover; the currently better individuals
are then selected many times in the next round of selection while the rela-
tively poor ones are eliminated, so the population quickly evolves toward the
current optimal individual. If the current optimal individual is a local optimum, the
entire algorithm easily falls into that local optimum. To avoid this and increase the
diversity of the population, this paper presents a dynamic crossover rate, namely the
ratio between the fitness distance of two chromosomes and the distance between the
largest and smallest fitness values in the population:

Pc = |f(a) − f(b)| / (max(f) − min(f))   (1)
where f(a) is the fitness of chromosome a, f(b) the fitness of chromosome b, max(f) and
min(f) are the largest and smallest fitness values in the population, and Pc is the
crossover probability. Such a crossover rate means that individuals in the middle of the
population are retained with a greater chance, while individuals at both ends of the
population cross with greater probability, so that both the better and the poorer
individuals are changed, avoiding overly intense competition in the next round of
selection. The genes of poor individuals can thereby also contribute to the development
of the population.
The principle behind this improvement is a simple probability calculation over a
uniform distribution, in which the crossover probability of a depends on f(b), assumed
uniformly distributed. To illustrate, assume max(f) = 2, min(f) = 0, and f(b)
uniformly distributed on the interval [0, 2]. If an individual a1 lies in the middle of the
interval, i.e., f(a1) = 1, then its expected crossover rate with b is 1/4; if an individual a2
lies at either end of the interval, i.e., f(a2) = 0 or 2, then its expected crossover rate with
b is 1/2, larger than for individuals in the middle. The distribution is shown in Figure 1
below (x-axis: f(a); y-axis: crossover rate).
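The worked example can be checked numerically with a small sketch (ours; the Monte Carlo estimate is purely illustrative):
• A dynamic-crossover-rate check in Python
import random

def crossover_probability(f_a, f_b, f_max, f_min):
    """Equation (1): pairwise crossover rate from the fitness gap."""
    return abs(f_a - f_b) / (f_max - f_min)

random.seed(0)
samples = [random.uniform(0, 2) for _ in range(100000)]  # f(b) uniform on [0, 2]
for f_a in (1.0, 0.0, 2.0):
    mean_pc = sum(crossover_probability(f_a, fb, 2.0, 0.0)
                  for fb in samples) / len(samples)
    print(f_a, round(mean_pc, 2))  # ~0.25 in the middle, ~0.5 at the ends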
In addition, this algorithm's dynamic crossover probability essentially yields effective
crossovers. (We call crossover between two distant individuals effective, because
their relatively large difference means crossover changes them considerably; crossover
between very close chromosomes changes the individuals very little, leaving them
almost unchanged, and is therefore ineffective.) This improves the effect of crossover
and effectively avoids inbreeding and premature convergence. Therefore, although
the crossover probability in this paper is lower than in other genetic algorithms, the
overall effect is better, and the lower crossover rate reduces computational complexity
and accelerates evolution.

Fig. 1. Crossover rate curve

3 Dynamic Mutation Probability and Expansion Sampling


To prevent better individuals from rapidly occupying the entire population, this paper
proposes a dynamic mutation probability based on the ratio between a chromosome's
fitness and the largest fitness in the population. The mutation probability is defined
as:

Pm = k · (f(a) / max(f))²,  0 < k < 1   (2)

where f(a) is the fitness of an individual a, max(f) is the largest fitness, Pm is the
mutation probability, and k is a parameter valid in the interval (0, 1). This mutation rate
gives better individuals a higher mutation probability, so that they do not rapidly
occupy the entire population, leading the population to evolve toward diversification.
The distribution of the mutation rate is shown in Figure 2
below (x-axis: f(a); y-axis: mutation rate):
At the same time, this paper uses expansive optimal sampling, namely pooling the
new individuals with the previous generation and choosing the best half indi-
viduals into the next generation. The traditional genetic algorithm puts the new individu-
als directly into the next generation, so older good individuals
which have been crossed or mutated have no chance of entering future generations, which
slows down the convergence of the population. Expansive optimal sam-
pling lets those better individuals enter the next generation, thus greatly accelerating
convergence.
The traditional genetic algorithm uses a fixed mutation rate. As a re-
sult, it is hard for the algorithm to jump out of a local optimum, because of the relatively small
mutation probability once the algorithm is trapped. Even when an individual does
jump out, it may be useless because it is not selected in the next round of selection.
To avoid this, this paper combines the dynamic mutation rate with

Fig. 2. Mutation rate curve

the expansive optimal sampling. The dynamic mutation rate makes the algorithm more
likely to jump out of a local optimum once trapped, and the expansive optimal
sampling ensures that these mutated, excellent individuals can be selected into future
generations, thereby greatly reducing the probability of getting stuck in a local
optimum. Given enough evolution, in theory, the genetic algorithm will not remain
trapped in a local optimum.

4 Algorithm Processes
The procedure of the proposed method is as follows (a consolidated sketch of the loop appears after the list):
1) Initialize the control parameters: set the population size N and the evolution generation counter g;
2) g = 0; randomly generate a population of N individuals P0 = (x1, x2, ...);
3) Determine the fitness function and calculate the fitness f(xi) of each individual (i = 1, 2, ..., N) to evaluate the individuals in the population;
4) Determine the selection strategy; this paper mainly uses roulette-wheel selection to choose advantageous individuals from the generation-g population Pg to form the mating pool;
5) Determine the dynamic crossover operator: compute the crossover rate of each pair of individuals through Pc = |f(a) − f(b)| / (max(f) − min(f)), and then perform the cross-operation to produce the intermediate population;
6) Determine the dynamic mutation operator: use Pm = k * (f(a) / max(f))² to determine the mutation rate, and then perform the mutation operation;


7) Expansive optimal sampling: pool the individuals produced by selection, crossover and mutation with the individuals Pg from before those operations, and select the best half into the next generation;
8) Stop the evolution if the results meet the termination criteria; otherwise jump to step 3).
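Putting the pieces together, the following sketch shows one plausible reading of steps 1)-8); the operator details (encoding, crossover, mutate) are assumptions supplied by us, not the paper's exact implementation:

import random

def esdga(fitness, random_individual, crossover, mutate, n=100, k=0.6, max_gen=500):
    # Steps 1)-2): initialize parameters and a random population.
    pop = [random_individual() for _ in range(n)]
    for _ in range(max_gen):
        fits = [fitness(x) for x in pop]
        f_max, f_min = max(fits), min(fits)
        # Step 4): roulette-wheel selection into the mating pool
        # (assumes non-negative fitness, as in both experiments below).
        pool = random.choices(pop, weights=fits, k=n)
        # Step 5): dynamic crossover, Pc = |f(a)-f(b)| / (max(f)-min(f)).
        children = []
        for a, b in zip(pool[::2], pool[1::2]):
            pc = abs(fitness(a) - fitness(b)) / (f_max - f_min + 1e-12)
            if random.random() < pc:
                a, b = crossover(a, b)
            children += [a, b]
        # Step 6): dynamic mutation, Pm = k * (f(a)/max(f))^2.
        children = [mutate(x) if random.random() < k * (fitness(x) / f_max) ** 2
                    else x for x in children]
        # Step 7): expansion sampling, best half of parents + children.
        pop = sorted(pop + children, key=fitness, reverse=True)[:n]
    # Step 8) is simplified here to a fixed generation budget.
    return max(pop, key=fitness)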

5 Simulation Experiments

5.1 Experiment 1

In this paper, the first experiment uses a typical complex optimization function:

y = x * sin(10πx) + 2,   x ∈ [−1, 2]   (3)

The fitness function is:

Fitness = x * sin(10πx) + 2   (4)
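For concreteness, the fitness of equation (4) can be written directly (a trivial helper of ours, not the paper's code):

import math

def fitness_exp1(x):
    # Eq. (4): x*sin(10*pi*x) + 2 on x in [-1, 2]; the global
    # maximum is about 3.8503, near x = 1.85.
    return x * math.sin(10 * math.pi * x) + 2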

We compare the improved mutation operator genetic algorithm (IMGA) of paper [7] and the fuzzy adaptive genetic algorithm (FAGA) of paper [2] with this paper's dynamic crossover and mutation genetic algorithm based on expansion sampling (ESDGA), simulating in Matlab 7. We vary the genetic algorithm parameters in the experiment in order to better observe the performance of the algorithms. In the experiment, N is the population size and Pc is the crossover probability. In IMGA, the gene discerption proportion is L = 0.6 and the other parameters are a1 = 0.01, b1 = 0.01, a2 = 0.01, b2 = 0.05, α = 0.03, g0 = 40. The evolution generation is 500, and each parameter setting is repeated 50 times. Min and Max are the minimum and maximum optimal solutions over the 50 runs, Avg is the average, Gen is the number of generations to convergence, and local is the number of runs that converged to a local optimum. The experiment's results are shown in Table 1, Table 2 and Table 3 (Pc is not used in FAGA and ESDGA; k is used only in ESDGA):

Table 1. 50 independent results for IMGA in Experiment 1

         N=100     N=120    N=80     N=50
         Pc=0.75   Pc=0.9   Pc=0.8   Pc=0.5
Min      3.4126    3.7332   3.7871   3.2674
Max      3.8502    3.8500   3.8503   3.8503
Avg      3.8185    3.8425   3.8037   3.8470
Gen      145       135      118      142
Local    5         7        6        5

Table 2. 50 independent results for FAGA in Experiment 1

         N=100    N=120    N=80     N=50
Min      3.8126   3.8332   3.8071   3.7674
Max      3.8503   3.8503   3.8502   3.8500
Avg      3.8385   3.8425   3.8337   3.8070
Gen      22       23       18       14
Local    3        6        4        6

Table 3. 50 independent results for ESDGA in Experiment 1

         N=100    N=120    N=80     N=50
         k=0.6    k=0.8    k=0.4    k=0.3
Min      3.8503   3.8503   3.8503   3.8503
Max      3.8503   3.8503   3.8503   3.8503
Avg      3.8503   3.8503   3.8503   3.8503
Gen      9        10       12       12
Local    0        0        0        0

From the tables it is clear that IMGA needs about 130 iterations to converge, with about 12% of runs ending in a local optimum. FAGA needs about 20 iterations on average to converge, with 10% local optima. ESDGA needs only about 12 iterations to converge, and there is no local optimum among the 200 results in total, better than the other two algorithms. We also note that for this paper's algorithm we ran many experiments to select the k value; the table lists only representative values. As the results show, the algorithm performs better when k is above 0.3.
Speed and accuracy are in tension in genetic algorithms. If the algorithm requires high speed and evolves only a few generations, the accuracy of the results will be unsatisfactory, and the run will most likely fail to converge or will converge to a local optimum. If the algorithm evolves indefinitely, it must eventually converge to the global optimum, provided one exists. Here we use the latter setting to compare how different k values affect the proposed algorithm in Experiment 1; that is, the algorithm evolves until the population converges to the global optimum. We set the population size N = 100 and choose different k in the valid interval (0, 1). The result is shown in Figure 3 below (x-axis: k values; y-axis: convergence generations):

Fig. 3. The influence of different k in experiment 1



From the figure we can clearly see that the bigger k is, the better the performance of the algorithm; 0.3 is a dividing line, and there is little difference above 0.6. This result agrees with the theory discussed earlier.

5.2 Experiment 2

Then we select a typical needle-in-haystack problem [5] (NiH for short) for the experiment:

max f(x, y) = (a / (b + x² + y²))² + (x² + y²)²,   x, y ∈ [−5.12, 5.12]   (5)

In the function, a = 3.0 and b = 0.05; the maximum value of the function is 3600, reached at x = 0, y = 0. There are four local maximum points, (−5.12, 5.12), (−5.12, −5.12), (5.12, 5.12) and (5.12, −5.12), where the function value is 2748.78.
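The function of equation (5) is easy to state in code (our helper; the constants are the paper's):

def nih(x, y, a=3.0, b=0.05):
    # Needle-in-haystack: a sharp peak (a/b)^2 = 3600 at the origin,
    # plus four local maxima of 2748.78 at the corners (+/-5.12, +/-5.12).
    r2 = x * x + y * y
    return (a / (b + r2)) ** 2 + r2 ** 2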
This is a typical GA-deceptive problem. Here we again compare the improved mutation operator genetic algorithm (IMGA) of paper [7] and the fuzzy adaptive genetic algorithm (FAGA) of paper [2] with this paper's dynamic crossover and mutation genetic algorithm based on expansion sampling (ESDGA) in Matlab 7. The evolution generation is again 500, and each parameter setting is repeated 50 times. Min and Max are the minimum and maximum optimal solutions over the 50 runs, Avg is the average, Gen is the number of generations to convergence, and local is the number of runs that converged to a local optimum. The experiment results are shown in Table 4, Table 5 and Table 6 (Pc is not used in FAGA and ESDGA; k is used only in ESDGA):

Table 4. 50 independent results for IMGA in Experiment 2

         N=100     N=120    N=80     N=50
         Pc=0.75   Pc=0.9   Pc=0.8   Pc=0.5
Min      3600      3600     2748.8   2748.8
Max      3600      3600     3600     3600
Avg      3583      3600     3583     3511
Gen      278       293      264      260
Local    1         0        1        2

Table 5. 50 independent results for FAGA in Experiment 2

         N=100    N=120    N=80     N=50
Min      3600     3600     3600     2748.8
Max      3600     3600     3600     3600
Avg      3600     3600     3600     3583
Gen      46       57       48       40
Local    0        0        0        1

Table 6. 50 independent results for ESDGA in Experiment 2

         N=100    N=120    N=80     N=50
         k=0.6    k=0.8    k=0.4    k=0.3
Min      3600     3600     3600     3600
Max      3600     3600     3600     3600
Avg      3600     3600     3600     3600
Gen      21       21       22       23
Local    0        0        0        0

As can be seen from the tables, IMGA and FAGA each produce a few local optima; FAGA needs about 50 iterations to converge and IMGA about 270. ESDGA, however, never fell into a local optimum, and its convergence is also very fast, about 20 generations on average, better than both IMGA and FAGA.
Then, as in Experiment 1, we compare how different k values affect the proposed algorithm in Experiment 2. We again set the population size N = 100 and choose different k in the valid interval (0, 1). The result is shown in Figure 4 below (x-axis: k values; y-axis: convergence generations):

Fig. 4. The influence of different k in experiment 2

The conclusion is similar to that of Experiment 1, though with a few differences: the bigger k, the better the performance of the algorithm.

6 Conclusion
The simple genetic algorithm suffers from slow convergence and easily gets trapped in local optima, which limits its application. To address these two major defects, this paper proposed an improved method. As the experimental results show, this paper's dynamic crossover and mutation genetic algorithm based on expansion sampling improves the performance of the genetic algorithm: it overcomes inbreeding and premature convergence, accelerates convergence, effectively prevents the algorithm from falling into local optima, and gives the algorithm strong optimization capacity.

References
1. Kalyanmoy, D., Karthik, S., Tatsuya, O.: Self-adaptive simulated binary crossover for real-
parameter optimization. In: Genetic and Evolutionary Computation Conference, pp. 1187–
1194 (2007)
2. Huang, Y.P., Chang, Y.T., Sandnes, F.E.: Using Fuzzy Adaptive Genetic Algorithm for
Function Optimization. In: Annual meeting of the North American Fuzzy Information Proc-
essing Society, June 3-6, pp. 484–489. IEEE, Los Alamitos (2006)
3. Zhong, W.C., Liu, J., Xue, M.Z., Jiao, L.C.: A Multi-agent Genetic Algorithm for Global
Numerical Optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part
B 34(2), 1128–1141 (2004)
4. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multi-objective genetic
algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197
(2002)
5. Xiang, Z.Y., Liu, Z.C.: Genetic algorithm based on fully adaptive strategy. Journal of Cen-
tral South Forestry University 27(5), 136–139 (2007)
6. Li, Y.Y., Jiao, L.C.: Quantum clone genetic algorithm. Computer Science 34(11), 147–149
(2007)
7. Li, L.M., Wen, G.R., Wang, S.C., Liu, H.M.: Independent component analysis algorithm
based on improved genetic algorithm. Journal of System Simulation 20(21), 5911–5916
(2008)
Multidisciplinary Optimization of Airborne Radome
Using Genetic Algorithm

Xinggang Tang, Weihong Zhang, and Jihong Zhu

Laboratory of Engineering Simulation and Aerospace Computing,


The Key Laboratory of Contemporary Design and Integrated Manufacturing Technology,
Ministry of Education, Northwestern Polytechnical University, Xi’an 710072, China
[email protected], {zhangwh,JH.Zhu_FEA}@nwpu.edu.cn

Abstract. A multidisciplinary optimization scheme of airborne radome is pro-


posed. The optimization procedure takes into account the structural and the
electromagnetic responses simultaneously. The structural analysis is performed
with the finite element method using Patran/Nastran, while the electromagnetic
analysis is carried out using the Plane Wave Spectrum and Surface Integration
technique. The genetic algorithm is employed for the multidisciplinary optimi-
zation process. The thicknesses of the multilayer radome wall are optimized to maximize the overall transmission coefficient of the antenna-radome system
under the constraint of the structural failure criteria. The proposed scheme and
the optimization approach are successfully assessed with an illustrative numeri-
cal example.

Keywords: multidisciplinary optimization, radome, genetic algorithm,


structural failure, transmission coefficient.

1 Introduction

Airborne radomes are often used to protect antennas from a variety of environmental and aerodynamic effects. The design of a high-performance airborne radome is a challenging task, as aerodynamic, electromagnetic, structural and other requirements are generally involved, and the performance of the radome in the different disciplines is usually in conflict. A thin, lightweight radome with excellent electromagnetic transmission is apt to be structurally unreliable, while a radome structure designed for high stiffness and stability will almost certainly have poor electromagnetic performance. Radome design is hence a multidisciplinary procedure, because the analyses of the different disciplines are tightly coupled. The traditional engineering design approach, however, separates the design procedure into sequential stages, and a design failure at a certain stage forces the whole design to start from scratch, resulting in a tremendous cost in design cycle and resources.
The great improvement of the optimization theory and algorithms in the past several
decades has provided an efficient solution for radome design. The optimization procedure


can simultaneously take the radome characteristics in different disciplines into account and search for an optimal design in which all design requirements are well balanced.
In earlier research, techniques for simulating the electromagnetic characteristics of radomes were well developed. They can be classified into two kinds: (1) high-frequency methods, such as the Ray Tracing (RT) technique [1] based on Geometric Optics (GO), Aperture Integration-Surface Integration (AI-SI) [2] and Plane Wave Spectrum-Surface Integration (PWS-SI) [3] based on Physical Optics (PO); and (2) low-frequency methods, such as the Finite Element Method (FEM) [4] and the Method of Moments (MoM) [5]. Generally, the high-frequency methods are more suitable for electrically large problems because of their high computational efficiency, while the low-frequency methods provide more accurate analysis at much higher computational cost. Since optimization methods rely on iteration with a large computing cost, the high-frequency methods are superior to the low-frequency methods for the electromagnetic analysis in radome optimization.
Even though research on radome optimization was pioneered early using a simulated annealing technique [6], only recently have more advanced optimization algorithms been applied and promoted the development of radome design. For example, particle swarm optimization and the Genetic Algorithm have been applied to radome optimization: the layered thicknesses of the sandwich radome wall and the shape of the radome are optimized to maximize the overall transmission coefficient over the entire bandwidth [7] or to minimize the boresight error [8]. A multi-objective optimization procedure was further proposed to optimize boresight error and power transmittance simultaneously using the genetic algorithm and the RT method [9]. However, these studies of radome optimization are limited to the electromagnetic characteristics. Recently, the Multidisciplinary Radome Optimization System (MROS) [10] was proposed as a synthesis procedure in which material selection, structural analysis, probabilistic fracture analysis and electromagnetic analysis are incorporated to perform the multidisciplinary optimization. This work indicates a new trend in radome optimization that will be more applicable to practical engineering design.
In this paper, the structural and electromagnetic characteristics are considered si-
multaneously to perform the radome design. A multidisciplinary optimization proce-
dure is developed based on the finite element model of the radome. The structural
analysis and electromagnetic analysis are carried out to obtain the structural failure
indexes and the overall transmission coefficient, respectively. The genetic algorithm
is employed for the optimization.

2 Multidisciplinary Simulation of Radome


2.1 Simulation Strategy

Normally, the Finite Element Method (FEM) and the Physical Optics (PO) method are the preferred approaches for the structural analysis and the electromagnetic analysis of a radome, respectively. However, in the traditional design scheme illustrated in Fig. 1, these analyses, as well as the modeling and postprocessing procedures, are implemented separately. There is no collaboration or communication between the different analysis procedures, which no longer matches the requirement of rapid design cycles for modern products.

Fig. 1. Traditional radome analysis scheme: for the structural analysis, a finite element model feeds the finite element analysis and its result postprocessing; in parallel, for the electromagnetic analysis, an electromagnetic model feeds the physical optics analysis and its own result postprocessing

A significant improvement for radome design is to integrate the analyses of interest in a multidisciplinary manner. A potential scheme for such a multidisciplinary analysis is illustrated in Fig. 2. The most important step is to identify which data are to be shared by the different sub-analyses. All the geometry and topology information of the radome defined in the finite element model can be shared with the PO analysis. Furthermore, the finite element mesh can also be used as the PO analysis mesh, on condition that the element size is adapted to the wavelength according to the Nyquist sampling theorem. Therefore, in the proposed analysis scheme shown in Fig. 2, the finite element model serves as the common data for both the structural and the electromagnetic analysis.

Fig. 2. Multidisciplinary radome analysis scheme: a single finite element model drives both the finite element analysis and, through the shared electromagnetic model, the physical optics analysis, followed by a common analysis result postprocessing

Because of the requirements of low dielectric constant and high mechanical strength, the airborne radome is usually manufactured as a sandwich laminate. The choice of materials and wall thicknesses has a significant influence on the structural strength and the electrical performance. The idea of this paper is to find a configuration of these parameters that satisfies the design requirements; the validity of a design can be verified with the structural analysis and the electromagnetic analysis of the radome, respectively.

2.2 Structural Analysis of Radome

The basic issue in radome structural analysis is the material definition. This is difficult because multilayer thin-walled configurations, including the A sandwich, B sandwich and C sandwich illustrated in Fig. 3, are often used. Taking the A sandwich as an example, it consists of three layers: two dense, high-strength skins separated by a lower-density, lower-dielectric core material such as foam or honeycomb. This configuration provides much higher strength for a given weight than a monolithic wall of homogeneous dielectric material. The construction of the other two wall configurations follows accordingly. In our demonstrative application we use the A sandwich, the most frequently used wall configuration.

a) A sandwich b) B sandwich c) C sandwich

Fig. 3. Multilayer radome wall configuration

Another important aspect of the radome structural analysis is the structural loads and constraints. For a common airborne radome (i.e., not a high-speed radome), the aerodynamic load due to airflow is the main cause of mechanical stress. The radome is tightly attached to the airframe with special bolts.
The radome structural analysis can be performed in Patran/Nastran. The A sandwich configuration is modeled in the Patran Laminate Modeler by treating the sandwich layer as an orthotropic layer with equivalent material properties. The structural validity of the radome can be assessed by the failure analysis of composite materials; frequently used failure criteria include Hill, Hoffman, Tsai-Wu, Maximum Stress and Maximum Strain.

2.3 Electromagnetic Analysis of Radome

Based on the equivalence principle, the PWS-SI method can be divided into three steps: 1) a transmitting antenna is considered and the fields transmitted to the radome interior surface S1 are computed by the Plane Wave Spectrum method; 2) transmission through the radome wall is then calculated by the transmission-line analogy to obtain the fields over the radome outer surface S2; 3) the radiation to the far field is then determined by surface integration of the equivalent currents obtained from the tangential field components over the outer surface S2.
The antenna-radome system model is illustrated in Fig. 4. According to the planar slab approximation technique, the radome wall is treated as locally planar and modeled as an infinite planar slab with complex transmission coefficients, lying in the tangent plane at the incidence points of the radiation rays.

Fig. 4. The antenna-radome system model: inner surface S1 and outer surface S2, with incident fields Ei, Hi, reflected fields Er, Hr, transmitted fields Et, Ht, surface normal n, and equivalent currents J and M on S2

According to the PWS technique, the transmitted electric field at a near-field point P can be calculated by

Et(P) = (1/2π) ∬ A(kx, ky) e^(−j k0·r) dkx dky   (1)

where (x, y, z) are the coordinates of P in the antenna coordinate system, k0 = i kx + j ky + k kz is the wave vector, and r = i x + j y + k z is the position vector of P. A(kx, ky) is the so-called plane wave spectrum, the Fourier transform of the antenna aperture field, which can be formulated as

A(kx, ky) = (1/2π) ∬S E(x, y, 0) e^(j(kx x + ky y)) dx dy   (2)

If the antenna aperture is circularly symmetric, the surface integral in (2) reduces to a line integral, which greatly reduces the computational complexity.
The transmission coefficients for perpendicular polarization T⊥(θP) and parallel polarization T∥(θP) of the multilayered radome wall can be determined by the transmission-line analogy [11]. The fields over the radome outer surface can then be formulated as

EtM = (EiM · nBM) T⊥(θM) nBM + (EiM · tBM) T∥(θM) tBM
HtM = (HiM · nBM) T∥(θM) nBM + (HiM · tBM) T⊥(θM) tBM   (3)

where nBM = PM × nRM, tBM = nBM × nRM, and PM = Re(E × H*) / |Re(E × H*)| is the unit Poynting vector at the incidence point M.

Thus, the electric far field can be calculated with a surface integration technique:

E = −(jk/4π) (e^(−jkR)/R) ∬S2 [ η0 R1 × (R1 × J) + R1 × M ] e^(jk r′·R1) dS2   (4)

where R1 is the unit vector of the observation direction, r′ is the vector of the incidence point on the outer surface S2, n is the normal vector of S2 at the incidence point, and J = n × Ht and M = Et × n are the equivalent surface currents. With the far-field distribution of the antenna-radome system, the electromagnetic performance measures of the radome, such as the transmission ratio and the boresight error, can be evaluated.

3 Optimization Using Genetic Algorithm


GA simulates the natural biological evolution strategy based on the Darwinian survival-of-the-fittest principle. The theoretical foundations of GA were first laid by Holland in 1975 [12]. Since then, GA and its many variations have been applied successfully to a wide range of problems [13]. GAs provide an alternative to traditional gradient-based optimization techniques for mathematically intractable optimization problems such as combinatorial optimization, discrete-variable optimization, and so on. Generally, a GA starts by generating random solutions for a problem. Each randomly generated solution is then evaluated by its fitness, and the good solutions are selected and recombined to form the next generation of solutions. This process of fitness-based selection and recombination to generate variability is repeated until the best solution is found or a stopping criterion is met.
The radome optimization considered in this paper is a multidisciplinary optimization that accounts for the performance of the radome in both structural mechanics and electromagnetics. The complex coupling between the two disciplines makes optimization modeling and sensitivity analysis intractable; however, no such barrier exists for applying GA to the radome optimization.
Practically, the shape of an airborne radome conforms to the overall aerodynamic design of the aircraft and should not be changed in the subsequent design. In our application, we consider a multilayer sandwich as the radome wall, and the radome optimization takes the thickness of each layer as a design variable. The failure indexes of each layer and the overall transmission coefficient of the antenna-radome system are calculated by the structural analysis and the electromagnetic analysis, respectively. The optimization objectives are to maximize the overall transmission coefficient and to minimize the failure indexes. Because handling multi-objective optimization directly is intractable here, we treat one objective as a constraint. Thus, the multidisciplinary radome optimization can be formulated as
Maximize    Tran(t1, t2, ..., tn)
Subject to  max(fi,j) ≤ fmax          (5)
            ti,min ≤ ti ≤ ti,max

where ti denotes the thickness of the ith layer, Tran denotes the overall transmission coefficient, and fi,j is the failure index of the ith layer of the jth element.
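The paper keeps the failure-index bound as an explicit constraint; one common way to realize such a constraint inside a GA, shown here purely as an illustrative sketch (the paper does not state its constraint-handling mechanism), is a penalty on infeasible designs:

def penalized_objective(t, trans, worst_failure_index, f_max=0.6, penalty=1e3):
    # t: vector of layer thicknesses; trans(t): overall transmission
    # coefficient; worst_failure_index(t): max over layers and elements.
    score = trans(t)
    violation = worst_failure_index(t) - f_max
    if violation > 0:
        score -= penalty * violation  # push the GA back toward feasibility
    return score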

4 Numerical Results

In this section, the proposed optimization approach is demonstrated with a paraboloidal radome. For the structural analysis, the finite element model is illustrated in Fig. 5. The A sandwich radome wall is modeled as a 7-layer laminate: each skin is constructed with 3 layers, while the core is a single layer. We suppose the bottom of the radome is fixed tightly to the airframe and the whole radome endures a frontal pressure caused by the relative movement of air or wind. The structural failure of the radome material is verified with the Hill failure criterion for composites.
For the electromagnetic analysis, we suppose a circular aperture antenna placed at the bottom of the radome, operating at 2 GHz; the finite element model contains 2721 quadrilateral elements with a maximum element area of 1.04×10⁻³ m². The finite element mesh therefore satisfies the electromagnetic analysis requirement, according to the Nyquist sampling theorem, of at least 4 elements per square wavelength.
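This sampling rule is easy to check numerically (an illustrative helper of ours, using the paper's figures):

def mesh_ok(max_element_area, freq_hz, c=3.0e8, elems_per_sq_wavelength=4):
    # At 2 GHz the wavelength is 0.15 m, so the rule requires element
    # areas below 0.15**2 / 4 = 5.6e-3 m^2; the model's 1.04e-3 m^2 passes.
    wavelength = c / freq_hz
    return max_element_area <= wavelength ** 2 / elems_per_sq_wavelength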

Fig. 5. Finite element model of a paraboloidal radome


Fig. 6. The iteration history of the objective function (y-axis: transmission coefficient; x-axis: iterations)



Table 1. Optimal layered thickness of the radome wall (mm)

Design Variables t1 t2 t3 t4 t5 t6 t7
Lower limit 0.1 0.1 0.1 1 0.1 0.1 0.1
Upper limit 0.5 0.5 0.5 10 0.5 0.5 0.5
Initial value 0.2 0.2 0.2 8.8 0.2 0.2 0.2
Optimal value 0.1 0.1 0.1 9.96 0.1 0.1 0.1

Fig. 7. Radiation patterns of the electrical far field (E field in dB versus radiation angle in degrees, with and without the radome)

Table 2. Structural performance of the initial and final design

                 Maximal ply stress   Maximal displacement   Maximal failure index
Initial design   24.2                 0.337                  0.452
Final design     48.3                 0.671                  0.403

We consider the radome optimization problem (5), maximizing the transmission coefficient while constraining the failure indexes to be less than 0.6. The GA is run for 50 generations with 70 individuals per generation. The iteration history of the objective function is illustrated in Fig. 6.
The optimal design of the radome is achieved at generation 48, with the optimal layer thicknesses listed in Table 1. From the radiation patterns of the electrical far field in Fig. 7 and the comparison of structural performance in Table 2, we can see that the final design achieves better electromagnetic performance at the price of some degradation of the structural performance.

5 Conclusion
This paper proposes a multidisciplinary optimization scheme for airborne radome design. The design procedure considers the structural and the electromagnetic performance of the radome simultaneously. A Genetic Algorithm is employed to maximize the overall transmission coefficient under constraints on the structural failure of the radome material. The optimization scheme is successfully validated by the design optimization of a paraboloidal radome.
This work indicates a new trend in radome optimization that will be more efficient and applicable to radome design engineering. Even though the results of the demonstration are preliminary, the proposed optimization scheme and solution approach can easily be extended to more complicated applications.

References
1. Tricoles, G.: Radiation Patterns and Boresight Error of a Microwave Antenna Enclosed in
an Axially Symmetric Dielectric Shell. J. Opt. Soc. Am. 54, 1094–1101 (1964)
2. Paris, D.T.: Computer-Aided Radome Analysis. IEEE. Trans. on AP 18, 7–15 (1970)
3. Wu, D.C.F., Rudduck, R.C.: Wave Spectrum Surface Integration Technique for Radome
Analysis. IEEE Trans. on AP 22, 497–500 (1974)
4. Povinelli, M.J., Angelo, J.D.: Finite Element Analysis of Large Wavelength Antenna Ra-
dome Problems for Leading Edge and Radar Phased Arrays. IEEE Trans. on Magnetics 27,
4299–4302 (1991)
5. Chang, D.C.: A Comparison of Computed and Measured Transmission Data for the
AGM-88 Harm Radome. Master’s thesis, Naval Postgraduate School, Monterey, California,
AD-A274868 (1993)
6. Chan, K.K., Chang, P.R., Hsu, F.: Radome design by simulated annealing technique. In:
IEEE International Symposium of Antennas and Propagation Society, pp. 1401–1404.
IEEE Press, New York (1992)
7. Chiba, H., Inasawa, Y., Miyashita, H., Konishi, Y.: Optimal radome design with particle
swarm optimization. In: IEEE International Symposium of Antennas and Propagation So-
ciety, pp. 1–4. IEEE Press, New York (2008)
8. Meng, H., Dou, W., Yin, K.: Optimization of Radome Boresight Error Using Genetic Al-
gorithm. In: 2008 China-Japan Joint Microwave Conference, pp. 27–30. IEEE Press, New
York (2008)
9. Meng, H., Dou, W.: Multi-objective optimization of radome performance with the struc-
ture of local uniform thickness. IEICE Electronics Express 5, 882–887 (2008)
10. Baker, M.L., Roughen, K.M.: Structural Optimization with Probabilistic Fracture Con-
straints in the Multidisciplinary Radome Optimization System (MROS). In: 48th
AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Confer-
ence, Honolulu, Hawaii (2007); AIAA-2007-2311
11. Ishimaru, A.: Electromagnetic wave propagation, radiation, and scattering. Prentice Hall,
Englewood Cliffs (1991)
12. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press,
Michigan (1975)
13. Riolo, R., Soule, T., Worzel, B.: Genetic programming theory and practice IV. Springer,
New York (2007)
Global Similarity and Local Variance in Human Gene
Coexpression Networks

Ivan Krivosheev1, Lei Du2, Hongzhi Wang1, Shaojun Zhang1,2,


Yadong Wang1,∗, and Xia Li1,2,∗
1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
2 College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China

Abstract. For the study presented here, we performed a comparative analysis of


whole-genome gene expression variation in 210 unrelated HapMap individuals
to assess the extent of expression divergence between 4 human populations and
to explore the connection between the variation of gene expression and func-
tion. We used the GEO series GSE6536 to compare changes in expression of
47,294 human transcripts between four human populations. Gene expression
patterns were resolved into gene coexpression networks and the topological
properties of these networks were compared. The interrogation of coexpression
networks allows for the use of a well-developed set of analytical and conceptual
tools and provides an opportunity for the simultaneous comparison of variation
at different levels of systemic organization, i.e., global vs. local network proper-
ties. The results of this comparison indicate that human co-expression networks
are indistinguishable in terms of their global properties but show divergence at
the local level.

Keywords: gene coexpression network interrogation.

1 Introduction
In the last few years, gene coexpression networks have attracted the attention of many researchers [1,2]. Following previous studies, these networks are considered as a graph in which each node represents a gene and edges represent statistically strong relationships between genes. An interaction between two genes in a gene network does not necessarily imply a physical interaction [4]. For the study presented here, we performed a comparative analysis of whole-genome gene expression variation in 210 unrelated HapMap individuals [7] to assess the extent of expression divergence between 4 human populations and to explore the connection between the variation of gene expression and function. The gene coexpression networks were constructed using GeneChip expression profile data from NCBI GEO (Gene Expression Omnibus, a repository database of gene expression data) [3].


∗ Corresponding authors.


2 Materials and Methods

2.1 Data Sets

Expression profiles of human gene pairs were compared in order to evaluate the divergence of human gene expression patterns. A total of 47,294 human transcripts were considered for every population. All-against-all gene expression profile comparisons of the population matrices (47294×60 CEU, 47294×45 CHB, 47294×45 JPT, and 47294×60 YRI) were used to generate population-specific coexpression networks. In these networks, nodes correspond to genes, and an edge links two genes from the same population if their expression profiles are considered sufficiently similar.

2.2 Network Construction

There are a number of existing reverse-engineering methods for constructing coexpression networks, such as Relevance Networks, Bayesian networks, etc. The results reported here are for networks constructed using the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) [8], an information-theoretic algorithm for the reverse-engineering of transcriptional networks from microarray data. ARACNE infers interactions based on mutual information between genes, an information-theoretic measure of pairwise correlation. ARACNE compares favorably with existing methods and scales successfully to large network sizes.
ARACNE relies on a two-step process. First, candidate interactions are identified by estimating pairwise gene-gene mutual information (MI) and filtering them with an appropriate threshold, I0, computed for a specific p-value, p0, under the null hypothesis of two independent genes. This step is almost equivalent to the Relevance Networks method. In its second step, ARACNE removes the vast majority of indirect candidate interactions using a well-known property of mutual information, the data processing inequality (DPI). The DPI states that if genes g1 and g3 interact only through a third gene g2 (i.e., the interaction path is g1<->...<->g2<->...<->g3 and no alternative path exists between g1 and g3), then

I(g1, g3) ≤ min[I(g1, g2); I(g2, g3)] .

For the presented study we set the mutual information threshold at 0.001 and the DPI tolerance at 0.1. If the P value from this test is smaller than a predefined threshold, the edge is preserved in the consensus network. When those threshold and DPI values are used, coexpression networks tend to congeal into graphs so densely connected as to preclude meaningful analysis of their topological properties.
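As a sketch of the DPI pruning step (our illustrative code, not the ARACNE implementation; `mi` maps frozenset gene pairs to mutual information, and the tolerance semantics are an assumption):

from itertools import combinations

def dpi_filter(mi, genes, tolerance=0.1):
    # For every fully connected gene triplet, mark the weakest edge as
    # indirect and drop it if it falls below the other two by more than
    # the DPI tolerance.
    edges = set(mi)
    for g1, g2, g3 in combinations(genes, 3):
        trio = [frozenset((g1, g2)), frozenset((g2, g3)), frozenset((g1, g3))]
        if all(e in mi for e in trio):
            weakest = min(trio, key=lambda e: mi[e])
            second = min(mi[e] for e in trio if e != weakest)
            if mi[weakest] < second * (1 - tolerance):
                edges.discard(weakest)
    return edges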

3 Results and Discussion

3.1 Human Population Networks

Human population coexpression networks were evaluated with respect to a number of parameters describing their global topological properties and were found to be highly similar (Table 1). For each network, a variety of parameters was computed with Network Analyzer [9], a plugin for Cytoscape (an open-source bioinformatics software platform). The numbers of nodes and edges in each network are comparable, with the CEU population network showing higher values for both. The average degree (<k>) is the average number of edges per node and gives a rough approximation of how dense a network is. The CEU population network shows a slightly higher <k>, consistent with its greater numbers of nodes and edges. However, <k> is again similar for all networks and rather high; by way of comparison, typical world-wide web networks have <k>≈7. The values of <k> might not be particularly relevant because, as shown below, the degree distributions are highly skewed.
A more refined notion of network density is given by the average clustering coefficient (<C>). The clustering coefficient Ci of a node i is defined as the fraction of pairs of neighbors of node i that are linked to each other: Ci = 2ni / (ki(ki−1)), where ni is the number of observed links connecting the ki neighbors of node i and ki(ki−1)/2 is the total number of possible links among them. The average clustering coefficient <C> is the mean of this value over all nodes with at least two neighbors, and for all human population networks <C> ≈ 0.3 (Table 1). For networks of this size, these <C> values are considered quite high. The high density of the coexpression networks is not necessarily surprising because, as one could reasonably expect, coexpression is largely (but not entirely) transitive: if gene A is coexpressed with genes B and C, then genes B and C are likely to be coexpressed as well.
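The definition translates directly into code (an illustrative helper; `adj` maps each node to its set of neighbors):

def clustering_coefficient(adj, i):
    # Ci = 2*ni / (ki*(ki-1)): fraction of the ki neighbors' possible
    # pairs that are actually linked; defined for nodes with ki >= 2.
    nbrs = list(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    n_links = sum(1 for a in range(k) for b in range(a + 1, k)
                  if nbrs[b] in adj[nbrs[a]])
    return 2 * n_links / (k * (k - 1))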

Table 1. Network parameters

Parameter                    CEU      CHB      JPT      YRI
Clustering coefficient       0.350    0.272    0.304    0.320
Network diameter             16       19       24       26
Network centralization       0.047    0.032    0.066    0.042
Average degree               4.576    10.721   16.46    13.781
Number of nodes              5546     3180     3572     3061
Number of edges              72073    17047    29398    21092
Network density              0.005    0.003    0.005    0.005
Network heterogeneity        1.539    1.601    1.778    1.622
Characteristic path length   25.991   6.297    6.733    7.246

3.2 Intersection Network

As described above, the human population gene coexpression networks are closely similar in terms of their global topological characteristics: they share similar node degree (k) distributions and C(k) distributions, as well as similar average node degrees (<k>), clustering coefficients (<C>) and path lengths (<l>). Other neighborhood-related parameters, such as network density, network centralization and network heterogeneity, are also closely similar.
We further sought to evaluate the similarity between the population-specific coexpression networks at the local level. There is as yet no general method for assessing local network similarity (or graph isomorphism). However, for the human population gene coexpression networks generated here, the use of orthologous gene pairs yields a one-to-one mapping between the nodes of the networks. In this sense, the networks can be considered to be defined over the same set of nodes N, and thus can be directly compared by generating an intersection network. The human population intersection network is defined as the network over the node set N in which two nodes i and j are linked if the corresponding pairs of orthologous genes are connected in every human population network. The intersection network thus captures the coexpressed gene pairs conserved between the 4 human populations.
The global characteristics of the intersection network are shown in Table 3. The intersection network's node degree and C(k) distributions are clearly similar to those of the population-specific networks, as are the average clustering coefficient (<C> = 0.213) and average path length (<l> = 3.04); the network diameter equals 10. The network diameter and the average shortest path length, also known as the characteristic path length, indicate small-world properties of the analyzed network. Taken together, these findings indicate that the global structure of the population-specific coexpression networks is preserved in the intersection network. However, the most striking feature of the intersection network is the small fraction of genes (~20%) and edges (~4–16%) conserved between the population networks (Table 2). Accordingly, the average node degree is lower in the intersection network (<k> = 7.518) than in each of the population-specific networks.

Table 2. Number of nodes and edges in intersection network

                       Nodes        Edges
Intersection network   713          2680
CEU                    5546 (13%)   72073 (4%)
CHB                    3180 (22%)   17047 (16%)
JPT                    3572 (20%)   29398 (9%)
YRI                    3061 (23%)   21092 (13%)

Table 3. Intersection network parameters

Parameter                    Intersection network   Random network
Clustering coefficient       0.213                  0.001
Network diameter             10                     10
Network centralization       0.061                  -
Average degree               7.518                  0.0
Number of nodes              713                    713
Number of edges              2680                   2680
Network density              0.011                  -
Network heterogeneity        1.592                  -
Characteristic path length   3.040                  0.005

The random network was generated according to the Erdos-Renyi model; it has the same numbers of nodes and edges as the intersection network.
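Such a comparison network is straightforward to generate (our illustrative G(n, m) sampler, not the study's code):

import random

def erdos_renyi_gnm(n, m):
    # Sample a random graph with exactly n nodes and m edges, matching
    # the intersection network (n = 713, m = 2680 above).
    edges = set()
    while len(edges) < m:
        u, v = random.sample(range(n), 2)
        edges.add(frozenset((u, v)))
    return edges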

3.3 Functional Coherence of Gene Coexpression Networks

Genes in the networks were functionally categorized using their Gene Ontology (GO) biological process annotation terms. Overrepresented GO terms were identified with BINGO [14] by comparing the relative frequencies of GO terms in specific clusters with the frequencies of randomly selected GO terms. The hypergeometric test was used for this, with the Benjamini-Hochberg false discovery rate correction for multiple tests and a P-value threshold of 0.001. Pairwise similarities between gene GO terms were measured using the semantic similarity method, which computes the relative distance between any two terms along the GO graph. The result is shown in Table 4.

Fig. 1. Overrepresented GO terms
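The over-representation test just described is the standard hypergeometric tail probability; in code (an illustrative computation with made-up counts, using SciPy):

from scipy.stats import hypergeom

# Of N annotated genes, K carry a given GO term; a cluster of n genes
# contains k of them. P(X >= k) is the enrichment p-value, before the
# Benjamini-Hochberg correction applied in the study.
N, K, n, k = 10000, 150, 80, 9   # hypothetical counts
p_value = hypergeom.sf(k - 1, N, K, n)
print(p_value)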
The graph (Figure 1) visualizes the GO categories found to be significantly over-represented, in the context of the GO hierarchy. The size (area) of a node is proportional to the number of genes in the test set annotated to it. The color of a node represents the (corrected) p-value: white nodes are not significantly over-represented; the others are, with a color scale ranging from yellow (p-value equal to the significance level, e.g. 0.001) to dark orange (p-value 5 orders of magnitude smaller than the significance level, e.g. 10⁻⁵ × 0.001). The color saturates at dark orange for p-values more than 5 orders of magnitude below the chosen significance level.

Table 4. GO process annotation

GO-ID   Description                                         p-value
43283   biopolymer metabolic process                        3.4840E-11
42254   ribosome biogenesis                                 8.2487E-11
43170   macromolecule metabolic process                     7.3685E-9
6259    DNA metabolic process                               1.1960E-8
8152    metabolic process                                   1.3627E-8
44237   cellular metabolic process                          2.3559E-8
22613   ribonucleoprotein complex biogenesis and assembly   4.3938E-8
43284   biopolymer biosynthetic process                     4.6317E-8
16072   rRNA metabolic process                              6.9420E-8
44238   primary metabolic process                           7.6631E-8
9058    biosynthetic process                                1.9158E-7
6394    RNA processing                                      1.9444E-7
6365    rRNA processing                                     4.0186E-7
16070   RNA metabolic process                               6.0819E-7
9059    macromolecule biosynthetic process                  6.2404E-7
10467   gene expression                                     9.9307E-7
6260    DNA replication                                     1.1083E-6
34470   ncRNA processing                                    3.6057E-6
6974    response to DNA damage stimulus                     6.6203E-6
7049    cell cycle                                          1.0995E-5
6996    organelle organization and biogenesis               1.4177E-5

In fact, the figure shows that the category 'biopolymer metabolic process' is the important one, and that the over-representation of the 'macromolecule metabolic process' and 'metabolic process' categories is merely a result of the presence of those 'protein modification' genes. The fact that both categories are colored equally dark is due to the saturation of the node color at very low p-values.

4 Conclusion
The global topological properties of the human population gene coexpression networks studied here are very similar, but the specific architectures underlying these properties are drastically different. The actual pairs of orthologous genes found to be coexpressed in the different populations are highly divergent, although we did detect a substantial conserved component of the coexpression network. One of the most prevalent functional classes showing clear function-expression coherence is the genes involved in biopolymer metabolism; an example of such a cluster is shown in Figure 2.
The biological relevance of the global network topological properties appears questionable [10]. Of course, this does not prevent network analysis from being a powerful approach, possibly the most appropriate one for the quantitative study of complex systems made up of numerous interacting parts.

Fig. 2. Cluster of tightly coexpressed and functionally coherent genes

Acknowledgements
This work was supported in part by the National Natural Science Foundation of China
(Grant Nos. 30871394, 30370798 and 30571034), the National High Tech Develop-
ment Project of China, the 863 Program (Grant Nos. 2007AA02Z329), the National
Basic Research Program of China, the 973 Program (Grant Nos. 2008CB517302) and
the National Science Foundation of Heilongjiang Province (Grant Nos. ZJG0501,
1055HG009, GB03C602-4, BMFH060044, and D200650).

References
1. Horvath, S., Dong, J.: Geometric interpretation of gene coexpression network analysis.
PLoS Comput. Biol. 4(8), e1000117 (2008)
2. Carter, S.L., et al.: Gene co-expression network topology provides a framework for mo-
lecular characterization of cellular state. Bioinformatics 20(14), 2242–2250 (2004)
3. Bansal, M., et al.: How to infer gene networks from expression profiles. Mol. Syst. Biol. 3,
78 (2007)
4. Potapov, A.P., et al.: Topology of mammalian transcription networks. Genome
Inform. 16(2), 270–278 (2005)
5. Yu, H., et al.: The importance of bottlenecks in protein networks: correlation with gene
essentiality and expression dynamics. PLoS Comput. Biol. 3(4), e59 (2007)
6. Stranger, B.E., et al.: Relative impact of nucleotide and copy number variation on gene
expression phenotypes. Science 315(5813), 848–853 (2007)
7. The International HapMap Project. Nature 426(6968), 789–796 (2003)
8. Margolin, A.A., et al.: ARACNE: an algorithm for the reconstruction of gene regulatory
networks in a mammalian cellular context. BMC Bioinformatics 7(suppl. 1), S7 (2006)
9. Vlasblom, J., et al.: GenePro: a Cytoscape plug-in for advanced visualization and analysis
of interaction networks. Bioinformatics 22(17), 2178–2179 (2006)
10. Tsaparas, P., et al.: Global similarity and local divergence in human and mouse gene
co-expression networks. BMC Evol. Biol. 6, 70 (2006)
11. Khaitovich, P., et al.: A neutral model of transcriptome evolution. PLoS Biol. 2(5), E132
(2004)
12. Yanai, I., Graur, D., Ophir, R.: Incongruent expression profiles between human and mouse
orthologous genes suggest widespread neutral evolution of transcription control.
OMICS 8(1), 15–24 (2004)
13. Jordan, I.K., Marino-Ramirez, L., Koonin, E.V.: Evolutionary significance of gene expres-
sion divergence. Gene 345(1), 119–126 (2005)
14. Babu, M.M.: Introduction to microarray data analysis. In: Grant, R.P. (ed.) Computational
Genomics: Theory and Application. Horizon Press, Norwich (2004)
A Grid Based Cooperative Co-evolutionary
Multi-Objective Algorithm

Sepehr Meshkinfam Fard1, Ali Hamzeh2, and Koorush Ziarati2


1 Department of Mathematics, College of Science, Shiraz University, Shiraz, Iran
2 CSE and IT Department, School of ECE, Shiraz University, Mollasadra Ave, Shiraz, Iran
[email protected], [email protected],
[email protected]

Abstract. In this paper, a well-performing but computationally complex approach in the context of Multi-Objective Evolutionary Algorithms (MOEA) is investigated. This approach, called NSCCGA, is based on the previously introduced NSGA-II. NSCCGA performs better than NSGA-II, but at a heavy computational cost. Here, a novel approach called GBCCGA is introduced, based on MOCCGA with some modifications. The main difference between GBCCGA and MOCCGA is in their niching technique, which instead of