Vijay Sonawane

Savitribai Phule Pune University, Information Technology, Faculty Member

Papers by Vijay Sonawane

A Survey on Mining Cryptocurrencies

Recent Trends in Intensive Computing

Digital currencies have gained huge popularity nowadays. Bitcoin is a decentralized, distributed, peer-to-peer virtual currency known as a cryptocurrency. Bitcoin mining works on the principle of the blockchain, which is believed to be one of this century's smartest innovations. The blockchain is a sequence of blocks linked so that the current block contains the hash of the previous block. Any change to the data in any block results in an error across the entire blockchain. Bitcoins are produced by a process called mining, in which miners solve a complex mathematical puzzle. The miners compete to mine the Bitcoin as quickly as possible and claim the reward. Mining can be done by a single individual or by a pool, where many miners join to mine a single block in a network.
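
The hash linking and the mining puzzle described in this abstract can be illustrated with a toy sketch. It is not taken from the paper and is not Bitcoin's actual block format: the field names, the JSON encoding, and the "leading zero hex digits" difficulty rule are simplifying assumptions chosen for illustration.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(index, data, prev_hash, nonce):
    # Hash a deterministic encoding of the block fields (illustrative format).
    payload = json.dumps([index, data, prev_hash, nonce]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mine_block(index, data, prev_hash, difficulty=2):
    # Toy puzzle: try nonces until the hash starts with `difficulty` zeros.
    nonce = 0
    while True:
        digest = block_hash(index, data, prev_hash, nonce)
        if digest.startswith("0" * difficulty):
            return {"index": index, "data": data, "prev_hash": prev_hash,
                    "nonce": nonce, "hash": digest}
        nonce += 1


def build_chain(records, difficulty=2):
    # Each new block stores the hash of the block before it.
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for index, data in enumerate(records):
        block = mine_block(index, data, prev_hash, difficulty)
        chain.append(block)
        prev_hash = block["hash"]
    return chain


def is_valid(chain, difficulty=2):
    # Recompute every hash and check every link back to the previous block.
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        recomputed = block_hash(block["index"], block["data"],
                                block["prev_hash"], block["nonce"])
        if block["prev_hash"] != prev_hash or block["hash"] != recomputed:
            return False
        if not recomputed.startswith("0" * difficulty):
            return False
        prev_hash = block["hash"]
    return True
```

Changing the `data` of any block after the chain is built makes `is_valid` return `False`, because the stored hash no longer matches the recomputed one.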

Spectrum trading and sharing in unmanned aerial vehicles based on distributed blockchain consortium system

Computers and Electrical Engineering

Extracting Interesting Knowledge from Versions of Dynamic XML Documents

International Journal of Research in Engineering and Technology, 2013

XML has become very popular for representing semi-structured data and is now a standard for data exchange over the web. The data exchanged as XML is growing continuously, so the need not only to store these large volumes of XML data for future use, but also to mine them to discover interesting information, has become obvious. The extracted knowledge can be used to make predictions. Recently, a large amount of work has been done in XML data mining. Most of the existing work focuses on static XML data mining, while XML data is dynamic in real applications, so research has focused more on versions of XML documents and on extracting information from them. The approach proposed in this paper mines association rules from changes between versions of dynamic XML documents by using the information present in the consolidated delta, which can be used for making future predictions.

Visual Monitoring System Using Simple Network Management Protocol

2015 International Conference on Computational Intelligence and Communication Networks (CICN), 2015

Monitoring of network, storage, resources, and application services is becoming increasingly important for the provision of QoS (quality of service) in IT. Managing the server resources of large data centers and the applications installed on them requires considerable human effort and time, because the status of every server has to be checked by logging in to it. In this paper we introduce a new network and application services monitoring tool that manages all servers in a data center (DC). Its monitoring features include managing different server platforms (Windows and Linux); a hardware inventory showing how many disks and how much memory and CPU are currently in use; monitoring of the applications on each server and their related services; and highlighting of the servers with the highest disk, memory, and CPU usage. Alert configurations are provided visually so that the administrator can troubleshoot issues and find their cause quickly. The benefit of the system is that the activity of N servers can be managed at one time and in very little time, so the organization's QoS improves and downtime is reduced. The failure of any resource is detected in very little time, troubleshooting becomes easy, and a single person can manage all N servers.

Internet of Thing Based Home Appliances Control

2015 International Conference on Computational Intelligence and Communication Networks (CICN), 2015

Nowadays people use smartphones and want to get things done efficiently. The rapid increase in the number of Internet users over the past decade has made the Internet part and parcel of life, and the Internet of Things (IoT) is the latest emerging information technology. The IoT is a growing network of everyday objects, from industrial machines to consumer goods, that can share information and complete tasks while you are busy with other activities. This paper considers how the IoT can be used to handle home appliances smartly. Smart home applications provide comfort, convenience, and user-friendly handling of many home appliances. The paper includes various models for web connectivity and the basics of the energy conservation system. The home automation system differs from other systems by allowing the user to operate it from anywhere in the world through the IoT.

Searching Web Page Using Entropy Estimation

Data Mining and Knowledge Engineering, 2009

Explosive growth of the web has made information search and extraction harder. Users need to automatically search product-based web pages to locate product descriptions within huge amounts of data. In this paper, we propose a simple technique to locate products in a retrieved web page of an e-commerce web site. For this we take advantage of the hierarchical structure of HTML. First, the technique discovers the set of product descriptions based on a measure of entropy at each node in the HTML tag tree of the retrieved web page. Afterward, a set of association rules based on heuristic features is employed for more accuracy in the product extraction.
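
The abstract does not define the entropy measure, so the following is only one plausible reading, offered as a sketch: build a tag tree with the standard library `html.parser` and compute the Shannon entropy of the child tag distribution at each node. The class and function names, and the choice of child tags as the random variable, are assumptions and not the authors' method.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (small illustrative subset).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TagTreeBuilder(HTMLParser):
    """Builds a simple tag tree; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse_tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def iter_nodes(node):
    # Depth-first traversal over the whole tree.
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def child_tag_entropy(node):
    # Shannon entropy (in bits) of the distribution of child tag names.
    counts = Counter(child.tag for child in node.children)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

Under this reading, a node whose children all share one tag (for example a list of repeated product rows) has entropy 0, while a node with a mix of different child tags scores higher.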

An Optimistic Approach for Clustering Multi-version XML Documents Using Compressed Delta

International Journal of Electrical and Computer Engineering (IJECE), 2015

Today, with the standardization of XML for information exchange over the web, a huge amount of information is formatted as XML documents. XML documents are large: the amount of information that has to be transmitted, processed, stored, and queried is often larger than with other data formats. In real-world applications XML documents are also dynamic in nature. The versatile applicability of XML documents in different fields of information maintenance and management is increasing the demand to store different versions of XML documents over time. However, storing all versions of an XML document may introduce redundancy. The self-describing nature of XML creates the problem of verbosity, and as a result documents are large. This paper proposes an optimistic approach to re-cluster multi-version XML documents that change over time by reassessing the distance between them, using knowledge from the initial clustering solution and the changes stored in a compressed delta. Evolving size of XML docume...

Web Site Mining Using Entropy Estimation

2010 International Conference on Data Storage and Data Engineering, 2010

With the explosive growth of the Web, there is an ever-increasing volume of data and information published in numerous Web pages. Web mining aims to develop new techniques to effectively extract and mine useful knowledge or information from these Web pages, and allows users to easily locate a desired object within huge amounts of data. In this paper, we propose a simple web site mining technique that mines product information from the pages of an e-commerce web site. For this we take advantage of the hierarchical structure of HTML. First, the technique discovers the set of product descriptions based on a measure of entropy at each node in the HTML tag tree of the retrieved web page. Afterward, a set of association rules based on heuristic features is employed for more accuracy in the product extraction.

FASST Mining: Discovering Frequently Changing Semantic Structure from Versions of Unordered XML Documents

Database Systems for Advanced Applications, 2005

In this paper, we present a FASST mining approach to extract the frequently changing semantic structures (FASSTs), which are a subset of semantic substructures that change frequently, from versions of unordered XML documents. We propose a data structure, H-DOM+, and a FASST mining algorithm, which incorporates the semantic issue and takes advantage of the related domain knowledge. The distinct feature of this approach is that the FASST mining process is guided by a user-defined concept hierarchy. Rather than mining all the frequently changing structures, only those frequently changing structures that are semantically meaningful are extracted. Our experimental results show that the H-DOM+ structure is compact and the FASST algorithm is efficient with good scalability. We also design a declarative FASST query language, FASSTQUEL, to make the FASST mining process interactive and flexible.

Mining association rules from XML data using XQuery

Microcomputer Applications, 2004

In recent years XML has become very popular for representing semistructured data and a standard for data exchange over the web. Mining XML data from the web is becoming increasingly important. Several encouraging attempts at developing methods for mining XML data have been proposed. However, efficiency and simplicity are still a barrier to further development. Normally, pre-processing or post-processing is required for mining XML data, such as transforming the data from XML format to relational format. In this paper, we show that extracting association rules from XML documents without any pre-processing or post-processing using XQuery is possible and analyze the XQuery implementation of the well-known Apriori algorithm. In addition, we suggest features that need to be added to XQuery in order to make the implementation of the Apriori algorithm more efficient.
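
For readers unfamiliar with Apriori, here is a compact sketch of the algorithm the abstract refers to. It is written in Python over plain sets of items, not in XQuery over XML, so it illustrates the general technique only and is not the paper's implementation; the function names and the absolute support count threshold are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    `min_support` is an absolute count of transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(itemset, size):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```

The lookup `frequent[lhs]` relies on the downward closure property: every subset of a frequent itemset is itself frequent, so its count has already been recorded.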

Mining association rules from structural deltas of historical xml documents

… in Knowledge Discovery and Data Mining, 2004

Previous work on XML association rule mining focuses on mining from the data existing in XML documents at a certain point in time. However, due to the dynamic nature of online information, an XML document typically evolves over time. Knowledge obtained from mining the evolution of an XML document would be useful in a wide range of applications, such as XML indexing and XML clustering. In this paper, we propose to mine a novel type of association rule from a sequence of changes to XML structure, which we call XML Structural Delta Association Rule (XSD-AR). We formulate the problem of XSD-AR mining by considering both the frequency and the degree of changes to XML structure. An algorithm derived from FP-growth, together with its optimizing strategy, is developed for the problem. Preliminary experimental results show that our algorithm is efficient and scalable at discovering a complete set of XSD-ARs.

A methodology for clustering XML documents by structure

Information Systems, 2006

The processing and management of XML data are popular research issues. However, operations based on the structure of XML data have not received strong attention. These operations involve, among others, the grouping of structurally similar XML documents. Such grouping results from the application of clustering methods with distances that estimate the similarity between tree structures. This paper presents a framework for clustering XML documents by structure. Modeling the XML documents as rooted ordered labeled trees, we study the usage of structural distance metrics in hierarchical clustering algorithms to detect groups of structurally similar XML documents. We suggest the usage of structural summaries for trees to improve the performance of the distance calculation and at the same time to maintain or even improve its quality. Our approach is tested using a prototype testbed.

Clustering homogeneous XML documents using weighted similarities on XML attributes

XML (eXtensible Markup Language) has been adopted by a number of software vendors; it has become the standard for data interchange over the web and is platform and application independent. An XML document consists of a number of attributes such as document data, structure, and style sheet. Clustering is a method of creating groups of similar objects. In this paper a weighted similarity measurement approach for detecting the similarity between homogeneous XML documents is suggested, and a new clustering technique using this similarity measurement is proposed. Methods for calculating the similarity of documents' structure and styling have been given by a number of researchers, and most are based on tree edit distances. For calculating the distance between documents' contents there are a number of text and other similarity techniques such as cosine, Jaccard, and tf-idf. In this paper both kinds of similarity technique are combined to propose a new distance measurement technique for calculating the distance between a pair of homogeneous XML documents. The proposed clustering model is implemented in Java, an open source technology, and is validated experimentally. Given a collection of XML documents, the distances between documents are calculated and stored in Java collections, and these distances are then used to cluster the XML documents.
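
The idea of combining a structural similarity with a content similarity under a weight can be sketched as follows. This is not the paper's method: the abstract mentions tree edit distances for structure and a Java implementation, whereas this Python sketch substitutes a simpler Jaccard similarity over root-to-node tag paths, uses cosine similarity over raw term counts for content, and assumes an arbitrary default weight of 0.5.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(elem, prefix=""):
    # Set of root-to-node tag paths, e.g. "/book/title".
    path = f"{prefix}/{elem.tag}"
    paths = {path}
    for child in elem:
        paths |= tag_paths(child, path)
    return paths


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def term_counts(elem):
    # Term frequencies over all text in the document.
    text = " ".join(elem.itertext()).lower()
    return Counter(re.findall(r"[a-z0-9]+", text))


def cosine(c1, c2):
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0


def weighted_distance(xml_a, xml_b, w_structure=0.5):
    # Distance = 1 - weighted mix of structural and content similarity.
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    structural = jaccard(tag_paths(a), tag_paths(b))
    content = cosine(term_counts(a), term_counts(b))
    similarity = w_structure * structural + (1 - w_structure) * content
    return 1.0 - similarity
```

Pairwise distances computed this way could be stored in a matrix and passed to any distance-based clustering routine; the weight controls how much structure counts relative to content.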

Knowledge Discovery from XML Documents, First International Workshop, KDXD 2006, Singapore, April 9, 2006, Proceedings

... and Querying Methods Information Retrieval from Distributed Semistructured Documents Using Me...

Fast and effective clustering of XML data using structural information

Knowledge and Information Systems, 2008

This paper presents the incremental clustering algorithm XCLS that groups the XML documents according to structural similarity. A Level structure format is introduced to represent the structure of XML documents for efficient processing. A global criterion function that measures the similarity between the new document and existing clusters is developed. It avoids the need to compute the pair-wise similarity between two individual documents and hence saves a huge amount of computing effort. XCLS is further modified to incorporate the semantic meanings of XML tags for investigating the trade-offs between accuracy and efficiency. The empirical analysis shows that the structural similarity overplays the semantic similarity in the clustering process of the structured data such as XML. The experimental analysis shows that the XCLS method is fast and accurate in clustering the heterogeneous documents by structures.

Mining changes from versions of dynamic XML documents

Knowledge Discovery from XML Documents, 2006

The ability to store information contained in XML documents for future reference has become a very important issue, as the number of applications that use and exchange data in XML format is growing continuously. Moreover, the contents of XML documents are dynamic and change over time, so researchers are looking for efficient solutions to store the documents' versions and eventually extract interesting information from them. This paper proposes a novel approach for mining association rules from changes between versions of dynamic XML documents, in a simple manner, by using the information contained in the consolidated delta. We argue that by applying our proposed algorithm, important information about the behaviour of the changed XML document over time can be extracted and then used to make predictions about its future performance.

XCLS: a fast and effective clustering algorithm for heterogenous XML documents

Advances in Knowledge Discovery and Data Mining, 2006

This paper presents the incremental clustering algorithm, XML documents Clustering with Level Similarity (XCLS), that groups the XML documents according to structural similarity. A level structure format is introduced to represent the structure of XML documents for efficient processing. A global criterion function that measures the similarity between the new document and existing clusters is developed. It avoids the need to compute the pair-wise similarity between two individual documents and hence saves a huge amount of computing effort. XCLS is further modified to incorporate the semantic meanings of XML tags for investigating the trade-offs between accuracy and efficiency. The empirical analysis shows that the structural similarity overplays the semantic similarity in the clustering process of the structured data such as XML. The experimental analysis shows that the XCLS method is fast and accurate in clustering the heterogeneous documents by structures.

An XML-enabled association rule framework

Database and Expert Systems …, 2003

With the sheer amount of data stored, presented and exchanged using XML nowadays, the ability to extract knowledge from XML data sources becomes increasingly important and desirable. This paper aims to integrate the newly emerging XML technology with data mining technology, using association rule mining as a case in point. Compared with traditional association mining in the well-structured world (e.g., relational databases), mining from XML data is faced with more challenges due to the inherent flexibilities of XML in both structure and semantics. The primary challenges include 1) a more complicated hierarchical data structure; 2) an ordered data context; and 3) a much bigger data size. To tackle these challenges, in this paper, we propose an extended XML-enabled association rule framework, which is flexible and powerful enough to represent both simple and complex structured association relationships inherent in XML data.

A Survey on Mining Cryptocurrencies

Recent Trends in Intensive Computing

Advanced monetary standards have acquired huge ubiquity nowadays. Bitcoin is the decentralized, d... more Advanced monetary standards have acquired huge ubiquity nowadays. Bitcoin is the decentralized, disseminated, distributed virtual cash known cryptographic money. Bitcoin mining chips away at standard of the blockchain, which is believed to be one of this present century’s sharp advancement. The blockchain is the arrangement of blocks that are associated so that in the current block there is the hash of the past block. Any adjustment of information in any block in a blockchain brings about a blunder in the entire blockchain. A strategy called mining, where excavators settle a complex numerical riddle, produces Bitcoins. The excavators contend as quickly as time permits to mine the Bitcoin and guarantee the award. Mining should be possible by a solitary individual or by a pool, where a lot of excavators join to mine a solitary block in an organization.

Spectrum trading and sharing in unmanned aerial vehicles based on distributed blockchain consortium system

Computers and Electrical Engineering

Extracting Interesting Knowledge from Versions of Dynamic XML Documents

International Journal of Research in Engineering and Technology, 2013

XML has became very popular for representing semi structured data and a standard for data exchang... more XML has became very popular for representing semi structured data and a standard for data exchange over the web these days. The data exchanged as XML is growing continuously, so the necessity to not only store these large volumes of XML data for future use, but to mine them to discover interesting information has became obvious. The extracted knowledge can be used to make predictions. Recently, a large amount of work has been done in XML data mining. Most of the existing work focuses on the static XML data mining, while XML data is dynamic in real applications. So research has been focused more on XML documents versions and extracting information from versions of XML documents. Approach proposed in this paper is for mining association rules from changes of versions of dynamic XML documents by using information present in the consolidated delta which can be used for making future predictions.

Visual Monitoring System Using Simple Network Management Protocol

2015 International Conference on Computational Intelligence and Communication Networks (CICN), 2015

Network, Storage, Resources, Application services monitoring is becoming increasingly important f... more Network, Storage, Resources, Application services monitoring is becoming increasingly important for provision of QoS (Quality of services) in IT. So to manage the huge data centers servers resources and installed application on it required more human resources and time to check every server status by login to it. So in this paper we are introducing the new network and application services monitoring tool. Using it you can manage your all servers in DC. In monitoring we are providing features like: Manage different servers platform (Windows and Linux), Hardware Inventory like how many Disks, Memory, CPU are in used currently. As well as we will monitor how many applications on server and relative services of it. Highlight max disk, memory, CPU usage servers. We will provide alerts configurations visually so Administrator troubleshoots issues and cause of issue quickly. A benefit of system is in very less time or at a time you can manage N number of servers activity. So Organizations Qos (Quality of services) will be increase and also the downtime. For Failure of any resource will be detected in very less time. Troubleshooting also be an easy, Single Human resource can manage complete N number of server.

Internet of Thing Based Home Appliances Control

2015 International Conference on Computational Intelligence and Communication Networks (CICN), 2015

Now a days people are using a smart phones and wish to done the thing very efficiently that rapid... more Now a days people are using a smart phones and wish to done the thing very efficiently that rapid increase in number of users internet using the fast decade had made Internet a part and parcel of life and IOT is latest and emerging information technology. Internet of thing is growing network of everyday object from industrial machine to consumer goods that can share the information and complete task while you are busy with other activities. How internet of thing can be use for handling home appliances smartly. Smart home apllications provides the comfort, convenience and user friendly handling the many home applinces. The papar includes various model for web connectivity and also basics of information about the energy conservation system. The home automation system differ from other system by allowing the user to operate the system from anywhere around the world through the IOT.

Searching Web Page Using Entropy Estimation

Data Mining and Knowledge Engineering, 2009

Explosive growth of the web has made information search and extraction harder to the web. User ne... more Explosive growth of the web has made information search and extraction harder to the web. User needs to automatically search product based web pages to locate the product description from huge data. In this paper, we propose simple technique to locate products in the retrieved web page of the e-commercial web site. For this we are taking the benefits of hierarchical structure of HTML language. First it discovers the set of product descriptions based on the measure of entropy at each node in the HTML tag tree of the retrieved web page. Afterward, a set of association rules based on heuristic features is employed for more accuracy in the product extraction.

An Optimistic Approach for Clustering Multi-version XML Documents Using Compressed Delta

International Journal of Electrical and Computer Engineering (IJECE), 2015

Today with Standardization of XML as an information exchange over web, huge amount of information... more Today with Standardization of XML as an information exchange over web, huge amount of information is formatted in the XML document. XML documents are huge in size. The amount of information that has to be transmitted, processed, stored, and queried is often larger than that of other data formats. Also in real world applications XML documents are dynamic in nature. The versatile applicability of XML documents in different fields of information maintenance and management is increasing the demand to store different versions of XML documents with time. However, storage of all versions of an XML document may introduce the redundancy. Self describing nature of XML creates the problem of verbosity,in result documents are in huge size. This paper proposes optimistic approach to Re-cluster multi-version XML documents which change in time by reassessing distance between them by using knowledge from initial clustering solution and changes stored in compressed delta. Evolving size of XML docume...

Web Site Mining Using Entropy Estimation

2010 International Conference on Data Storage and Data Engineering, 2010

With the unstable growth of the Web, there is an ever-Increasing volume of data and information p... more With the unstable growth of the Web, there is an ever-Increasing volume of data and information published in numerous Web pages. Web mining aims to develop new techniques to effectively extract and mine useful knowledge or information from these Web pages. And allows user to easily locate desired object from huge data. In this paper, we propose simple web site mining technique by mining product information from the pages of the e-commercial web site. For this we are taking the benefits of hierarchical structure of HTML language. First it discovers the set of product descriptions based on the measure of entropy at each node in the HTML tag tree of the retrieved web page. Afterward, a set of association rules based on heuristic features is employed for more accuracy in the product extraction.

FASST Mining: Discovering Frequently Changing Semantic Structure from Versions of Unordered XML Documents

Database Systems for Advanced Applications, 2005

In this paper, we present a FASST mining approach to extract the frequently changing semantic str... more In this paper, we present a FASST mining approach to extract the frequently changing semantic structures (FASSTs), which are a subset of semantic substructures that change frequently, from versions of unordered XML documents. We propose a data structure, H-DOM + , and a FASST mining algorithm, which incorporates the semantic issue and takes the advantage of the related domain knowledge. The distinct feature of this approach is that the FASST mining process is guided by the user-defined concept hierarchy. Rather than mining all the frequent changing structures, only these frequent changing structures that are semantically meaningful are extracted. Our experimental results show that the H-DOM + structure is compact and the FASST algorithm is efficient with good scalability. We also design a declarative FASST query language, FASSTQUEL, to make the FASST mining process interactive and flexible.

Mining association rules from XML data using XQuery

Microcomputer Applications, 2004

In recent years XML has became very popular for representing semistructured data and a standard f... more In recent years XML has became very popular for representing semistructured data and a standard for data exchange over the web. Mining XML data from the web is becoming increasingly important. Several encouraging attempts at developing methods for mining XML data have been proposed. However, efficiency and simplicity are still a barrier for further development. Normally, pre-processing or post-processing are required for mining XML data, such as transforming the data from XML format to relational format. In this paper, we show that extracting association rules from XML documents without any preprocessing or post-processing using XQuery is possible and analyze the XQuery implementation of the well-known Apriori algorithm. In addition, we suggest features that need to be added into XQuery in order to make the implementation of the Apriori algorithm more efficient.

Mining association rules from structural deltas of historical xml documents

… in Knowledge Discovery and Data Mining, 2004

Previous work on XML association rule mining focuses on mining from the data existing in XML docu... more Previous work on XML association rule mining focuses on mining from the data existing in XML documents at a certain time point. However, due to the dynamic nature of online information, an XML document typically evolves over time. Knowledge obtained from mining the evolvement of an XML document would be useful in a wide range of applications, such as XML indexing, XML clustering. In this paper, we propose to mine a novel type of association rules from a sequence of changes to XML structure, which we call XML Structural Delta Association Rule (XSD-AR). We formulate the problem of XSD-AR mining by considering both the frequency and the degree of changes to XML structure. An algorithm, which is derived from the FP-growth, and its optimizing strategy are developed for the problem. Preliminary experiment results show that our algorithm is efficient and scalable at discovering a complete set of XSD-ARs.

A methodology for clustering XML documents by structure

Information Systems, 2006

The processing and management of XML data are popular research issues. However, operations based ... more The processing and management of XML data are popular research issues. However, operations based on the structure of XML data have not received strong attention. These operations involve, among others, the grouping of structurally similar XML documents. Such grouping results from the application of clustering methods with distances that estimate the similarity between tree structures. This paper presents a framework for clustering XML documents by structure. Modeling the XML documents as rooted ordered labeled trees, we study the usage of structural distance metrics in hierarchical clustering algorithms to detect groups of structurally similar XML documents. We suggest the usage of structural summaries for trees to improve the performance of the distance calculation and at the same time to maintain or even improve its quality. Our approach is tested using a prototype testbed.

Clustering homogeneous XML documents using weighted similarities on XML attributes

XML (eXtensible Markup Language) has been adopted by a number of software vendors; it has become the standard for data interchange over the web and is platform and application independent. An XML document consists of a number of attributes, such as document data, structure, and style sheet. Clustering is a method of creating groups of similar objects. In this paper, a weighted similarity measurement approach for detecting the similarity between homogeneous XML documents is suggested. Using this similarity measurement, a new clustering technique is also proposed. Methods for calculating the similarity of documents' structure and styling have been given by a number of researchers, most of which are based on tree edit distances. For calculating the distance between documents' contents, there are a number of text and other similarity techniques, such as cosine, Jaccard, and tf-idf. In this paper, both kinds of similarity technique are combined to propose a new distance measurement technique for calculating the distance between a pair of homogeneous XML documents. The proposed clustering model is implemented in Java, an open source technology, and is validated experimentally. Given a collection of XML documents, the distances between documents are calculated and stored in Java collections, and these distances are then used to cluster the XML documents.
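
As a rough illustration of the idea in this abstract, the sketch below combines a structural similarity and a content similarity into one weighted distance. It is not the paper's method: the abstract mentions tree edit distance for structure and a Java implementation, whereas this Python sketch swaps in Jaccard overlap of root-to-node tag paths for structure and cosine similarity of word counts for content. The function names and the `w_structure` weight are assumptions made for this example.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(root):
    """Collect root-to-node tag paths as a crude structural summary."""
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def term_counts(root):
    """Count lower-cased whitespace-separated words in all text nodes."""
    return Counter(" ".join(root.itertext()).lower().split())


def cosine(c1, c2):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def weighted_distance(xml_a, xml_b, w_structure=0.5):
    """Distance = 1 - (weighted mix of structural and content similarity).

    w_structure is a hypothetical weight in [0, 1]; the remainder goes to content.
    """
    root_a, root_b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    structural = jaccard(tag_paths(root_a), tag_paths(root_b))
    content = cosine(term_counts(root_a), term_counts(root_b))
    return 1.0 - (w_structure * structural + (1.0 - w_structure) * content)
```

Pairwise distances computed this way could be stored in a dictionary keyed by document pair and passed to any standard clustering routine, which mirrors the "calculate, store, then cluster" flow the abstract describes.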

Knowledge Discovery from XML Documents, First International Workshop, KDXD 2006, Singapore, April 9, 2006, Proceedings

... and Querying Methods Information Retrieval from Distributed Semistructured Documents Using Me...

Fast and effective clustering of XML data using structural information

Knowledge and Information Systems, 2008

This paper presents the incremental clustering algorithm XCLS that groups the XML documents according to structural similarity. A Level structure format is introduced to represent the structure of XML documents for efficient processing. A global criterion function that measures the similarity between the new document and existing clusters is developed. It avoids the need to compute the pair-wise similarity between two individual documents and hence saves a huge amount of computing effort. XCLS is further modified to incorporate the semantic meanings of XML tags for investigating the trade-offs between accuracy and efficiency. The empirical analysis shows that the structural similarity overplays the semantic similarity in the clustering process of the structured data such as XML. The experimental analysis shows that the XCLS method is fast and accurate in clustering the heterogeneous documents by structures.
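
The abstract does not give the level structure format or the criterion function, so the following is only a loose sketch of the general idea: each document is summarized as the set of tag names at each depth, and each new document is compared against cluster summaries rather than against every earlier document. The `level_overlap` score, the `threshold` parameter, and all function names are assumptions for illustration and should not be read as the XCLS definitions.

```python
import xml.etree.ElementTree as ET


def level_structure(xml_text):
    """Return a list whose i-th entry is the set of tag names at depth i."""
    frontier = [ET.fromstring(xml_text)]
    levels = []
    while frontier:
        levels.append({node.tag for node in frontier})
        frontier = [child for node in frontier for child in node]
    return levels


def _level(levels, i):
    """Tags at depth i, or an empty set if the document is shallower."""
    return levels[i] if i < len(levels) else set()


def level_overlap(a, b):
    """Stand-in similarity: per-level Jaccard overlap, averaged over depth."""
    depth = max(len(a), len(b))
    scores = []
    for i in range(depth):
        la, lb = _level(a, i), _level(b, i)
        scores.append(len(la & lb) / len(la | lb))
    return sum(scores) / depth


def merge_levels(a, b):
    """Union the tag sets level by level to summarize a cluster."""
    return [_level(a, i) | _level(b, i) for i in range(max(len(a), len(b)))]


def cluster_incrementally(documents, threshold=0.5):
    """Assign each document to the best-matching cluster or start a new one."""
    clusters = []  # each cluster: {"levels": summary, "members": [indices]}
    for index, text in enumerate(documents):
        levels = level_structure(text)
        best, best_score = None, 0.0
        for cluster in clusters:
            score = level_overlap(levels, cluster["levels"])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best["members"].append(index)
            best["levels"] = merge_levels(best["levels"], levels)
        else:
            clusters.append({"levels": levels, "members": [index]})
    return clusters
```

Because each document is compared only with one summary per cluster, the number of comparisons grows with the number of clusters rather than with the number of document pairs, which is the saving the abstract points to.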

Mining changes from versions of dynamic XML documents

Knowledge Discovery from XML Documents, 2006

The ability to store information contained in XML documents for future reference has become a very important issue, as the number of applications which use and exchange data in XML format is growing continuously. Moreover, the contents of XML documents are dynamic and change over time, so researchers are looking for efficient solutions to store the documents’ versions and eventually extract interesting information out of them. This paper proposes a novel approach for mining association rules from changes between versions of dynamic XML documents, in a simple manner, by using the information contained in the consolidated delta. We argue that by applying our proposed algorithm, important information about the behaviour of the changed XML document over time could be extracted and then used to make predictions about its future performance.

XCLS: a fast and effective clustering algorithm for heterogenous XML documents

Advances in Knowledge Discovery and Data Mining, 2006

This paper presents the incremental clustering algorithm, XML documents Clustering with Level Similarity (XCLS), that groups the XML documents according to structural similarity. A level structure format is introduced to represent the structure of XML documents for efficient processing. A global criterion function that measures the similarity between the new document and existing clusters is developed. It avoids the need to compute the pair-wise similarity between two individual documents and hence saves a huge amount of computing effort. XCLS is further modified to incorporate the semantic meanings of XML tags for investigating the trade-offs between accuracy and efficiency. The empirical analysis shows that the structural similarity overplays the semantic similarity in the clustering process of the structured data such as XML. The experimental analysis shows that the XCLS method is fast and accurate in clustering the heterogeneous documents by structures.

An XML-enabled association rule framework

Database and Expert Systems …, 2003

With the sheer amount of data stored, presented and exchanged using XML nowadays, the ability to extract knowledge from XML data sources becomes increasingly important and desirable. This paper aims to integrate the newly emerging XML technology with data mining technology, using association rule mining as a case in point. Compared with traditional association mining in the well-structured world (e.g., relational databases), mining from XML data is faced with more challenges due to the inherent flexibilities of XML in both structure and semantics. The primary challenges include 1) a more complicated hierarchical data structure; 2) an ordered data context; and 3) a much bigger data size. To tackle these challenges, in this paper, we propose an extended XML-enabled association rule framework, which is flexible and powerful enough to represent both simple and complex structured association relationships inherent in XML data.