IEEE Access
The volume of adult content on the World Wide Web is increasing rapidly, which makes automatic detection of adult content a more challenging task when blocking access to ill-suited websites. Most pornographic webpage-filtering systems are based on n-gram, naïve Bayes, K-nearest neighbor, and keyword-matching mechanisms, which do not reliably extract useful data from unstructured web content. These systems have no reasoning capability to intelligently filter web content and distinguish medical webpages from adult-content webpages. In addition, it is easy for children to access pornographic webpages because adult content is freely available on the Internet, which creates a problem for parents wishing to protect their children from such unsuitable content. To solve these problems, this paper presents a support vector machine (SVM) and fuzzy-ontology-based semantic knowledge system that systematically filters web content and identifies and blocks access to pornography. The proposed system classifies URLs into adult URLs and medical URLs by using a blacklist of censored webpages to provide accuracy and speed. The proposed fuzzy ontology then extracts web content to determine the website type (adult, normal, or medical) and to block pornographic content. To examine the efficiency of the proposed system, the fuzzy ontology and intelligent tools are developed using Protégé 5.1 and Java, respectively. Experimental analysis shows that the proposed system efficiently and automatically detects and blocks adult content. INDEX TERMS Data mining, semantic knowledge, fuzzy ontology, SVM, adult content identification.
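A minimal sketch of the URL pre-classification stage this abstract describes, in Python: a fast blacklist lookup followed by an SVM fallback for unseen URLs. The character n-gram features, the tiny blacklist, and all example URLs are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch of the URL pre-classification stage: blacklist first (speed),
# SVM fallback (coverage). Feature choice and data are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

blacklist = {"badsite.example", "adult.example"}   # hypothetical censored hosts

def classify_url(url, model):
    """Blacklist lookup first (fast path), then SVM prediction."""
    host = url.split("/")[2] if "//" in url else url.split("/")[0]
    if host in blacklist:
        return "adult"
    return model.predict([url])[0]

# Toy training data; a real system would use thousands of labelled URLs.
urls = ["http://adult.example/free-videos", "http://clinic.example/sexual-health",
        "http://news.example/story", "http://porn.example/gallery"]
labels = ["adult", "medical", "normal", "adult"]

model = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(3, 5)), LinearSVC())
model.fit(urls, labels)
print(classify_url("http://clinic.example/advice", model))
```

The blacklist lookup supplies the speed the abstract mentions, while the learned model handles URLs that are not in the list.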
Currently, the amount of adult (pornographic) content on the Internet is increasing rapidly, which makes automatic detection of adult content a more challenging task when blocking access to ill-suited websites. It is easy for children to access pornographic webpages because adult content is freely available on the Internet, which creates a problem for parents wishing to protect their children from such unsuitable content. In 2005, the European Parliament launched a large program called "Safer Use of the Internet", aimed particularly at young people. Some webpages contain a large amount of combined data related to healthcare (information on diseases, mental health, and physical fitness) and sexual knowledge (medicine for sexual health, birth control, treatment during pregnancy, etc.). In this system, we focus on the recognition of adult web content. A fuzzy-ontology/SVM-based adult content detection system is proposed to automate the classification of pornographic versus medical websites. The proposed mechanism offers an adult content detection system that classifies webpages as normal, pornographic, or medical using extracted web content features. Recognition of an adult webpage bag is carried out using multi-instance learning that combines the classification of texts, images, and videos in webpages.
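The closing sentence describes bag-level, multi-instance classification. Below is a minimal sketch of the standard multi-instance decision rule, assuming a page is a bag of instance feature vectors (text blocks, images, video frames); the per-instance scorer is a stub standing in for the system's actual classifiers.

```python
# Standard multi-instance assumption: a page (bag) is flagged as adult
# if any single instance scores above a threshold. Scorer is a stub.
import numpy as np

def score_instance(features):
    # Stand-in for the per-instance text/image/video classifiers (assumption).
    return float(np.clip(features.mean(), 0.0, 1.0))

def classify_page(instance_features, threshold=0.5):
    scores = [score_instance(f) for f in instance_features]
    return "adult" if max(scores) > threshold else "normal"

page = [np.array([0.1, 0.2]), np.array([0.9, 0.8])]  # two instances in the bag
print(classify_page(page))  # -> "adult"
```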
International Journal of Future Computer and Communication, 2017
The need for information is of such magnitude that it drives advances in information technology. Many things can be done using the Internet, and the main reason people use it is to find information. Unfortunately, not all the information available on the Internet is true and safe to consume. There is plenty of information on the Internet that is a hoax or contains elements contrary to morals and ethics, such as terrorism, racism, and pornography. Thus, strong judgment and knowledge are required to sort incoming information. Because many children also use the Internet, and parents cannot supervise their children's activities continuously, an application embedded in the web browser is required so that bad content can be blocked automatically. This article focuses on handling pornographic content on text-based webpages. A smart application is needed to distinguish text that contains pornography. Thus, we apply artificial intelligence in our research using Naive Bayes and an information retrieval method. As a result, the application is able to block 88.02% of the pornographic content.
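A minimal sketch of a Naive Bayes text filter of the kind this abstract describes, using scikit-learn; the training phrases and the 0.5 blocking threshold are illustrative assumptions, not the paper's data or settings.

```python
# Naive Bayes text filter sketch: block a page when the posterior
# probability of the "pornographic" class exceeds a threshold.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["explicit adult phrase", "cooking recipe for dinner",
         "another explicit phrase", "weather news report"]
labels = [1, 0, 1, 0]  # 1 = pornographic text, 0 = normal

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

def should_block(page_text, threshold=0.5):
    """Block the page if P(pornographic | text) exceeds the threshold."""
    return clf.predict_proba([page_text])[0][1] > threshold

print(should_block("explicit phrase on a page"))
```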
IEEE Transactions on Knowledge and Data Engineering, 2000
This paper describes a Web filtering system, "WebGuard," which aims to automatically detect and filter adult content on the Web. WebGuard uses data mining techniques to classify URLs into two classes: suspect URLs and normal URLs. The suspect URLs are stored in a database, which is constantly and automatically updated to reflect the highly dynamic evolution of the Web. When working, WebGuard simply captures a user's URL, matches it against the suspect URLs stored in the database, and takes an appropriate action (filtering or blocking) according to the result of the analysis. We started with a study of most existing software to learn the possibilities and functionalities currently available on the market. This phase enabled us to better evaluate the performance of our product as it was being developed. The second phase of our work was devoted to research into the usual algorithms, weighing their advantages and drawbacks. Having gathered this knowledge, we are currently implementing a system that combines several algorithms to increase the software's performance. Our preliminary results show that it can detect and filter adult content effectively.
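A minimal sketch of WebGuard's capture-and-match loop as described: intercept the user's URL, match its host (and parent domains) against the suspect list, and block or allow. Storing the suspect URLs in an in-memory set is a simplifying assumption; the paper keeps them in an automatically updated database.

```python
# Capture a URL, match it against the suspect list, filter or pass.
from urllib.parse import urlparse

suspect_urls = {"badsite.example", "porn.example"}  # hypothetical entries

def handle_request(url):
    host = urlparse(url).hostname or ""
    # Match the host and each of its parent domains against the list,
    # so subdomains of a suspect site are also caught.
    parts = host.split(".")
    for i in range(len(parts) - 1):
        if ".".join(parts[i:]) in suspect_urls:
            return "BLOCK"
    return "ALLOW"

print(handle_request("http://www.badsite.example/page"))  # -> BLOCK
```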
International Journal of Internet Protocol Technology, 2017
The paper outlines a framework for automated categorisation of web pages to protect against inappropriate content. The paper contains the framework overview, an analysis of the state of the art, a description of the developed prototype, and its evaluation based on a series of experiments. Several sources are used for the categorisation, namely text, HTML tags, and URL addresses. During the categorisation, these data and other information are analysed using machine learning and data mining methods. Finally, the quality of the categorisation is evaluated. The categorisation system developed as a result of this work is planned to be partially implemented in F-Secure Corporation's mass-production systems performing analysis of web content.
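A minimal sketch of the multi-source idea above: separate vectorisers for page text, HTML tag sequence, and URL, combined into a single feature space for one classifier. The column layout, feature choices, and toy data are assumptions, not the prototype's design.

```python
# Combine text, HTML-tag, and URL features into one model input.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pages = pd.DataFrame({
    "text": ["buy pills now", "school homework help"],
    "tags": ["img img a", "p p h1"],
    "url":  ["pharma.example/x", "school.example/y"],
})
labels = [1, 0]  # 1 = inappropriate, 0 = acceptable

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "text"),
    ("tags", TfidfVectorizer(), "tags"),
    ("url",  TfidfVectorizer(analyzer="char", ngram_range=(3, 4)), "url"),
])
model = make_pipeline(features, LogisticRegression())
model.fit(pages, labels)
print(model.predict(pages))
```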
Lecture Notes in Computer Science, 2011
In this work we present InFeRno, an intelligent web pornography elimination system that classifies web pages based solely on their visual content. The main characteristics of our system include: (i) a powerful vector space with a small but sufficient number of features that improve the discriminative ability of the SVM classifier; (ii) an extra class (bikini) that strengthens the performance of the classifier; (iii) an overall classification scheme that achieves high accuracy at considerably lower runtime costs compared to current state-of-the-art systems; and (iv) a full-fledged implementation of the proposed system capable of being integrated with ICAP-aware web proxy cache servers.
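A minimal sketch of visual-content classification with an extra class, as the abstract outlines: a compact feature vector per image fed to an SVM trained on three classes (benign, bikini, porn). The channel-statistics features and random toy images are assumptions and are far simpler than InFeRno's actual descriptor.

```python
# Compact per-image features -> three-class SVM (benign / bikini / porn).
import numpy as np
from sklearn.svm import SVC

def image_features(img):
    """img: HxWx3 uint8 array -> per-channel means and stds (6 values)."""
    img = img.astype(float) / 255.0
    return np.concatenate([img.mean(axis=(0, 1)), img.std(axis=(0, 1))])

rng = np.random.default_rng(0)
images = [rng.integers(0, 256, (32, 32, 3), dtype=np.uint8) for _ in range(9)]
labels = ["benign", "bikini", "porn"] * 3   # three-class scheme, as in the paper

X = np.stack([image_features(im) for im in images])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:1]))
```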
Visión electrónica, 2017
The incursion of the Internet has created new forms of information and communication, but it can also carry great dangers when its use involves inappropriate content, such as access to harmful material and the rise of new kinds of crime. In this situation, automatic filtering systems identify improper Internet content. This paper describes the use of an algorithm to automatically filter out inappropriate web pages. To accomplish this automatic filtering task, the TAN (Tree Augmented Naive Bayes) method is implemented. Data mining and computational learning algorithms for the extraction, representation, and classification of web pages are implemented.
International Journal of Computer Vision and Image Processing, 2(1), 75-90, January-March 2012
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed, so it is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, object contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, Bayes' theorem is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those of either individual classifier, and our framework can be adapted to different categories of Web pages.
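A minimal sketch of a Bayes-style fusion step like the one described: combining the text and image classifiers' posteriors under a conditional-independence assumption. The formula and the example probabilities are our formulation, not necessarily the paper's exact rule.

```python
# Fuse two per-cue posteriors P(porn|text) and P(porn|image) assuming
# the cues are conditionally independent given the class:
#   P(C|t,i)  ∝  P(C|t) * P(C|i) / P(C)
def fuse(p_text, p_image, prior=0.5):
    num = p_text * p_image / prior
    den = num + (1 - p_text) * (1 - p_image) / (1 - prior)
    return num / den

print(fuse(p_text=0.7, p_image=0.8))  # -> ~0.903, stronger than either cue alone
```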
Journal of Advances in Computer Research, 2015
Filtering web pages with inappropriate content is one of the major issues in the field of intelligent network security. A good intelligent filtering method with high accuracy and speed is needed by any country in order to control users' access to the web, so it has been considered by many researchers. Representing web pages in a way understandable by machines is one of the most important preprocessing steps. Thus, a way to describe web pages with lower dimensionality would be very effective, especially in determining whether web pages should be filtered out or not. In this paper, we propose an automatic method to detect forbidden keywords in web pages. Next, we define a new vector representation of web pages, named RWSF, which consists of the weighted sum and frequency of forbidden keywords in different parts of web pages. For this, a ranking dictionary of keywords including forbidden keywords is used. To evaluate the proposed method, 2643 pages consisting of 1311 normal pages and 1332 forbidden pages were used. Among these, 1851 pages were used to train the system and 792 pages were used for evaluation. The system was assessed using various classifiers: k-Nearest Neighbor, Support Vector Machines, Decision Tree, and Artificial Neural Networks. Evaluation results indicate the high efficiency and accuracy of the proposed method across all classifiers.
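A minimal sketch of an RWSF-style page vector as described: for each region of a page, the rank-weighted sum and the raw frequency of forbidden keywords. The rank dictionary and region names are hypothetical placeholders, not the paper's actual dictionary.

```python
# Build a low-dimensional page vector from forbidden-keyword statistics:
# per region, [rank-weighted sum of hits, number of hits].
forbidden = {"badword1": 3, "badword2": 1}  # hypothetical rank dictionary

def rwsf_vector(regions):
    """regions: dict of region name -> text. Returns two values per region."""
    vec = []
    for name, text in regions.items():
        tokens = text.lower().split()
        hits = [forbidden[t] for t in tokens if t in forbidden]
        vec.extend([sum(hits), len(hits)])
    return vec

page = {"title": "badword1 site", "body": "text badword2 text badword1"}
print(rwsf_vector(page))  # -> [3, 1, 4, 2]; feed this to any classifier
```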
2012 International Symposium on Innovations in Intelligent Systems and Applications, 2012
The Internet is an infinite information repository that also contains harmful content such as pornography, violence, and hate messages. It is very important to keep such content from underage children so that it does not adversely affect their development. Today, there are many commercial software products developed for this purpose, but their filtering capabilities are limited to text-based and image-based content. Different techniques must be used to filter video-based content. This article describes an agent-based system developed for the detection of videos containing pornographic content. Videos on the Internet can be divided into six groups: anime, commercial, music, sitcom, sports, and porn-related. The proposed system uses Hidden Markov Model (HMM) based classification to assign videos to these pre-defined categories with intelligent agents. Color features are extracted from each video frame and used as observation sequences in the HMM for classification. According to the classification results, videos closely related to the category of sex and pornography are filtered for underage users. The test results obtained indicate that the classification is satisfactory.
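A minimal sketch of the per-category HMM scheme described above, using the hmmlearn library: one Gaussian HMM per video category is trained on per-frame colour features, and a new video is assigned to the category whose model scores it highest. The random "features" stand in for real colour descriptors, and the model sizes are assumptions.

```python
# One HMM per category; classify by maximum log-likelihood.
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

rng = np.random.default_rng(1)
categories = ["sports", "porn"]
models = {}
for shift, cat in enumerate(categories):
    # Toy training data: 5 videos of 20 frames, 3 colour features per frame.
    seqs = [rng.normal(loc=shift, size=(20, 3)) for _ in range(5)]
    X = np.vstack(seqs)
    m = hmm.GaussianHMM(n_components=3, covariance_type="diag", random_state=0)
    m.fit(X, lengths=[len(s) for s in seqs])
    models[cat] = m

video = rng.normal(loc=1, size=(20, 3))  # unseen frame-feature sequence
best = max(categories, key=lambda c: models[c].score(video))
print(best)  # expected: "porn" (its features sit near that model's mean)
```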
ACM SIGKDD Explorations Newsletter, 2006
The World Wide Web has now become a humongous archive of various contents. The inordinate amount of information found on the web presents a challenge in delivering the right information to the right users. On one hand, the abundant information is freely accessible to all web denizens; on the other hand, much of it may be irrelevant or even deleterious to some users. For example, control and filtering mechanisms are desired to prevent inappropriate or offensive material such as pornographic websites from reaching children. Ways of accessing websites are termed Access Scenarios. An Access Scenario can include using search engines (e.g., image search, which has very little textual content), URL redirection to some websites, or directly typing (porn) website URLs. In this paper we propose a framework to analyze a website from several different aspects or information sources, and generate a classification model aiming to accurately classify such content irrespective of acc...
2003
As the Internet grows quickly, pornography, which in the past was often printed in small-run publications, has become one of the most widely distributed kinds of information on the Internet. However, pornography may be harmful to children and may affect the efficiency of workers.
IJIIS: International Journal of Informatics and Information Systems
According to Pornography Statistics, more than 34 percent of Internet users are exposed to pornography. Pornographic sites account for 12 percent of the total number of websites and 72 million monthly visitors. Internet pornography (Internet porn) is addictive to teenagers and kids around the world. The normal practice is to block those websites or filter pornography away from kids. In order to do so, researchers first have to find a way to detect and classify it. Pixel features, including the YCbCr range and the area of human skin, are chosen as pornography features because of their easy acquisition. C4.5 (a data mining technique) is applied to construct a decision tree. The purpose of this paper is to classify pornographic images with a simple if-then rule. The accuracy of the experimental result is 85.2%.
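A minimal sketch of the pixel-level pipeline this abstract describes: convert RGB to YCbCr, mark pixels inside a skin range, and apply a simple if-then rule on the skin-area ratio, of the kind a trained C4.5 tree might yield. The Cb/Cr bounds and area threshold are common literature values, not necessarily the paper's exact thresholds.

```python
# YCbCr skin detection + if-then rule on the skin-area ratio.
import numpy as np

def rgb_to_ycbcr(img):
    """img: HxWx3 float RGB in [0,255] -> Y, Cb, Cr planes (ITU-R BT.601)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def skin_ratio(img):
    _, cb, cr = rgb_to_ycbcr(img.astype(float))
    skin = (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
    return skin.mean()

def classify(img, area_threshold=0.30):
    # Simple if-then rule; a trained C4.5 tree would learn the cut-off.
    return "pornographic" if skin_ratio(img) > area_threshold else "normal"

img = np.full((64, 64, 3), [224, 170, 140])  # uniformly skin-toned test image
print(classify(img))
```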
Visual Communications and Image Processing 2003, 2003
The paper addresses the problem of distinguishing between pornographic and non-pornographic photographs for the design of semantic filters for the web. Both decision forests of trees built according to the CART (Classification And Regression Trees) methodology and Support Vector Machines (SVM) have been used to perform the classification. The photographs are described by a set of low-level features that can be computed automatically from gray-level and color representations of the image. The database used in our experiments contained 1500 photographs, 750 of which were labeled as pornographic on the basis of the independent judgement of several viewers.
The 11th IEEE International Conference on Networks, 2003. ICON2003., 2003
Web filtering is used to prevent access to undesirable Web pages. In this paper we review a number of existing approaches and point out the shortcomings of each. We then propose a Web filtering system that uses a text classification approach to classify Web pages into desirable and undesirable ones, and propose a text classification algorithm that is well suited to this application.
Lecture Notes in Computer Science, 2000
We present a method to automatically detect pornographic content on the Web. Our method combines techniques from language engineering and image analysis within a machine-learning framework. Experimental results show that it achieves nearly perfect performance on a set of hard cases.
IEEE Access, 2021
Nowadays, fraudulent and malicious websites are emerging as a harmful and very common problem on the Internet. They cause huge monetary losses and irreparable damage for both companies and individuals. To face this situation, governments have approved multiple law projects. This way, legality on the Internet is being enforced, and sanctions are being imposed on offenders who carry out illegal or malicious activities. However, governments still need a way to simplify the classification of websites as risky or non-risky, since most of this work is manual. This paper presents the DOmains Classifier based on RIsky Websites (DOCRIW) framework to detect domains that contain possible fraud or malicious content. It is based on two main components. The first component is a previously built knowledge base containing information from risky websites. The second complements the system with a binary classifier able to label a website (as risky or not) considering just its domain. The system makes use of web information sources and includes host-based variables. It also applies similarity measures, supervised learning algorithms, and optimization methods to enhance its performance. The presented work is experimental, rendering promising outcomes. INDEX TERMS Risky website detection, malware alerts, knowledge-based systems, similarity metrics, combination of information.
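A minimal sketch of DOCRIW's two-component idea: first consult a knowledge base of known risky domains via a string-similarity measure, then fall back to a classifier over domain-derived (host-based) features. All domain names, thresholds, and the trivial stand-in rule are illustrative assumptions, not the framework's actual components.

```python
# Component 1: similarity against a risky-domain knowledge base.
# Component 2: fallback decision on domain-derived features.
from difflib import SequenceMatcher

risky_kb = {"paypa1-login.example", "free-m0ney.example"}  # hypothetical KB

def kb_similarity(domain):
    return max(SequenceMatcher(None, domain, d).ratio() for d in risky_kb)

def domain_features(domain):
    digits = sum(c.isdigit() for c in domain)
    return [len(domain), digits, domain.count("-"), domain.count(".")]

def is_risky(domain, sim_threshold=0.8):
    if kb_similarity(domain) >= sim_threshold:
        return True
    # A trained binary classifier would consume domain_features(domain);
    # this hand-written rule is only a stand-in for illustration.
    length, digits, hyphens, _ = domain_features(domain)
    return digits >= 2 and hyphens >= 1

print(is_risky("paypa1-logi.example"))  # similar to a KB entry -> True
```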
… Journal of Computer …, 2012
In the era of the Internet, users are keen to discover more on the web. As the number of web pages increases day by day, malicious web pages are increasing proportionally. This paper focuses on detecting maliciousness in a web page using genetically evolved fuzzy rules. The rules thus formed are filtered by a Support Vector Machine, and the result is finally stored in a symbolic knowledge base with an appropriate weight for each rule. This provides insight into symbolic and non-symbolic intelligence in malicious web page detection.
Int. J. Comb. Optim. Probl. Informatics, 2016
In this paper, the problem to be solved is the viewing of undesirable content on the Internet by both children and young people. The a...
Proceedings of the 9th Joint Conference on Information Sciences (JCIS), 2006
Due to the flood of pornographic web sites on the Internet, content-based web filtering has become an important technique for detecting and filtering inappropriate information on the web, since pornographic web sites contain many sexually oriented texts, images, and other information that can help identify them. In this paper, we build and examine a system to filter web pornography based on image content. Our system consists of three main processes: (i) normalized R/G ratio, (ii) histogram analysis, and (iii) a human composition matrix (HCM) based on skin detection. The first process uses pixel ratios (red and green color channels) for image filtering. The second process, histogram analysis, estimates the frequency intensities of an image; if an image falls within the range of the training-set results, it is likely to be pornographic. The last process is HCM based on human skin detection. The experimental results show effective accuracy after testing, demonstrating that our hierarchical image filtering techniques can achieve substantial improvements.
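A minimal sketch of the first of the three stages described, the normalized R/G ratio test: images whose mean red-to-green ratio falls in a skin-like band are flagged for the later histogram and HCM stages. The acceptance band is an illustrative assumption, not the paper's trained range.

```python
# Stage 1 of the hierarchical filter: normalized R/G ratio test.
import numpy as np

def rg_ratio(img):
    """img: HxWx3 RGB array; mean red-to-green ratio over all pixels."""
    r = img[..., 0].astype(float) + 1e-6
    g = img[..., 1].astype(float) + 1e-6
    return float(np.mean(r / g))

def stage_one_flags(img, low=1.1, high=1.6):
    # Skin-dominated images tend to have R moderately above G; flag those
    # for the later histogram and skin-detection (HCM) stages.
    return low <= rg_ratio(img) <= high

img = np.full((32, 32, 3), [210, 150, 120])
print(stage_one_flags(img))  # 210/150 = 1.4 -> True, passed to stage 2
```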