The concept of Webpage visibility is usually linked to search engine optimization (SEO), and it is based on a global in-link metric [1]. SEO is the process of designing Webpages to optimize their potential to rank high on search engines,... more
In this paper we present a spaceship game which allows us to evaluate human behavior with respect to maintenance and the repair of malfunctions. We ran an experiment in which subjects played the spaceship game twice. In one of the games, they... more
Purpose-This exploratory research examined law enforcement officers' attitudes toward the public-private partnerships (PPPs) in policing cyberspace. Particularly, by investigating the predictors of police officers' support for the PPPs,... more
Society's growing dependence on computers and information technologies has been matched by an escalation of the frequency and sophistication of cyber attacks committed by criminals operating from the Darknet. As a result, security... more
with the exponential increase in data storage. Ranking models are used in search engines to locate relevant pages and rank them in decreasing order of relevance. They are an integral component of a search engine. The offline gathering... more
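A minimal sketch of what "rank them in decreasing order of relevance" means in practice, assuming a toy term-frequency scorer and a hypothetical three-document corpus (neither is from the paper):

```python
# Toy ranking model: score documents against a query by term-frequency
# overlap and return them in decreasing order of relevance. The corpus
# and scorer are illustrative assumptions, not any engine's ranker.
from collections import Counter

def score(query, doc):
    """Count how many query-term occurrences appear in the document."""
    terms = Counter(doc.lower().split())
    return sum(terms[t] for t in query.lower().split())

def rank(query, docs):
    """Return documents sorted by descending relevance score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

docs = [
    "web crawling and indexing",
    "search engine ranking models rank pages by relevance",
    "cooking recipes",
]
print(rank("ranking relevance", docs))
```

A production ranker would combine many such signals (link structure, freshness, click data) rather than raw term frequency alone.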
Imagine that all the information in the entire world written in every known language, and every graphic image, video clip, or photograph copied digitally was available at your fingertips. This vast amount of data could then be reduced to... more
In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific taxonomies from crawled data. The first method involves a sentence-level word co-occurrence frequency method for building the taxonomy,... more
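The sentence-level word co-occurrence step can be sketched roughly as follows; the three-sentence corpus and the "more frequent term becomes the parent" heuristic are illustrative assumptions, not the paper's exact method:

```python
# Count how often term pairs co-occur in the same sentence, then attach
# each term under the more frequent term it co-occurs with most. Corpus
# and parent heuristic are illustrative assumptions.
from collections import Counter
from itertools import combinations

sentences = [
    "the crawler fetches pages",
    "the crawler parses pages",
    "pages contain links",
]

freq = Counter(w for s in sentences for w in set(s.split()))
cooc = Counter()
for s in sentences:
    for a, b in combinations(sorted(set(s.split())), 2):
        cooc[(a, b)] += 1

def parent(term):
    """Most frequent co-occurring term that is itself more frequent."""
    cands = []
    for (a, b), c in cooc.items():
        other = b if a == term else a if b == term else None
        if other and freq[other] > freq[term]:
            cands.append((c, freq[other], other))
    return max(cands)[2] if cands else None

print(parent("crawler"))
```

Terms with no more-frequent co-occurring partner become taxonomy roots in this toy reading.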
Computer ethicists have long been intrigued by the possibility that computers, computer programs, and robots might develop to a point at which they could be considered moral agents. In such a future, computers might be considered... more
With the increasing number of individuals accessing online child sexual exploitation material (CSEM), there is an urgent need for primary prevention strategies to supplement the traditional focus on arrest and prosecution. We examined... more
A Web crawler is an important component of the Web search engine. It demands a large amount of hardware resources (CPU and memory) to crawl data from the rapidly growing and changing Web, so the crawling process should be a continuous... more
The Web poses itself as the largest data repository ever available in the history of humankind. Major efforts have been made in order to provide efficient access to relevant information within this huge repository of data. Although... more
Most search engines use only the search keywords for searching. Due to the ambiguity of the semantics and usages of the search keywords, the results are noisy and many of them do not match the user's search goals. In general the search... more
Agents that interact with humans are known to benefit from integrating behavioral science and exploiting the fact that humans are irrational. Therefore, when designing agents for interacting with automated agents, it is crucial to know... more
The project of the Ontology Web Search Engine is presented in this paper. The main purpose of this paper is to present a project that can be easily implemented. The Ontology Web Search Engine is software to look for and index ontologies... more
The increasing importance of search engines to commercial web sites has given rise to a phenomenon we call "web spam", that is, web pages that exist only to mislead search engines into (mis)leading users to certain web sites. Web spam is... more
Making use of search engines is the most popular Internet task apart from email. Currently, all major search engines employ web crawlers because effective web crawling is a key to the success of modern search engines. Web crawlers can give... more
This is a Diamond Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License, which permits unrestricted non-commercial use,... more
Using web crawler technology to support design-related web information collection in idea generation
Effective information gathering in problem and task related fields with which designers or design teams may not be familiar is a key part of the design process. Designers usually consult with subject experts to access expert information.... more
This article looks at the controversial decision taken by Yahoo CEO Marissa Mayer to ban telework in early 2013. It analyses the pros and cons of teleworking and searches for the underlying assumption that may have led to this ruling. A... more
Many Web IR and Digital Library applications require a crawling process to collect pages with the ultimate goal of taking advantage of useful information available on Web sites. For some of these applications the criteria to determine... more
Finding useful information from the Web, which has a huge and widely distributed structure, requires efficient search techniques. The distributed and varying nature of Web resources is always a major issue for search engines in maintaining the latest... more
Extracting information from the web is becoming increasingly important and popular. To find Web pages one typically uses search engines that are based on the web crawling framework. A web crawler is a software module that fetches data from... more
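The fetch-parse-enqueue cycle such a software module runs can be sketched as below; the in-memory link graph stands in for real HTTP fetching (an assumption) so the sketch is self-contained:

```python
# Minimal crawler skeleton: a frontier of URLs, a fetch step, link
# extraction, and a visited set. A real crawler would replace the
# PAGES lookup with an HTTP client and an HTML link parser.
from collections import deque

PAGES = {  # hypothetical site: url -> outgoing links
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": [],
    "/c": ["/"],
}

def crawl(seed):
    frontier, visited, order = deque([seed]), set(), []
    while frontier:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for link in PAGES.get(url, []):  # "fetch" + link extraction
            if link not in visited:
                frontier.append(link)
    return order

print(crawl("/"))
```

Using a deque gives breadth-first order; swapping it for a priority queue turns this into the focused/topical crawlers discussed elsewhere in these abstracts.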
The anonymous marketplaces ecosystem represents a new channel for black-market goods and services, offering a huge variety of illegal items. For many darknet marketplaces, the overall sales incidence is not (yet) comparable with the... more
Darknet markets have been studied to varying degrees of success for several years (since the original Silk Road was launched in 2011), but many obstacles are involved which prevent a complete and systematic survey. The Australian National... more
The World Wide Web is the largest collection of data today, and it continues increasing day by day. A web crawler is a program for the bulk downloading of web pages from the World Wide Web, and this process is called Web crawling. To collect... more
Topical crawlers are becoming important tools to support applications such as specialized Web portals, online searching, and competitive intelligence. As the Web mining field matures, the disparate crawling strategies proposed in the... more
The analysis of open sources requires tools capable of crawling websites in order to better categorize them and to facilitate their analysis, notably in cartographic form. Based on the analysis of the... more
Web log mining provides tremendous information about user traffic and search engine behavior at web sites. The behavior of search engines could be used in analyzing server load, quality of search engines, dynamics of search engine... more
With the continuous growth and rapid advancement of web based services, the traffic generated by web servers have drastically increased. Analyzing such data, which is normally known as click stream data, could reveal a lot of information... more
We introduce a new, fully automated online media monitoring system, MNSight. We explain how the system works, for whom it is intended, and its architecture and scalability. We show that it is an easily accessible media monitoring system for the... more
A large amount of data on the WWW remains inaccessible to crawlers of Web search engines because it can only be exposed on demand as users fill out and submit forms. The Hidden web refers to the collection of Web data which can be... more
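Filling out and submitting a form on demand can be sketched as follows; the search form and fill vocabulary are hypothetical, and real Hidden-web crawlers parse forms out of HTML and choose fill values from a learned domain vocabulary:

```python
# Model a search form, fill its fields, and build the requests a
# Hidden-web crawler would submit. Form fields and values are
# illustrative assumptions.
from urllib.parse import urlencode

form = {"action": "/search", "method": "get",
        "fields": {"category": ["books", "music"], "query": None}}

def submissions(form, query_terms):
    """Enumerate one GET request per (category, query) combination."""
    reqs = []
    for cat in form["fields"]["category"]:
        for q in query_terms:
            reqs.append(form["action"] + "?" +
                        urlencode({"category": cat, "query": q}))
    return reqs

print(submissions(form, ["crawler"]))
```

Each generated URL exposes one slice of the database that no static hyperlink points to, which is exactly the content ordinary crawlers miss.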
Finding meaningful information among the billions of information resources on the web is a tedious task as the popularity of the Internet grows rapidly. The future of the web is a structured semantic web in place of unstructured information... more
The main purpose of this paper is to present an algorithm of OWL (Web Ontology Language) ontology transformation to concept map for subsequent generation of rules and also to evaluate the efficiency of this algorithm. These generated... more
The number of people with disabilities is continuously increasing. Providing patients who have disabilities with the rehabilitation and care necessary to allow them good quality of life creates overwhelming demands for health and... more
This paper presents a potential seed selection algorithm for web crawlers using a gain-share scoring approach. Initially we consider a set of arbitrarily chosen tourism queries. Each query is given to the selected N commercial Search... more
Many national and international heritage institutes realize the importance of archiving the web for future culture heritage. Web archiving is currently performed either by harvesting a national domain, or by crawling a pre-defined list... more
Detection of malicious and non-malicious website visitors using unsupervised neural network learning
Distributed denials of service (DDoS) attacks are recognized as one of the most damaging attacks on the Internet security today. Recently, malicious web crawlers have been used to execute automated DDoS attacks on web sites across the... more
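Separating crawler-like sessions from human ones without labels can be sketched with a tiny two-cluster k-means over session features; the features, data, and k-means stand-in are illustrative assumptions, not the paper's unsupervised neural network:

```python
# Represent each visitor session as a feature vector and split the
# sessions into two unlabeled clusters. Features (request rate,
# error fraction) and the toy k-means are illustrative assumptions.
def kmeans2(points, iters=10):
    """Two-cluster k-means on 2-D points, seeded with the extremes."""
    c = [min(points), max(points)]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, ci)) for ci in c]
            groups[d.index(min(d))].append(p)
        c = [tuple(sum(x) / len(g) for x in zip(*g)) for g in groups if g]
    return c

# (requests/min, error fraction): humans slow/low-error, crawlers fast
sessions = [(2, 0.0), (3, 0.1), (50, 0.4), (60, 0.5)]
print(sorted(kmeans2(sessions)))
```

The cluster with the high-rate centroid is then flagged as machine traffic; a self-organizing map or other unsupervised network plays the same role at scale.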
We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database domain D, our goal is to select from F only the forms that... more
We present a system for taxonomy construction that reached the first place in all subtasks of the SemEval 2016 challenge on Taxonomy Extraction Evaluation. Our simple yet effective approach harvests hypernyms with substring inclusion and... more
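The substring-inclusion heuristic can be sketched as follows: a term contained as the head of a longer term is proposed as its hypernym ("juice" for "apple juice"). The term list is illustrative, and the full system combines this signal with others:

```python
# Harvest hypernym candidates by substring (head-word) inclusion:
# if one term ends with " <other term>", the shorter term is proposed
# as the hypernym. Terms are illustrative assumptions.
def hypernym_candidates(terms):
    pairs = []
    for hypo in terms:
        for hyper in terms:
            if hyper != hypo and hypo.endswith(" " + hyper):
                pairs.append((hypo, hyper))
    return pairs

terms = ["juice", "apple juice", "orange juice", "apple"]
print(hypernym_candidates(terms))
```

Note that "apple" is not linked to "apple juice": only the head position counts, which keeps modifiers from being mistaken for hypernyms.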