Semester Project (Deep Learning)
M.Sc. Data Science (Weekend)
Project Title: Analyzing Research Trends in Computer Science
Summary:
In this project, you will analyze a large corpus of research papers in the field of Computer
Science published between 2001 and 2020 (20 years). The goal of this project
is to identify emerging research trends and key areas of interest within the field over the past
two decades. You will be tasked with gathering and processing data from a selection of
prominent Computer Science journals, then using advanced data analysis techniques,
specifically Deep Neural Networks (DNNs), to uncover insights into the evolving landscape
of research in Computer Science.
Project Overview:
Computer Science is a rapidly evolving field, with new advancements and subfields
constantly emerging. Over the years, the focus of research has shifted, with different areas
gaining prominence due to technological advancements, industry needs, and scientific
developments. Identifying these shifts can provide valuable insights into the future direction
of the field, as well as help researchers, educators, and industry professionals make
informed decisions about where to focus their efforts.
In this project, you will analyze a collection of research papers from major Computer
Science journals. You will focus on understanding how the focus areas of research have
changed from 2001 to 2020, with an emphasis on identifying emerging trends and key topics
in the field.
Data Collection:
Your first task will be to gather data from a selection of Computer Science journals.
Each group is assigned its own list of journals (attached), all recognized as major
sources of Computer Science research.
The data you will collect from each journal should include the following key information for
each published article between 2001 and 2020:
1. Title of the Paper: The full title of the paper, which will help you identify the topic of
research.
2. Abstract of the Paper: The abstract provides a concise summary of the research and
is essential for understanding the focus of each article.
3. Citation Count: The number of citations that the article has received, which can be
an indicator of the paper's impact and relevance in the field.
4. Publication Year: The year in which the paper was published, which will allow you to
track how the focus of research has shifted over time.
5. Authors & their Affiliation: The affiliation of authors for collaboration network
analysis. You can use only the country name as an affiliation of an author which can
be extracted from author’s addresses.
6. Key Words: The keywords of each article (if available).
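As a minimal sketch of the affiliation step in item 5, the Python snippet below matches
the last component of an author's address against a list of country names. The address
samples and the country list are assumptions for illustration; for real data you would
use a fuller gazetteer (for example, the pycountry package).

# Extract the country from an author's address string.
# Assumption: the country appears as the last comma-separated component,
# as it commonly does in journal affiliation strings.
addresses = [
    "Dept. of Computer Science, University of Oxford, Oxford, United Kingdom",
    "School of Computing, NUST, Islamabad, Pakistan",
]

# Small illustrative lookup; extend it (or use a gazetteer) for real data.
COUNTRIES = {"united kingdom", "pakistan", "usa", "china", "germany"}

def extract_country(address):
    last_part = address.rsplit(",", 1)[-1].strip().lower()
    return last_part if last_part in COUNTRIES else None

for addr in addresses:
    print(extract_country(addr))  # united kingdom, pakistan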
You will need to use web scraping techniques to extract this data from the journals. There
are various tools and programming languages you can use for web scraping, such as Python
with libraries like BeautifulSoup, Scrapy, or Selenium. The data you collect should be
organized in a structured format (such as CSV) to facilitate subsequent analysis.
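A minimal scraping sketch using requests and BeautifulSoup follows. The URL and the
HTML structure (an <article> tag per paper, with title/abstract/year/citations classes)
are purely hypothetical; inspect your assigned journals' pages and adapt the selectors
accordingly.

import csv
import requests
from bs4 import BeautifulSoup

# Hypothetical listing page of an assigned journal; replace with a real URL.
URL = "https://example.com/journal/volume-42/issue-1"

response = requests.get(URL, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

rows = []
for item in soup.select("article"):  # assumed page structure
    rows.append({
        "title": item.select_one(".title").get_text(strip=True),
        "abstract": item.select_one(".abstract").get_text(strip=True),
        "year": item.select_one(".year").get_text(strip=True),
        "citations": item.select_one(".citations").get_text(strip=True),
    })

# Store the records in a structured CSV, one row per article.
with open("articles_raw.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "abstract", "year", "citations"])
    writer.writeheader()
    writer.writerows(rows)

Whatever tool you choose, respect each journal's terms of use and robots.txt, and add
delays between requests so you do not overload the publisher's servers.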
Data Processing:
Once you have gathered the data, the next step is to clean and preprocess it. This
includes tasks such as:
• Removing duplicates or irrelevant entries.
• Handling missing data or incomplete records.
• Standardizing text (e.g., handling variations in spelling, formatting, or abbreviations).
At this stage, you will also need to ensure that your dataset is structured properly for
analysis. For instance, you should create a table or a database with each record containing
the relevant data fields (title, abstract, citation count, etc.) so that it can be easily input into
your analysis tools.
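As one way to carry out these cleaning steps, the pandas sketch below removes
duplicates, handles missing fields, and standardizes the text. The file and column
names follow the hypothetical CSV from the scraping sketch above.

import pandas as pd

df = pd.read_csv("articles_raw.csv")

# Remove exact duplicates, including entries scraped twice under slightly
# different titles (case or whitespace variations).
df["title"] = df["title"].str.strip().str.lower()
df = df.drop_duplicates(subset=["title", "year"])

# Handle missing data: drop records without an abstract (needed for the
# text analysis) and treat a missing citation count as zero.
df = df.dropna(subset=["abstract"])
df["citations"] = pd.to_numeric(df["citations"], errors="coerce").fillna(0).astype(int)

# Standardize the abstract text: collapse whitespace and unify case.
df["abstract"] = df["abstract"].str.replace(r"\s+", " ", regex=True).str.strip().str.lower()

df.to_csv("articles_clean.csv", index=False)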
Applying Deep Neural Networks:
The final step of the project will involve applying Deep Neural Networks (DNNs) to the
cleaned data to identify emerging trends in the research. You will use DNNs to analyze
patterns in the text (such as topics and keywords) and in the citation data (to identify
highly influential research). The model will be trained to recognize patterns across
articles, time periods, and research topics, enabling it to track and predict trends in
Computer Science research over time.
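As a minimal sketch of this step, the Keras model below classifies abstracts by topic,
assuming you have assigned integer topic labels (for example, derived from the article
keywords). Once trained, predicting a topic for every abstract and counting topics per
publication year yields the trend curves. The layer sizes and labels are illustrative,
not prescribed.

import tensorflow as tf

# Assumption: parallel lists of abstracts and integer topic labels
# (e.g., 0 = machine learning, 1 = networks), loaded from your dataset.
texts = ["deep learning for image recognition ...",
         "energy-aware routing in wireless sensor networks ..."]
labels = [0, 1]
num_topics = 2

# Map raw text to fixed-length integer token sequences.
vectorizer = tf.keras.layers.TextVectorization(max_tokens=20000,
                                               output_sequence_length=200)
vectorizer.adapt(texts)
x = vectorizer(tf.constant(texts))
y = tf.constant(labels)

# A small feed-forward classifier over averaged word embeddings.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(20000, 64, mask_zero=True),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_topics, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=5)

# Trend analysis: assign a topic to every abstract, then count topics
# per publication year to see how the field's focus shifts.
topic_ids = model.predict(x).argmax(axis=1)

You are free to use richer architectures (for example, recurrent or transformer-based
encoders) if they better capture the patterns in your data.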
Expected Outcomes:
By the end of this project, your team will have developed a model and performed data
analysis including:
1. Exploratory Data Analysis: You will perform exploratory data analysis, producing a
variety of plots and summary statistics to gain insight into the data (see the sketch
after this list).
2. Identify Key Research Trends: The model will show how specific topics in Computer
Science have evolved, including which areas have gained more attention and which
have declined.
3. Highlight Emerging Areas of Interest: By analyzing the data from the last two
decades, you will be able to identify newly emerging fields or technologies that are
likely to shape the future of Computer Science.
4. Analyze Citation Impact: The model will also be able to identify which papers or
areas of research have had the most significant impact on the field, as measured by
citation counts.
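As a minimal sketch of the exploratory analysis in item 1, the snippet below plots the
number of papers and the mean citation count per year from the cleaned CSV; the file and
column names follow the earlier sketches and are assumptions.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("articles_clean.csv")

# Papers published per year: a first look at the growth of the corpus.
per_year = df.groupby("year").size()

# Mean citations per year: older papers have had more time to accumulate
# citations, so interpret this with the publication year in mind.
mean_citations = df.groupby("year")["citations"].mean()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
per_year.plot(ax=ax1, title="Papers per year")
mean_citations.plot(ax=ax2, title="Mean citations per year")
plt.tight_layout()
plt.savefig("eda_overview.png")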
This analysis will provide valuable insights into the development of Computer Science and
help predict future trends. It will also contribute to a better understanding of how research
in this field is interconnected, which areas have been neglected, and where more resources
may be needed.
Final Deliverables:
At the end of the project, each group will be required to:
1. Submit a report summarizing the findings, including detailed analysis and
visualizations of the trends identified.
2. Provide datasets of the collected articles, including titles, abstracts, citation counts,
and any other relevant metadata (both raw and cleaned datasets).
3. Submit all source code files, including the code for scraping, cleaning, exploratory
data analysis, and the DNN implementation (source files only).
4. Present your findings to the class, demonstrating the trends you identified, the
methods used, and the impact of your findings.