A
Seminar Report On
Data Science
Submitted by
Miss. Preeti Prashant Rakate
Under the guidance of
Prof. V. V. Kadam
Faculty of MCA
In partial fulfillment for the award of the degree of
MASTER OF COMPUTER APPLICATION
(MCA-II) Semester-III
(2021-2022)
Under Commerce Faculty AT
YSPM’s
Yashoda Technical Campus, Satara (YTC)
Yashoda Shikshan Prasarak Mandal’s
YASHODA TECHNICAL CAMPUS, SATARA
NH-4, Wadhe Phata, Satara. Telefax: 02162-271238/39/40
Website: www.yes.edu.in Email: [email protected]
Approved by AICTE, New Delhi, Govt. of Maharashtra (DTE, Mumbai)
Affiliated to Shivaji University, Kolhapur / MSBTE, Mumbai.
NAAC Accredited Institute
Ref. No:- Date:
Certificate
This is to certify that the seminar report entitled “Web Scraping-Process,
Techniques, Tool” submitted by “Miss. Preeti Prashant Rakate” MCA II
Exam Seat No: “ ”
in partial fulfillment of the award of Master of Computer Application
(M.C.A.) course submitted to Shivaji University, Kolhapur for the year
2021-2022 is a genuine and bona fide work prepared under my supervision
and guidance.
To the best of my knowledge and belief, the matter presented in this project
report has not been submitted earlier to any university for a similar purpose.
Place: Satara
Date:
Guide Examiner HOD
INDEX
Sr. No. Content
1 Abstract
2 Introduction
3 History of Data Science
4 Definition of Data Science
5 What is Data Science?
The Data Science Process
6 Application / Uses of Data Science
7 Importance of Data Science
8 Advantages
Disadvantages
9 Conclusion
10 References
Introduction
Data management and analysis are carried out through computer programming.
In data science, two programming languages are most popular: Python and R.
Data is manipulated to extract information from it. The mathematical
foundation of data science is statistics and probability. Data science has become
very popular because it helps businesses improve productivity. Multinational
companies as well as small and medium enterprises can take advantage of data
science in their planning.
In a world which is increasingly becoming a digital space, organizations
deal with zettabytes and yottabytes of structured and unstructured information
every day. Evolving technologies have enabled cost savings and smarter ways to
store critical data. In today's industry, there is a huge need for skilled,
certified data scientists. They are among the highest-paid professionals in the
IT industry.
Before we look at the definition of data science, let us look at its
history. Data science is not something new that has been introduced only today.
Data existed in the 1940s as well; however, it was not viewed the way we see it
today. Statisticians played an important role during this period and used to do
data analysis manually. They did not have computers for this purpose, and so
data analysis was given less importance.
Data science is widely used in industry, particularly in IT companies.
Organizations need to address their complex and expanding data environments in
order to identify new value sources, to exploit future opportunities, and to grow
or optimise efficiently. The differentiating factor for an organization is what
value it extracts from its repository of data using analytics and how well
it presents it.
Here we list some of the biggest and best companies that are hiring data
scientists at top-notch salaries. Google is by far the biggest company
on a hiring spree for top-notch data scientists. Since most of Google today
is driven by data science, artificial intelligence and machine learning,
Google offers some of the best data science salaries. Amazon is a global
e-commerce and cloud computing giant that is hiring data scientists on a big
scale. It needs data scientists to understand the customer mindset and to enhance
the geographical reach of both its e-commerce and cloud domains,
among other business-driven goals. Visa is an online financial gateway for
most companies and handles transactions in the range of hundreds of
millions over the course of a regular day. Because of this, the demand for data
scientists at Visa is huge: to generate more revenue, to check fraudulent
transactions, and to customize products and services to customer
requirements, among other things.
History of Data Science
The term data science has appeared in various contexts over the past thirty years
but did not become an established term until recently. In an early usage, it was
used as a substitute for computer science by Peter Naur in 1960. Naur later introduced
the term "datalogy".[16] In 1974, Naur published Concise Survey of Computer
Methods, which freely used the term data science in its survey of the
contemporary data processing methods that are used in a wide range of
applications.
In 1996, members of the International Federation of Classification Societies
(IFCS) met in Kobe for their biennial conference. Here, for the first time, the
term data science was included in the title of the conference ("Data Science,
classification, and related methods"),[17] after the term was introduced in a
roundtable discussion by Chikio Hayashi.[4]
In November 1997, C.F. Jeff Wu gave the inaugural lecture entitled "Statistics =
Data Science?"[18] for his appointment to the H. C. Carver Professorship at
the University of Michigan.[19] In this lecture, he characterized statistical work
as a trilogy of data collection, data modeling and analysis, and decision making.
In his conclusion, he initiated the modern, non-computer science, usage of the
term "data science" and advocated that statistics be renamed data science and
statisticians data scientists.[18] Later, he presented his lecture entitled "Statistics
= Data Science?" as the first of his 1998 P.C. Mahalanobis Memorial
Lectures.[20] These lectures honor Prasanta Chandra Mahalanobis, an Indian
scientist and statistician and founder of the Indian Statistical Institute.
In 2001, William S. Cleveland introduced data science as an independent
discipline, extending the field of statistics to incorporate "advances in
computing with data" in his article "Data Science: An Action Plan for
Expanding the Technical Areas of the Field of Statistics".
In April 2002, the International Council for Science (ICSU) Committee on Data
for Science and Technology (CODATA)[22] started the Data Science
Journal,[23] a publication focused on issues such as the description of data
systems, their publication on the internet, applications and legal
issues.[24] Shortly thereafter, in January 2003, Columbia University began
publishing The Journal of Data Science,[25] which provided a platform for all
data workers to present their views and exchange ideas.
Around 2007, Turing Award winner Jim Gray envisioned "data-driven
science" as a "fourth paradigm" of science that uses the computational
analysis of large data as its primary scientific method[5][6] and aims "to have a world in
which all of the science literature is online, all of the science data is online, and
they interoperate with each other."[27]
In the 2012 Harvard Business Review article "Data Scientist: The Sexiest Job
of the 21st Century",[7] DJ Patil claims to have coined this term in 2008
with Jeff Hammerbacher to define their jobs at LinkedIn and Facebook,
respectively. He asserts that a data scientist is "a new breed", and that a
"shortage of data scientists is becoming a serious constraint in some sectors",
but describes a much more business-oriented role.
In 2013, the IEEE Task Force on Data Science and Advanced
Analytics[28] was launched. In 2013, the first "European Conference on Data
Analysis (ECDA)" was organised in Luxembourg, establishing the European
Association for Data Science (EuADS). The first international conference, the IEEE
International Conference on Data Science and Advanced Analytics, was
launched in 2014.[29] In 2014, General Assembly launched a student-paid
bootcamp and The Data Incubator launched a competitive free data science
fellowship.[30] In 2014, the American Statistical Association section on
Statistical Learning and Data Mining renamed its journal to "Statistical Analysis
and Data Mining: The ASA Data Science Journal" and in 2016 changed its
section name to "Statistical Learning and Data Science".[31] In 2015, the
International Journal on Data Science and Analytics[32] was launched by
Springer to publish original work on data science and big data analytics. In
September 2015 the Gesellschaft für Klassifikation (GfKl) added "Data Science
Society" to the name of the Society at the third ECDA conference at
the University of Essex, Colchester, UK.
Definition of Data Science
Data science is a multidisciplinary field that combines skills in software
engineering and statistics with domain experience to support the end-to-end
analysis of large and diverse data sets, ultimately uncovering value for an
organization and then communicating it to stakeholders as actionable results.
What is Data science?
Data science is the study of the flow of information from the colossal amounts of data
present in an organization's repository. It involves obtaining meaningful insights
from structured and unstructured data, which is processed through analytical,
programming, and business skills. Companies are focusing on data analytics to
grow and to draw major benefits from the data they already possess. We will also see
a few examples of companies that made the best of data science.
Data science is not something new that has been introduced only today. Data
existed in the 1940s and 1950s as well; however, it was not viewed the way we
see it today. Statisticians played an important role during this period and used
to do data analysis manually. They did not have computers for this purpose, and
so data analysis was given less importance.
- In the 1940s and 1950s, data storage was a big issue.
- Today we have ample data storage options.
A diagram of data science can be confusing because of the many
common disciplines that a data scientist may draw upon. A data scientist's
level of experience and knowledge in each often varies along a
scale ranging from beginner to proficient to expert.
While these and other disciplines and areas of expertise are all characteristics of the data
scientist role, a data scientist's foundation can be thought of as being based on four pillars.
Data engineering, the scientific method, visualization, and domain
expertise are also important in data science.
The pillars of data science expertise:
- Business domain
- Statistics and probability
- Computer science and software programming
- Written and verbal communication
The Data Science Process
The data science process can vary somewhat depending on the project goals
and the approach taken, but it generally involves the following phases, more or less:
- Data acquisition, collection, and storage
- Accessing, ingesting, and integrating data
- Processing and cleaning data
- Initial data investigation and exploratory data analysis (EDA)
- Choosing one or more potential models and algorithms
- Applying data science methods and techniques (e.g., machine learning, statistical
modelling, artificial intelligence)
- Measuring and improving results (validation and tuning)
- Delivering, communicating, and presenting final results
- Repeating the process to solve a new problem
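The phases listed above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the records in RAW are made-up numbers, the function names are hypothetical, and a simple least-squares line stands in for whatever model a real project would choose.

```python
from statistics import mean

# Hypothetical raw records (advertising spend, sales); None marks a missing value.
RAW = [(1.0, 3.1), (2.0, 4.9), (3.0, None), (4.0, 9.2),
       (5.0, 10.8), (None, 7.0), (6.0, 13.1)]

def clean(rows):
    """Processing and cleaning: drop rows that contain a missing value."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def explore(rows):
    """Exploratory data analysis: a few summary statistics."""
    xs, ys = zip(*rows)
    return {"n": len(rows), "mean_x": mean(xs), "mean_y": mean(ys)}

def fit_line(rows):
    """Modelling: ordinary least-squares fit of y = a + b * x."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def mean_abs_error(model, rows):
    """Validation: mean absolute error of the fitted line on held-out rows."""
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in rows)

data = clean(RAW)                      # acquisition and cleaning
train, test = data[:-1], data[-1:]     # hold out the last record for validation
summary = explore(train)               # exploratory data analysis
model = fit_line(train)                # choose and apply a model
error = mean_abs_error(model, test)    # measure the result

# Delivering the result: here simply printed for the reader.
print(summary)
print(f"intercept={model[0]:.2f}, slope={model[1]:.2f}, test MAE={error:.2f}")
```

In practice each phase would use dedicated tools and far more data; the sketch only shows how the phases feed into one another.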
Here is a diagram for Data Science Process:
Application / Uses of Data Science
Using the large stores of data available to them, companies have become
intelligent enough to push and sell products according to customers' purchasing
power and interests.
- Internet search
- Digital advertisements (targeted advertising and re-targeting)
- Recommender systems
- Image recognition
- Speech recognition
- Gaming
- Price comparison websites
- Airline route planning
- Fraud and risk detection
- Delivery logistics
- Miscellaneous
Coming up in the future:
- Self-driving cars
- Robots
- Healthcare
- Augmented reality
Importance of Data Science
In the last few years, data science has come a long way, and it has become
integral to understanding the workings of many industries. The
following are examples of why data science will always be an integral part of
the culture and economy of a complex world.
Data science helps companies understand their customers in a
number of improved and powerful ways. Customers are the strength
and support of any brand and play a big role in its success or
failure. With data science, a brand is able to connect with its customers
individually, which helps ensure better quality and strength of the
product.
One of the reasons that data science is attracting so much attention is
that, when companies use data in a broad way and allow the
data to be presented in a compelling and powerful way, they can
share their stories with their audience so that a good connection is created.
Few things connect with customers as well as this, because such stories can
evoke human emotions.
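[Chart: pie chart with the categories Computer Science, Statistics and Mathematics, Economic and Social Science, Data Science and Analysis, and Natural Science. The percentage labels (9%, 20%, 9%, 11%, 19%, 13%, 19%) cannot be matched to the categories.]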
Advantages:
- Data science competence can be developed more easily thanks to the possible
knowledge transfer between colleagues.
- Specialization is possible within the team.
- The team lead has data science competence.
- Data science is the science of systematically discovering patterns and
useful knowledge and predicting something of value.
Disadvantages:
- Business and process understanding might suffer.
- Longer distances for coordination with departments.
- High risk that data science tasks lose out in day-to-day
business (competing tasks).
- Knowledge exchange between various departments is difficult.
Conclusion
Hopefully, this research paints a clearer picture for you and helps you
understand the core skills and qualifications that people currently employed as data
scientists have. In addition, the country-wise segmentation is invaluable, as
geographical differences persist, and so do differences in the skill set required
to land the job.
References:
History of data science:
16. https://en.m.wikipedia.org/wiki/data_science#cite_note-16
17. https://en.m.wikipedia.org/wiki/data_science#cite_note-17
4. https://en.m.wikipedia.org/wiki/data_science#cite_note-Hayashi-4
19. http://en.m.wikipedia.org/wiki/data_science#cite_note-cfiwu01-19
18. https://en.m.wikipedia.org/wiki/data_science#cite_note-cfjwutk-18
20. http://en.m.wikipedia.org/wiki/data_science#cite_note-cfjwu02-20
22. https://en.m.wikipedia.org/wiki/data_science#cite_note-ics12-22
23. https://en.m.wikipedia.org/wiki/data_science#cite_note-dsj12-23
24. http://en.m.wikipedia.org/wiki/data_science#cite_note-dsj02-24
25. http://en.m.wikipedia.org/wiki/data_science#cite_note-jds03-25
5. http://en.m.wikipedia.org/wiki/data_science#cite_note-TansleyTolle2009-5
6. http://en.m.wikipedia.org/wiki/data_science#cite_note-BellHey2009-6
27. http://en.m.wikipedia.org/wiki/data_science#cite_note-27
7. http://en.m.wikipedia.org/wiki/data_science#cite_note-Harvard-7
28. http://en.m.wikipedia.org/wiki/data_science#cite_note-28
29. http://en.m.wikipedia.org/wiki/data_science#cite_note-29
30. http://en.m.wikipedia.org/wiki/data_science#cite_note-30
31. http://en.m.wikipedia.org/wiki/data_science#cite_note-ASA-31
32. http://en.m.wikipedia.org/wiki/data_science#cite_note-3s