TRAINING SHEET
"The professionalism and expansive technical knowledge demonstrated by our instructor were
incredible. The quality of the Cloudera training was on par with a university."
General Dynamics
Cloudera Introduction to Data Science:
Building Recommender Systems
Take Your Knowledge to the Next Level with Cloudera's Data Science
Training and Certification
Data scientists build information platforms to ask and answer previously unimaginable
questions. Learn how data science helps companies reduce costs, increase profits, improve
products, retain customers, and identify new opportunities.
Cloudera University's three-day course helps participants understand what data scientists
do and the problems they solve. Through in-class simulations, participants apply data
science methods to real-world challenges in different industries and, ultimately, prepare for
data scientist roles in the field.
Hands-On Hadoop
Through instructor-led discussion and interactive, hands-on exercises, participants will
navigate the Hadoop ecosystem, learning topics such as:
• The role of data scientists, vertical use cases, and business applications of data
  science
• Where and how to acquire data, methods for evaluating source data, and data
  transformation and preparation
• Types of statistics and analytical methods and their relationship
• Machine learning fundamentals and breakthroughs, the importance of algorithms, and
  data as a platform
• How to implement and manage recommenders using Apache Mahout and how to set
  up and evaluate data experiments
• Steps for deploying new analytics projects to production and tips for working at scale
Audience & Prerequisites
This course is suitable for developers, data analysts, and statisticians with basic knowledge
of Apache Hadoop: HDFS, MapReduce, Hadoop Streaming, and Apache Hive. Students
should have proficiency in a scripting language; Python is strongly preferred, but familiarity
with Perl or Ruby is sufficient.
Data Scientist Certification
Upon completion of the course, attendees are encouraged to continue their study and
register for the Cloudera Certified Professional: Data Scientist (CCP:DS) exam. Certification
is a great differentiator; it helps establish you as a leader in the field, providing employers
and customers with tangible evidence of your skills and expertise.
Course Outline: Cloudera Introduction to Data Science
Introduction
Data Science Overview
  What Is Data Science?
  The Growing Need for Data Science
  The Role of a Data Scientist
Use Cases
  Finance
  Retail
  Advertising
  Defense and Intelligence
  Telecommunications and Utilities
  Healthcare and Pharmaceuticals
Project Lifecycle
  Steps in the Project Lifecycle
  Lab Scenario Explanation
Data Acquisition
  Where to Source Data
  Acquisition Techniques
Evaluating Input Data
  Data Formats
  Data Quantity
  Data Quality
Data Transformation
  Anonymization
  File Format Conversion
  Joining Datasets
Data Analysis and Statistical Methods
  Relationship Between Statistics and Probability
  Descriptive Statistics
  Inferential Statistics
Fundamentals of Machine Learning
  Overview
  The Three Cs of Machine Learning
  Spotlight: Naïve Bayes Classifiers
  Importance of Data and Algorithms
Recommender Overview
  What Is a Recommender System?
  Types of Collaborative Filtering
  Limitations of Recommender Systems
  Fundamental Concepts
Introduction to Apache Mahout
  What Apache Mahout Is (and Is Not)
  A Brief History of Mahout
  Availability and Installation
  Demonstration: Using Mahout's Item-Based Recommender
Implementing Recommenders with Apache Mahout
  Overview
  Similarity Metrics for Binary Preferences
  Similarity Metrics for Numeric Preferences
  Scoring
Experimentation and Evaluation
  Measuring Recommender Effectiveness
  Designing Effective Experiments
  Conducting an Effective Experiment
  User Interfaces for Recommenders
Production Deployment and Beyond
  Deploying to Production
  Tips and Techniques for Working at Scale
  Summarizing and Visualizing Results
  Considerations for Improvement
  Next Steps for Recommenders
Conclusion
Appendix A: Hadoop Overview
Appendix B: Mathematical Formulas
Appendix C: Language and Tool Reference
cloudera.com
1-888-789-1488 or 1-650-362-0488
Cloudera, Inc., 1001 Page Mill Road, Palo Alto, CA 94304, USA
© 2015 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA
and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.