Course Outline
1. Course Identity
A. Course as listed in CUHK-Shenzhen
The information in this block should match exactly what was approved by the CUHK Senate. If there are any differences, please explain them in the table below.
Course code MKT4220
Course title (English) Big Data Marketing
Course title (Chinese) 市场大数据分析
Units 3
Description (English) With the rapid development of technology, high-volume data of all kinds are now everywhere and growing exponentially. Only recently have computers become powerful enough to handle and analyze such large-scale data. Currently, the most important challenge for researchers is how to process big data, extract valuable information from it, and obtain meaningful insights from that information. The purpose of this course is to provide the fundamental knowledge that familiarizes students with the most important information technologies used for preprocessing, storing, manipulating, and analyzing big data. The course begins with a brief overview of big data: its definition, visualization, toolbox, and applications. It then introduces the basic platforms for handling big data, such as Hadoop and Spark. These platforms offer several data storage methods, as well as functions to upload, distribute, and preprocess big data. Next, several analytical algorithms that run on these platforms are introduced to analyze big data and extract valuable information. Finally, the course demonstrates how to visualize analytic results and how to draw insights from them. The second half of the course focuses mainly on one family of analytical algorithms: deep learning. Deep learning, also known as deep structured or hierarchical learning, attempts to model high-level abstractions in data. The basics of deep learning are introduced, including how to set up a deep neural network and train its parameters. The aim is to make students aware of deep learning, familiarize them with it through assignments, and provide useful insights into its business applications.
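Description (Chinese, translated) With the development of high technology, business data are growing rapidly in volume and are now everywhere. The recent emergence of high-speed computers has given us the hardware needed to process large-scale data. The most important research challenges today are how to store and analyze big data, how to extract useful information from it, and how to use that information to guide practice in related fields. The main aim of this course is to guide students in understanding and mastering the most basic big data processing techniques: storing, preprocessing, and analyzing big data. The course is roughly structured as follows: 1) what big data is, its characteristics, and the main difficulties in big data research; 2) how to operate commonly used big data processing platforms, including becoming familiar with Spark, a commonly used interface for working with big data; 3) the basic operations of big data processing, and advanced data processing using artificial intelligence. Parts 2 and 3 are the main focus and the most challenging parts of the course. After completing this course, students should have a basic command of how to process and analyze large-scale business data, extract useful business information from it, and use that information to formulate sound marketing strategies.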
2. Prerequisites / Co-requisites
Please state prerequisites and co-requisites, in terms of courses at CUHK-Shenzhen* or any other requirements (e.g., having taken certain subjects in high school).
(* Because course codes may not yet be stable, please provide both the course code and the course title.)
A. Prerequisites
Fundamental statistics; a programming language, e.g., R or Python
3. Learning Outcomes
After completing this course, students will have mastered the basic techniques and tools for handling large volumes of data arising in practice or in research problems.
4. Course syllabus
5. Assessment Scheme
Component / method                            Weight (%)
Final exam (open-book)                        25
Mid-term presentation                         25
Final project (report, code, presentation)    45
Class engagement                              5
6. Grade descriptor
Grade          Description
A              Outstanding performance on all learning outcomes.
A-             Generally outstanding performance on all (or almost all) learning outcomes.
B+ / B / B-    Substantial performance on all learning outcomes, OR high performance on some learning outcomes which compensates for less satisfactory performance on others, resulting in overall substantial performance.
C+ / C / C-    Satisfactory performance on the majority of learning outcomes, possibly with a few weaknesses.
D              Barely satisfactory performance on a number of learning outcomes.
F              Unsatisfactory performance on a number of learning outcomes, OR failure to meet specified assessment requirements.
7. Feedback for evaluation
Course and Teaching Evaluation (CTE)
8. Reading
A. Required
There is no required textbook. Students are required to read the instructor's lecture slides.
B. Recommended
David Loshin. Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph. Elsevier, August 23, 2013.
9. Course components
Activity                                         Hours/week
Lecture (2 lectures per week, 1.5 hours each)    3
10. Indicative teaching plan
Week Content / topic / activity
1 Introduction to Big Data Analytics
2 Memory Hierarchy
3 Introduction to Spark
4 Setting up Python with Spark
5 Spark DataFrame Basics
6 Introduction to Machine Learning with MLlib
7 Linear Regression, Logistic Regression
8 Decision Trees, Random Forests
9 Classification
10 Filtering for Recommender Systems
11 Natural Language Processing
12 Spark Streaming with Python
13 Final Project Presentation
14 Continued Final Project Presentation
11. Implementation plan (2021–22)
The implementation plan may vary from year to year. Please indicate the expected enrolment and the number of sections.
60 students for lectures
[Example: 150 students for lecture (x 2); 30 students for tutorials (x 10)]
12. Approval
Has the course title been included in the programme submission approved by CUHK
Senate? Are there any differences?
Yes
Have the details (as in this document) been approved at School or other level in CUHK-
Shenzhen?
Yes
13. Any other information
N/A
14. Version date
Version number 001
As of (date) 210801