Here are the answers:
1. When irrelevant attributes have been removed from the data
2. Regression
3. B and C (Predicting the number of pages in a document, Predicting the profit of
a company)
4. Alphabet Nest
5. Comfy
6. Previous Experiences
7. 1980s
8. False Positive and True Negative
9. Recall and Precision
10. Store block location
11. YARN
12.
- A1 (Map Phase) → B2 (Parses input into records as key-value pairs)
- A2 (Partition Phase) → B4 (Each mapper must determine which reducer will
receive each of the outputs)
- A3 (Shuffle Phase) → B1 (Fetches input data from all map tasks for the portion
corresponding to the reduce task’s bucket)
- A4 (Sort Phase) → B5 (Sorts all map outputs into a single run)
- A5 (Reduce Phase) → B3 (Writes output to a file in HDFS)
13. 3, 1, 4, 5, 2
14. 2 and 3 (Operations are performed by multiple processors, Handles small-scale
data)
15. ;
16. It is commonly used to analyze social media coverage.
17. Log in to cloud lab Web console.
18. They do not query actual data.
19. /user/hive/warehouse
20. Three
21. Value
22. 1x10^21
23. Virality
24. Vulnerability
25. A situation where one or more clients are unable to access a service.
26. MLlib
27. Queue Elasticity
28. Scheduler
29. Hadoop
30. FIFO scheduler
31. Dominant Resource Fairness
32. yarn-site.xml
33. Top-down
34. Density
35. Whenever Beer is bought, diaper is also bought
36. Binary Classification
37. Maximize the margin
38. Multi-collinearity
39. **128 MB**
40. **Block Replication**
41. **Web GUI**
42. **Gets a directory listing of user's home directory in HDFS**
43. **!**
44. **NULL**
45.
- **A1 (Catalog)** → **B2 (Relays metadata changes to all the Impala daemons
in a cluster.)**
- **A2 (State Store)** → **B3 (Provides lookup service for Impala daemons.)**
- **A3 (Impala Daemon)** → **B1 (A daemon process that runs on each node of the
cluster.)**
46.
- **A1 (Text)** → **B2 (It is delimited by a comma or a tab.)**
- **A2 (Sequence)** → **B4 (It is not human readable.)**
- **A3 (Avro Data)** → **B1 (It is widely supported inside and outside the Hadoop
ecosystem.)**
- **A4 (Parquet)** → **B3 (It uses advanced optimizations described in Google’s
Dremel paper.)**
47. **Boolean**
48. **Diagnostics → logs → view**
49. **.in**
50. **Full log**
51. **Refresh stale services**
52. **dfs.datanode.http.address**
53.
- **A1 (Host)** → **B4 (A machine (typically physical) running the CM agent.)**
- **A2 (Rack)** → **B1 (Machines in the same rack, typically served by the same
switch.)**
- **A3 (Service)** → **B2 (A system, which may be distributed, running on a
cluster.)**
- **A4 (Config)** → **B3 (A key-value pair associated with a scope.)**
54.
- **A1 (Service)** → **B3 (A category of managed functionality in Cloudera
Manager.)**
- **A2 (Service Instance)** → **B5 (An instance of a service running on a
cluster that spans many role instances.)**
- **A3 (Roles)** → **B2 (Daemons or processes that take care of a service.)**
- **A4 (Role Instance)** → **B1 (An instance of a role running on a host.)**
- **A5 (Role Group)** → **B4 (A set of configuration properties for a set of
role instances.)**
55. **Flume**
56. **Computation frameworks**
57. **Presto**
58. **QJM**
59. **Rack awareness**
60. **3, 5**
61. **Select()**
62. **Data Visualization**
63. **Controls the number of bins**
64. **When we want to plot between 1 numerical and 1 categorical variable**
65. **Error**
66. **Character**
67. **Convolutional Neural Networks**
68. **Quality**
69.
- **A1 (Machine Learning - Product Analytics)** → **B3 (Movie Recommendations)**
- **A2 (ML Applications – Accounting)** → **B1 (Pay-roll management)**
- **A3 (Sales performance of various entities)** → **B2 (Statistical Analysis)**
- **A4 (Major classes of machine learning process)** → **B5 (Training and
testing)**
- **A5 (Training data patterns are used to classify test data)** → **B4 (Learned
Model)**
70. **Four**
71. **Raw Data**
72. **Domain-Specific**
73. **2017**
74. **1 hour**
75. **Hadoop**
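
As a supplement to answer 12, the five MapReduce phases can be illustrated with a toy, single-process word count. This is only a sketch under stated assumptions: the function names (`map_phase`, `partition`, `run_job`) are hypothetical, the hash is a simple stand-in for a real partitioner, and the result is returned as a dictionary instead of being written to HDFS.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: parse each input line into (word, 1) key-value pairs."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def partition(key, num_reducers):
    """Partition: decide which reducer receives a given key.

    A deterministic toy hash, used here only for illustration.
    """
    return sum(key.encode("utf-8")) % num_reducers


def run_job(lines, num_reducers=2):
    # Partition: every map output goes into the bucket of its reducer.
    buckets = defaultdict(list)
    for key, value in map_phase(lines):
        buckets[partition(key, num_reducers)].append((key, value))

    results = {}
    for reducer_id in range(num_reducers):
        # Shuffle: the reducer fetches the portion of map output
        # that belongs to its bucket.
        fetched = buckets[reducer_id]
        # Sort: order the fetched pairs by key into a single run.
        fetched.sort(key=lambda pair: pair[0])
        # Reduce: aggregate the values per key. A real job would write
        # this output to a file in HDFS; here it is kept in memory.
        for key, value in fetched:
            results[key] = results.get(key, 0) + value
    return results


if __name__ == "__main__":
    print(run_job(["big data big", "data lake"]))
```

Because the partition function depends only on the key, every occurrence of the same word reaches the same reducer, which is what makes the per-key totals correct.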