IBM Cloud and Big Data Quiz
A Notebook
B Db2 Warehouse
C IBM Cloud
Question: 2
When sharing a notebook, what will always point to
the most recent version of the notebook?
Your answer
B The permalink
D PixieDust visualization
Question: 3
When creating a Watson Studio project, what do you
need to specify?
Your answer
A Spark service
B Data service
C Collaborators
D Data assets
Question: 4
You can import preinstalled libraries if you are using
which languages? (Select two.)
(Please select ALL that apply)
Your answer
A R
B Python
C Bash
D Rexx
E Scala
Question: 5
Who can control a Watson Studio project's assets?
Your answer
A Viewers
B Editors
C Collaborators
D Tenants
Question: 6
Which environment variable needs to be set to
properly start ZooKeeper?
Your answer
A ZOOKEEPER_APP
B ZOOKEEPER_DATA
C ZOOKEEPER
D ZOOKEEPER_HOME
Question: 7
Which is the primary advantage of using column-
based data formats over record-based formats?
Your answer
Question: 8
What is the primary purpose of Apache NiFi?
Your answer
Question: 9
What are three examples of Big Data? (Choose
three.)
(Please select ALL that apply)
Your answer
D bank records
Question: 10
What command is used to list all the ZNodes
at the top level of the ZooKeeper hierarchy, in the
ZooKeeper command-line interface (ZK CLI)?
Your answer
A get /
B create /
C listquota /
D ls /
Question: 11
What is the default data format Sqoop parses to
export data to a database?
Your answer
A JSON
B CSV
C XML
D SQL
Question: 12
Under the MapReduce v1 architecture, which
function is performed by the TaskTracker?
Your answer
Question: 13
Which statement describes "Big Data" as it is used
in the modern business world?
Your answer
A Indexed databases containing very large volumes of historical data used for comp
C Structured data stores containing very large data sets such as video and audio st
D The summarization of large indexed data stores to provide information about pote
Question: 14
Under the MapReduce v1 architecture, which
function is performed by the JobTracker?
Your answer
Question: 15
Which statement is true about the Hadoop
Distributed File System (HDFS)?
Your answer
C HDFS links the disks on multiple nodes into one large file system.
D HDFS is the framework for job scheduling and cluster resource management.
Question: 16
How does MapReduce use ZooKeeper?
Your answer
Question: 17
Which two Spark libraries provide a native shell?
(Choose two.)
(Please select ALL that apply)
Your answer
A Python
B Scala
C C#
D Java
E C++
Question: 18
What is an authentication mechanism in
Hortonworks Data Platform?
Your answer
A IP address
B Preshared keys
C Kerberos
D Hardware token
Question: 19
What is Hortonworks DataPlane Services (DPS) used
for?
Your answer
A Manage, secure, and govern data stored across all storage environments.
Question: 20
What must be done before using Sqoop to import
from a relational database?
Your answer
Question: 21
What is the native programming language for Spark?
Your answer
A Scala
B C++
C Java
D Python
Question: 22
Which Hortonworks Data Platform (HDP) component
provides a common web user interface for
applications running on a Hadoop cluster?
Your answer
A YARN
B HDFS
C Ambari
D MapReduce
Question: 23
Which Spark RDD operation returns values after
performing the evaluations?
Your answer
A Transformations
B Actions
C Caching
D Evaluations
Question: 24
Which two are use cases for deploying ZooKeeper?
(Choose two.)
(Please select ALL that apply)
Your answer
Question: 25
In a Hadoop cluster, which two are the result of
adding more nodes to the cluster? (Choose two.)
(Please select ALL that apply)
Your answer
Question: 26
Which Spark RDD operation creates a directed
acyclic graph through lazy evaluations?
Your answer
A Distribution
B GraphX
C Transformations
D Actions
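The transformation/action distinction tested in the two questions above can be sketched in plain Python, with no Spark installed: the generator expressions below stand in for transformations, which only describe work (like building up a DAG lazily), while the final list() call plays the role of an action that forces the whole pipeline to run. The names and data are illustrative only.

```python
# A minimal lazy-evaluation analogy for RDD transformations vs. actions
# (plain Python generators, no Spark required).

nums = range(1, 6)                      # source data: 1..5
doubled = (x * 2 for x in nums)         # "transformation": nothing computed yet
evens = (x for x in doubled if x > 4)   # another lazy "transformation"

result = list(evens)                    # "action": triggers the whole pipeline
print(result)                           # [6, 8, 10]
```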
Question: 27
Which feature allows application developers to
easily use the Ambari interface to integrate Hadoop
provisioning, management, and monitoring
capabilities into their own applications?
Your answer
A REST APIs
B Postgres RDBMS
D AMS APIs
Question: 28
What is one disadvantage to using CSV formatted
data in a Hadoop data store?
Your answer
B Fields must be positioned at a fixed offset from the beginning of the record.
D Data must be extracted, cleansed, and loaded into the data warehouse.
Question: 29
Which element of Hadoop is responsible for
spreading data across the cluster?
Your answer
A YARN
B MapReduce
C AMS
D HDFS
Question: 30
Which component of the Apache Ambari
architecture stores the cluster configurations?
Your answer
A Authorization Provider
C Postgres RDBMS
Question: 31
Which two are examples of personally identifiable
information (PII)? (Choose two.)
(Please select ALL that apply)
Your answer
A Time of interaction
C Email address
D IP address
Question: 32
Under the MapReduce v1 architecture, which
element of the system manages the map and reduce
functions?
Your answer
A SlaveNode
B JobTracker
C MasterNode
D StorageNode
E TaskTracker
Question: 33
Which component of the HDFS architecture
manages storage attached to the nodes?
Your answer
A NameNode
B StorageNode
C DataNode
D MasterNode
Question: 34
Which of the "Five V's" of Big Data describes the
real purpose of deriving business insight from Big
Data?
Your answer
A Volume
B Value
C Variety
D Velocity
E Veracity
Question: 35
Which component of the Spark Unified Stack
supports learning algorithms such as logistic
regression, naive Bayes classification, and SVM?
Your answer
A Spark Learning
B Spork
C Spark SQL
D MLlib
Question: 36
Which two descriptions are advantages of Hadoop?
(Choose two.)
(Please select ALL that apply)
Your answer
Question: 37
Which two of the following are row-based data
encoding formats? (Choose two.)
(Please select ALL that apply)
Your answer
A CSV
B Avro
C ETL
D Parquet
E RC and ORC
Question: 38
Which statement describes the action performed by
HDFS when data is written to the Hadoop cluster?
Your answer
Question: 39
Under the MapReduce v1 architecture, which
element of MapReduce controls job execution on
multiple slaves?
Your answer
A MasterNode
B JobTracker
C SlaveNode
D TaskTracker
E StorageNode
Question: 40
Which component of the Spark Unified Stack
provides processing of data arriving at the system in
real-time?
Your answer
A MLlib
B Spark SQL
C Spark Streaming
D Spark Live
Question: 41
Which two registries are used for compiler and
runtime performance improvements in support of
the Big SQL environment? (Choose two)
(Please select ALL that apply)
Your answer
A DB2ATSENABLE
B DB2FODC
C DB2COMPOPT
D DB2RSHTIMEOUT
E DB2SORTAFTER_TQ
Question: 42
Which script is used to back up and restore the Big
SQL database?
Your answer
A bigsql_bar.py
B db2.sh
C bigsql.sh
D load.py
Question: 43
You need to create a table that is not managed by
the Big SQL database manager. Which keyword
would you use to create the table?
Your answer
A STRING
B BOOLEAN
C SMALLINT
D EXTERNAL
Question: 44
Which two of the following data sources are
currently supported by Big SQL? (Choose two)
(Please select ALL that apply)
Your answer
A Oracle
B PostgreSQL
C Teradata
D MySQL
E MariaDB
Question: 45
Which port is the default for the Big SQL Scheduler
to get administrator commands?
Your answer
A 7055
B 7054
C 7052
D 7053
Question: 46
Which tool should you use to enable Kerberos
security?
Your answer
A Hortonworks
B Ambari
C Apache Ranger
D Hive
Question: 47
Which two options can be used to start and stop Big
SQL? (Choose two)
(Please select ALL that apply)
Your answer
A Scheduler
B DSM Console
C Command line
Question: 48
Which command is used to populate a Big SQL
table?
Your answer
A CREATE
B QUERY
C SET
D LOAD
Question: 49
Which feature allows the bigsql user to securely
access data in Hadoop on behalf of another user?
Your answer
A Impersonation
B Privilege
C Rights
D Schema
Question: 50
Which command would you run to make a remote
table accessible using an alias?
Your answer
A SET AUTHORIZATION
B CREATE SERVER
C CREATE WRAPPER
D CREATE NICKNAME
Question: 51
The Big SQL head node has a set of processes
running. What is the name of the service ID running
these processes?
Your answer
A Db2
B hdfs
C user1
D bigsql
Question: 52
Which file format contains human-readable data
where the column values are separated by a
comma?
Your answer
A Parquet
B ORC
C Delimited
D Sequence
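For reference, the "delimited" format in the question above is ordinary CSV-style text: human-readable column values separated by commas. A minimal stdlib-only Python sketch of parsing such data (the sample rows are invented):

```python
# Parse comma-delimited, human-readable text with the stdlib csv module.
import csv
import io

raw = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"
rows = list(csv.reader(io.StringIO(raw)))

print(rows[0])  # header row: ['id', 'name', 'city']
print(rows[1])  # first record: ['1', 'Ada', 'London']
```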
Question: 53
Which Big SQL authentication mode is designed to
provide strong authentication for client/server
applications by using secret-key cryptography?
Your answer
A Public key
B Flat files
C Kerberos
D LDAP
Question: 54
Which type of foundation does Big SQL build on?
Your answer
A Jupyter
B Apache HIVE
C RStudio
D MapReduce
Question: 55
You need to monitor and manage data security
across a Hadoop platform. Which tool would you
use?
Your answer
A SSL
B HDFS
C Hive
D Apache Ranger
Question: 56
What can be used to surround a multi-line string in a
Python code cell, appearing before and after the
multi-line string?
Your answer
A """
B "
Question: 57
What do data scientists use interactive notebooks for?
Your answer
Question: 58
What Python statement is used to add a library to
the current code cell?
Your answer
A pull
B import
C load
D using
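For reference, a minimal sketch of the import statement bringing a library into the current code cell (the stdlib math module is used purely as an example):

```python
# import makes a library's names available in the current code cell.
import math

print(math.sqrt(16))  # 4.0
```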
Question: 59
What Python package has support for linear algebra,
optimization, mathematical integration, and
statistics?
Your answer
A NLTK
B Pandas
C NumPy
D SciPy
Question: 60
Which three main areas make up Data Science
according to Drew Conway? (Choose three.)
(Please select ALL that apply)
Your answer
A Traditional research
B Machine learning
C Substantive expertise
E Hacking skills
1- Select all the HDP components that provide data access capabilities
Pig
Sqoop
Flume
MapReduce
Hive
2- Select the components that provide the capability to move data from a relational database
into Hadoop.
Sql
Sqoop
Hive
Kafka
Flume
Ambari
HBase
Phoenix
Hive
Sqoop
4- True or False: The following components are value-add from IBM: Big Replicate, Big SQL,
BigIntegrate, BigQuality, Big Match
TRUE
FALSE
5- True or False: Data Science capabilities can be achieved using only HDP.
TRUE
FALSE
6- True or False: Ambari is backed by RESTful APIs for developers to easily integrate with
their own applications.
TRUE
FALSE
8- Which page from the Ambari UI allows you to check the versions of the software installed
on your cluster?
Monitor page
Integrate page
The Admin > Manage Ambari page
The Admin > Provision page
9- True or False? Creating users through the Ambari UI will also create the user on the HDFS.
TRUE
FALSE
10- True or False? You can use the CURL commands to issue commands to Ambari.
TRUE
FALSE
11- True or False: Hadoop systems are designed for transaction processing.
TRUE
FALSE
12- What is the default number of replicas in a Hadoop system?
5
4
3
2
13- True or False: One of the driving principles of Hadoop is that the data is brought to the
program.
TRUE
FALSE
14- True or False: At least 2 Name Nodes are required for a standalone Hadoop cluster.
TRUE
FALSE
15- True or False: The phases in a MR job are Map, Shuffle, Reduce and Combiner
TRUE
FALSE
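The Map, Shuffle, and Reduce phases named in the statement above can be sketched in plain Python with the classic word-count example (no Hadoop required; the input lines are invented):

```python
# Word count as Map -> Shuffle -> Reduce, in pure Python.
from collections import defaultdict

lines = ["big data", "big sql"]

# Map: emit (word, 1) pairs from each input line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the emitted values by key.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce: sum the grouped values per key.
counts = {word: sum(vals) for word, vals in grouped.items()}
print(counts)  # {'big': 2, 'data': 1, 'sql': 1}
```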
16- Centralized handling of job control flow is one of the limitations of MR v1.
TRUE
FALSE
ResourceMaster
ApplicationMaster
ApplicationManager
ResourceManager
Generality
Versatility
Speed
Ease of use
18- What are the languages supported by Spark?
(Please select the THREE that apply)
Javascript
HTML
Python
Java
Scala
TRUE
FALSE
20- What would you need to do in a Spark application that you would not need to do in a
Spark shell to start using Spark?
21- True or False: NoSQL database is designed for those that do not want to use SQL.
TRUE
FALSE
SQL
Hive
HBase
Hive
Hadoop
HBase
24- Which Apache project provides coordination of resources?
Streams
Spark
Zeppelin
ZooKeeper
26- True or False: Slider provides an intuitive UI which allows you to dynamically allocate
YARN resources.
TRUE
FALSE
27- True or False: Knox can provide all the security you need within your Hadoop
infrastructure.
TRUE
FALSE
28- True or False: Sqoop is used to transfer data between Hadoop and relational databases.
TRUE
FALSE
29- True or False: For Sqoop to connect to a relational database, the JDBC JAR files for that
database must be located in $SQOOP_HOME/bin.
TRUE
FALSE
30- True or False: Each Flume node receives data as "source", stores it in a "channel", and
sends it via a "sink".
TRUE
FALSE
31- Through what HDP component are Kerberos, Knox, and Ranger managed?
Zookeeper
Ambari
Apache Knox
Apache Ranger
Apache Camel
Apache Knox
33- One of the governance issues that Hortonworks DataPlane Service (DPS) addresses is
visibility over all of an organization's data across all of their environments (on-prem,
cloud, hybrid), while making it easy to maintain consistent security and governance.
TRUE
FALSE
34- True or false: The typical sources of streaming data are Sensors, "Data exhaust" and
high-rate transaction data.
TRUE
FALSE
35- What are the components of Hortonworks Data Flow (HDF)?
Flow management
Stream processing
All of the above
None of the above
Enterprise services
36- True or False: NiFi is a disk-based, microbatch ETL tool that provides flow management
TRUE
FALSE
37- True or False: MiNiFi is a complementary data collection tool that feeds collected data to
NiFi
TRUE
FALSE
38- What main features does IBM Streams provide as a Streaming Data Platform?
(Please select the THREE that apply)
Flow management
Analysis and visualization
Sensors
Rich data connections
Development support
Natural Language
Semi-structured
Graph-based
Structured
Machine-Generated
Unstructured
40- What are the 4Vs of Big Data?
(Please select the FOUR that apply)
Veracity
Velocity
Variety
Value
Volume
Visualization
41- What are the most important computer languages for Data Analytics?
(Please select the THREE that apply)
Scala
HTML
R
SQL
Python
42- True or False: GPUs are special-purpose processors that traditionally can be used to
power graphical displays, but for Data Analytics lend themselves to faster algorithm
execution because of the large number of independent processing cores.
TRUE
FALSE
43- True or False: Jupyter stores its workbooks in files with the .ipynb suffix. These files
cannot be stored locally or on a hub server.
TRUE
FALSE
44- The $BIGSQL_HOME/bin/bigsql start command is used to start Big SQL from the command line?
TRUE
FALSE
45- What are the two ways you can work with Big SQL?
(Please select the TWO that apply)
JQuery
R
JSqsh
Web tooling from DSM
Yes
No
48- The BOOLEAN type is defined as SMALLINT SQL type in Big SQL.
TRUE
FALSE
49- Using the LOAD operation is the recommended method for getting data into your Big
SQL table for best performance.
TRUE
FALSE
50- Which file storage format has the highest performance?
Delimited
Sequence
RC
Parquet
Avro
51- What are the two ways to classify functions?
Built-in functions
Scalar functions
User-defined functions
None of the above
52- True or False: UMASK is used to determine permissions on directories and files.
TRUE
FALSE
53- True or False: You can only Kerberize a Big SQL server before it is installed.
TRUE
FALSE
54- True or False: Authentication with Big SQL only occurs at the Big SQL layer or the client's
application layer.
TRUE
FALSE
TRUE
FALSE
TRUE
FALSE
57- True or False: Nicknames can be used for wrappers and servers.
TRUE
FALSE
58- True or False: Server objects define the properties and values of the connection.
TRUE
FALSE
59- True or False: The purpose of a wrapper is to provide a library of routines that doesn't
communicate with the data source.
TRUE
FALSE
60- True or False: User mappings are used to authenticate to the remote data source.
TRUE
FALSE
61- True or False: Collaboration with Watson Studio is an optional add-on component that
must be purchased.
TRUE
FALSE
62- True or False: Watson Studio is designed only for Data Scientists, other personas would
not know how to use it.
TRUE
FALSE
63- True or False: Community provides access to articles, tutorials, and even data sets that
you can use.
TRUE
FALSE
64- True or False: You can import visualization libraries into Watson Studio.
TRUE
FALSE
65- True or False: Collaborators can be given certain access levels.
TRUE
FALSE
66- True or False: Watson Studio contains Zeppelin as a notebook interface.
TRUE
FALSE
67- Spark is developed in which language
Java
Scala
Python
R
68- In Spark Streaming, which sources can the data come from?
Kafka
Flume
Kinesis
All of the above
71- Which is an advantage that Zeppelin holds over Jupyter?
Your answer
72- Why might a data scientist need a particular kind of GPU (graphics
processing unit)?
Your answer
A. %list-all-magic
B. %dirmagic
C. %list-magic
D. %lsmagic
74- What is the first step in a data science pipeline?
Your answer
A. Exploration
B. Acquisition
C. Manipulation
D. Analytics
76- You have a distributed file system (DFS) and need to set permissions on
the /hive/warehouse directory to allow access to ONLY the bigsql user.
Which command would you run?
Your answer
A. umask
B. HDFS
C. Kerberos
D. GRANT
79- How many Big SQL management nodes do you need at minimum?
Your answer
A. 4
B. 1
C. 3
D. 2
80- Which directory permissions need to be set to allow all users to create
their own schema?
Your answer
A. 755
B. 666
C. 777
D. 700
A. CREATE FUNCTION
C. TRANSLATE FUNCTION
A. Directories
B. Schemas
C. Hives
D. Files
83- Which Big SQL feature allows users to join a Hadoop data set to data in
external databases?
Your answer
A. Fluid query
B. Impersonation
C. Integration
D. Grant/Revoke privileges
84- Which two commands would you use to give or remove certain privileges
to/from a user?
Your answer
A. INSERT
B. GRANT
C. SELECT
D. REVOKE
E. LOAD
85- What is an advantage of the ORC file format?
Your answer
A. Efficient compression
86- You are creating a new table and need to format it with parquet. Which
partial SQL statement would create the table in parquet format?
Your answer
A. CREATE AS parquetfile
B. CREATE AS parquet
C. STORED AS parquetfile
D. STORED AS parquet
88- You need to enable impersonation. Which two properties in the bigsql-
conf.xml file need to be marked true?
Your answer
A. bigsql.alltables.io.doAs
B. bigsql.impersonation.create.table.grant.public
C. DB2_ATS_ENABLE
D. DB2COMPOPT
E. $BIGSQL_HOME/conf
89- Using the Java SQL Shell, which command will connect to a database
called mybigdata?
Your answer
A. ./java tables
B. ./jsqsh mybigdata
C. ./java mybigdata
D. ./jsqsh go mybigdata
B. Data source
C. Nickname
D. User mapping
A. Scalability
C. Resource utilization
92- Which feature makes Apache Spark much easier to use than MapReduce?
Your answer
A. Cassandra
B. REDIS
C. HBase
D. MongoDB
A. Zookeeper
B. Pig
C. Hive
D. Sqoop
95- Under the MapReduce v1 programming model, which shows the proper
order of the full set of MapReduce phases?
Your answer
A. C#
B. C++
C. Java
D. .NET
E. Python
F. Scala
B. Parallel Processing
D. RAID-0
B. Ambari Wizard
D. Ambari Server
101- Apache Spark can run on which two of the following cluster managers?
Your answer
A. oneSIS
C. Nomad
D. Apache Mesos
E. Hadoop YARN
A. Map
B. Combiner
C. Reduce
D. Split
A. JBOD
B. RAID
C. LVM
D. SSD
104- What is the name of the Hadoop-related Apache project that utilizes an in-
memory architecture to run applications faster than MapReduce?
Your answer
A. Spark
B. Python
C. Pig
D. Hive
A. YARN
C. Ambari
D. HBase
E. MapReduce
A. NodeRefreshed
B. NodeExpired
C. NodeChildrenChanged
D. NodeDeleted
A. Spark
B. Ambari
C. HBase
D. MapReduce
A. HBase
B. HDFS
C. YARN
D. MapReduce
A. Administration
B. Audit
C. Resiliency
D. Speed
E. Data Protection
114- How can a Sqoop invocation be constrained to only run one mapper?
Your answer
A. TaskManager
B. JobMaster
C. ResourceManager
D. ApplicationMaster
116- Apache Spark provides a single, unifying platform for which three of the
following types of operations?
Your answer
A. ACID transactions
B. graph operations
C. record locking
D. batch processing
E. machine learning
F. transaction processing
A. Spark
B. Pig
C. YARN
D. Hive
118- Under the MapReduce v1 programming model, what happens in a
"Reduce" step?
Your answer
A. Auditing
B. Authentication
C. Authorization
D. Availability
120- Under the YARN/MRv2 framework, the JobTracker functions are split into
which two daemons?
Your answer
A. JobMaster
B. TaskManager
C. ApplicationMaster
D. ScheduleManager
E. ResourceManager
121- Under the YARN/MRv2 framework, which daemon arbitrates the execution
of tasks among all the applications in the system?
Your answer
A. ApplicationMaster
B. JobMaster
C. ScheduleManager
D. ResourceManager
A. Druid
C. NiFi
D. Storm
124- If a Hadoop node goes down, which Ambari component will notify the
Administrator?
Your answer
C. Ambari Wizard
D. REST API
A. HDFS
B. Hive
C. YARN
D. MapReduce
E. Big SQL
F. Cloudbreak
A. Projects
B. Data Assets
C. Analytic Assets
D. Collaborators
127- Which type of cell can be used to document and comment on a process in
a Jupyter notebook?
Your answer
A. Markdown
B. Code
C. Kernel
D. Output
129- Where does the unstructured data of a project reside in Watson Studio?
Your answer
A. Wrapper
B. Database
C. Object Storage
D. Tables
130- Before you create a Jupyter notebook in Watson Studio, which two items
are necessary?
Your answer
A. Spark Instance
B. Scala
C. Project
D. File
E. URL
Dstream
RDD
Shared Variable
None of the above
133- Can we add or set up new string computation after SparkContext starts?
Yes
No
134- Which of the following is not a feature of Spark?
143- In addition to stream processing jobs, what other functionality does Spark provide?
Machine learning
Graph processing
Batch processing
All of the above
145- Which of the following is not true for Hadoop and Spark?
Both are data processing platforms
Both are cluster computing environments
Both have their own file system
Both use open source APIs to link between different tools
146- How much faster can Apache Spark potentially run batch-processing programs when
processed in memory than MapReduce can?
10 times faster
20 times faster
100 times faster
200 times faster
147- Which of the following provides Spark Core's fast scheduling capability to perform
streaming analytics?
RDD
GraphX
Spark Streaming
Spark R
148- Which of the following is the reason for Spark being faster than MapReduce?
DAG execution engine and in-memory computation
Support for different language APIs like Scala, Java, Python and R
RDDs are immutable and fault-tolerant
None of the above
149- Can you combine the libraries of Apache Spark into the same application, for example,
MLlib, GraphX, SQL, and DataFrames?
Yes
No
151- Which of the following is not a function of Spark Context in Apache Spark?
Entry point to Spark SQL
To Access various services
To set the configuration
To get the current status of Spark Application
156- Can we edit the data of an RDD, for example, case conversion?
Yes
No
161- For a multiclass classification problem, which algorithm is not a solution?
Naive Bayes
Random Forests
Logistic Regression
Decision Trees
194- FlatMap transforms an RDD of length N into another RDD of length M. Which of the
following is true for N and M?
a. N>M
b. N<M
c. N<=M
Either a or b
Either b or c
Either a or c
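The flatMap behaviour asked about above can be imitated in plain Python. The flat_map helper here is a hypothetical stand-in for the Spark operation: each input element may expand to zero, one, or many output elements, so the output length M has no fixed relation to the input length N.

```python
# A plain-Python stand-in for flatMap: apply f to each element, then
# flatten the per-element result lists into one output list.
from itertools import chain

def flat_map(f, xs):
    return list(chain.from_iterable(f(x) for x in xs))

sentences = ["big data", "spark", ""]       # N = 3 inputs
words = flat_map(str.split, sentences)      # ['big', 'data', 'spark']: M = 3

# One input expanded to two outputs, one to one, one to zero.
print(len(sentences), len(words))  # 3 3
```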
197- In the aggregate function, can we get a data type different from the input data type?
Yes
No
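The idea behind the aggregate question above can be sketched in plain Python with functools.reduce (no Spark required): the accumulator is a (sum, count) tuple even though the inputs are plain ints, i.e. the result type can differ from the input element type, which is what makes a one-pass average possible.

```python
# Accumulate ints into a (sum, count) tuple: the result type differs
# from the input element type, as in Spark's aggregate().
from functools import reduce

nums = [1, 2, 3, 4]  # input elements are ints

def seq_op(acc, x):              # acc is a (sum, count) tuple, not an int
    return (acc[0] + x, acc[1] + 1)

total, count = reduce(seq_op, nums, (0, 0))
print(total / count)  # 2.5
```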
198- In which of the following actions is the result not returned to the driver?
collect()
top()
countByValue()
foreach()
121- The primary Machine Learning API for Spark is now the _____ based API
DataFrame
Dataset
RDD
All of the above
123- Spark SQL translates commands into code. This code is processed by
Driver nodes
Executor Nodes
Cluster manager
None of the above
124- Spark SQL plays the main role in the optimization of queries.
True
False
128- Which of the following is true for the tree in Catalyst optimizer?
A tree is the main data type in the catalyst.
New nodes are defined as subclasses of TreeNode class.
A tree contains a node object.
All of the above
129- Which of the following is true for the rule in Catalyst optimizer?
We can manipulate tree using rules.
We can define rules as a function from one tree to another tree.
Using rules, we can match patterns and map each matched pattern to a result.
All of the above
130- Which of the following is not a Spark SQL query execution phase?
Analysis
Logical Optimization
Execution
Physical planning
131- In Spark SQL optimization, which of the following is not present in the logical plan?
Constant folding
Abstract syntax tree
Projection pruning
Predicate pushdown
132- In the analysis phase, which is the correct order of execution after forming the
unresolved logical plan?
abcd
acbd
adbc
dcab
133- In the Physical planning phase of query optimization, we can use both cost-based and
rule-based optimization.
True
False
134- DataFrame in Apache Spark prevails over RDD and does not contain any feature of RDD.
True
False
135- Which of the following are the common feature of RDD and DataFrame?
Immutability
In-memory
Resilient
All of the above
137- In DataFrame in Spark, once the domain object is converted into a data frame, the
regeneration of the domain object is not possible.
True
False
138- The DataFrame API has provision for compile-time type safety.
True
False
a. RDD
b. DataFrame
c. Dataset
Both a and b
Both b and c
Both a and c
143- After transforming into a DataFrame, one cannot regenerate a domain object.
True
False
148- Which of the following is slow to perform simple grouping and aggregation operations?
RDD
DataFrame
Dataset
All of the above
149- Which of the following is good for low-level transformations and actions?
RDD
DataFrame
Dataset
All of the above
151- Which of the following is not true for Apache Spark Execution?
To simplify working with structured data it provides DataFrame abstraction in
Python, Java, and Scala.
The data can be read and written in a variety of structured formats. For example,
JSON, Hive Tables, and Parquet.
Using SQL we can query data only from inside a Spark program and not from
external tools.
The best way to use Spark SQL is inside a Spark application. This empowers us to
load data and query it with SQL.
152- When SQL is run from another programming language, the result will be
DataFrame
DataSet
Either DataFrame or Dataset
Neither DataFrame nor Dataset
154- The Dataset API is not supported by Python. But because of the dynamic nature of Python,
many benefits of the Dataset API are available.
True
False
155- Which of the following is true for Catalyst optimizer?
The optimizer helps us to run queries much faster than their RDD counterparts.
The optimizer helps us to run queries a little faster than their RDD counterparts.
The optimizer helps us to run queries at the same speed as their RDD counterparts.
157- With the help of Spark SQL, we can query structured data as a distributed dataset
(RDD).
True
False
166- Which command is used to check the status of all daemons running in HDFS?
jps
fsck
distcp
None of the above
167- What license is Apache Hadoop distributed under?
Apache License 2.0
Shareware
Mozilla Public License
Commercial
169- Apache Hadoop achieves reliability by replicating the data across multiple hosts, and
hence does not require ________ storage on hosts.
Standard RAID levels
RAID
ZFS
Operating system
175- Which of the below Apache systems deals with ingesting streaming data into Hadoop?
Flume
Oozie
Hive
Kafka
177- Which command lists the blocks that make up each file in the filesystem?
hdfs fsck / -files -blocks
hdfs fsck / -blocks -files
hdfs fchk / -blocks -files
hdfs fchk / -files -blocks
179- Which file contains the configuration settings for the HDFS daemons?
yarn-site.xml
hdfs-site.xml
mapred-site.xml
None of the above
181- Which file contains the configuration settings for the NodeManager and
ResourceManager?
yarn-site.xml
hdfs-site.xml
mapred-site.xml
None of the above
182- Hadoop can be used to create distributed clusters, based on commodity servers, that
provide low-cost processing and storage for unstructured data
True
False
185- Which of the following is used to ingest streaming data into Hadoop clusters?
Flume
Sqoop
Both the above
None of the above
186- Hadoop distributed file system behaves similarly to which of the following:
RAID-1 Filesystem
RAID-0 Filesystem
Both the above
All of the above
188- Which of the following is used to ingest data into Hadoop clusters?
Flume
Sqoop
Both the above
None of the above
189- Which of the following is a data processing engine for clustered computing?
Drill
Oozie
Spark
All of the above
190- Which tool could be used to move data from an RDBMS to HDFS?
Sqoop
Flume
Both the above
None of the above
191- All the files in a directory in HDFS can be merged together using which of the
following?
Put merge
Get merge
Remerge
Merge all
192- Which of these provides a stream processing system used in the Hadoop ecosystem?
Hive
Solr
Tez
Spark
193- The client reading the data from HDFS filesystem in Hadoop does which of the
following?
Gets only the block locations from the namenode
Gets the data from the namenode
Gets both the data and block location from the namenode
Gets the block location from the datanode
194- Which of the following jobs are optimized for scalability but not latency?
Mapreduce
Drill
Oozie
Hive
199- Does HDFS allow a client to read a file that is already open for writing?
False
True
200- What happens when a file is deleted from the command line?
It is permanently deleted if trash is enabled.
It is permanently deleted and the file attributes are recorded in a log file.
It is placed into a trash directory common to all users for that cluster.
None of the above
203- The Checkpoint node downloads the FsImage and EditLogs from the NameNode, then
merges them and stores the modified FsImage
Into persistent storage
Back to the active NameNode
205- Which command is used to check the current status of safe mode?
hadoop dfsadmin -safemode get
hadoop dfsadmin -safemode getStatus
hadoop dfsadmin -safemode status
None of the above
207- Which of the following features overcomes the NameNode single point of failure?
None of the above
HDFS federation
High availability
Erasure coding
214- In which process is a duplicate task created to improve the overall execution time?
Erasure coding
Speculative execution
HDFS federation
None of the above
215- In which mode does each daemon run on a single node, with a separate Java process
for each daemon?
Local (Standalone) mode
Fully distributed mode
Pseudo-distributed mode
None of the above
216- In which mode does each daemon run on a single node as a single Java process?
Local (Standalone) mode
Pseudo-distributed mode
Fully distributed mode
None of the above
218- Which configuration file is used to control the HDFS replication factor?
mapred-site.xml
hdfs-site.xml
core-site.xml
yarn-site.xml
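For reference, the HDFS replication factor is controlled by the dfs.replication property in hdfs-site.xml; a minimal config fragment (the value 3 is the standard default):

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```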
223- Which of the following Hadoop config files is used to define the heap size?
hdfs-site.xml
core-site.xml
hadoop-env.sh
mapred-site.xml
224- Which of the following features would you use to submit JARs and static files for a
MapReduce job at runtime?
Distributed cache
Speculative execution
Data locality
Erasure coding
225- Which of the following methods is used to set the output directory?
FileOutputFormat.setOutputgetpath()
OutputFormat.setOutputpath()
FileOutputFormat.setOutputpath()
OutputFormat.setOutputgetpath()
227- Which tool is used to distribute data evenly across all disks of a DataNode?
Balancer
Disk Balancer
228- Which of the following must be set to true to enable the disk balancer in hdfs-site.xml?
dfs.balancer.enabled
dfs.disk.balancer.enabled
dfs.diskbalancer.enabled
229- In the disk balancer, which volume-choosing policy does the DataNode use to choose
the disk for the block?
Round-robin
Available space
All of the above
None of the above
230- Which among the following are configuration files in Hadoop?
core-site.xml
hdfs-site.xml
yarn-site.xml
All of the above
236- Which of the following commands is used to check for various inconsistencies?
zkfc
fs
fsck
fetchdt
239- Pig is a:
Programming Language
Data Flow Language
Query Language
Database
243- Which of the following is a column-oriented database that runs on top of HDFS?
Hive
Sqoop
HBase
Flume
244- Which command is used to show all the Hadoop daemons that are running on the
machine
distcp
jps
fsck
245- Hadoop is a framework that works with a variety of related tools. Common cohorts
include:
MapReduce, Hive and HBase
MapReduce, MySQL and Google Apps
MapReduce, Hummer and Iguana
MapReduce, Heron and Trumpet
Question: 1
Which capability does IBM BigInsights add to enrich
Hadoop?
Your answer
A Jaql
C Adaptive MapReduce
Question: 2
What is one of the four characteristics of Big Data?
Your answer
A value
B volume
C verifiability
D volatility
Question: 3
Which Hadoop-related project provides common
utilities and libraries that support other Hadoop sub
projects?
Your answer
A Hadoop Common
Your answer
B Hadoop HBase
C MapReduce
D BigTable
Question: 4
Which type of Big Data analysis involves the
processing of extremely large volumes of constantly
moving data that is impractical to store?
Your answer
B Text Analysis
C Stream Computing
D MapReduce
Question: 6
Which primary computing bottleneck of modern
computers is addressed by Hadoop?
Your answer
A 64-bit architecture
B disk latency
C MIPS
Your answer
Question: 7
Which Big Data function improves the decision-
making capabilities of organizations by enabling the
organizations to interpret and evaluate structured
and unstructured data in search of valuable
business information?
Your answer
A stream computing
B data warehousing
C analytics
Question: 8
What is one of the two technologies that Hadoop
uses as its foundation?
Your answer
A HBase
B Apache
C Jaql
D MapReduce
Question: 9
What key feature does HDFS 2.0 provide that HDFS
does not?
Your answer
Question: 10
What are two of the core operators that can be used
in a Jaql query? (Select two.)
Your answer
A LOAD
B JOIN
C TOP
D SELECT
Question: 11
Which type of language is Pig?
Your answer
A SQL-like
B compiled language
Your answer
C object oriented
D data flow
Question: 12
If you need to change the replication factor or
increase the default storage block size, which file do
you need to modify?
Your answer
A hdfs.conf
B hadoop-configuration.xml
C hadoop.conf
D hdfs-site.xml
Question: 13
To run a MapReduce job on the BigInsights cluster,
which statement about the input file(s) must be true?
Your answer
A The file(s) must be stored on the local file system where the map reduce job was developed.
D No matter where the input files are before, they will be automatically copied to where the job runs.
Question: 14
What is a characteristic of IBM GPFS that
distinguishes it from other distributed file systems?
Your answer
B posix compliance
Question: 15
Which statement represents a difference between
Pig and Hive?
Your answer
Question: 16
D mkdir mydata
Question: 17
A Reduce
B Shuffle
C Combine
D Map
Question: 18
Under the MapReduce programming model, which
task is performed by the Reduce step?
Your answer
Question: 19
Which element of the MapReduce architecture runs
map and reduce jobs?
Your answer
A Reducer
B JobScheduler
C TaskTracker
D JobTracker
Question: 20
What is one of the two driving principles of
MapReduce?
Your answer
A Cluster
B Distributed
C Remote
D Debugging
E Local
Question: 22
Which statement is true regarding the number of
mappers and reducers configured in a cluster?
Your answer
B The number of mappers and reducers can be configured by modifying the mapred-site.xml file.
Question: 23
Which command displays the sizes of files and
directories contained in the given directory, or the
length of a file, in case it is just a file?
Your answer
A hadoop size
B hdfs -du
C hdfs fs size
D hadoop fs -du
Question: 24
Following the most common HDFS replica
placement policy, when the replication factor is
three, how many replicas will be located on the local
rack?
Your answer
A three
B two
C one
D none
Question: 25
In the MapReduce processing model, what is the
main function performed by the JobTracker?
Your answer
Question: 26
How are Pig and Jaql query languages similar?
Your answer
Question: 27
Under the HDFS architecture, what is one purpose of
the NameNode?
Your answer
A hadoop fs list
B hdfs root
C hadoop fs -ls /
D hdfs list /
Question: 29
What is one function of the JobTracker in
MapReduce?
Your answer
D manages storage
Question: 30
In addition to the high-level language Pig Latin, what
is a primary component of the Apache Pig platform?
Your answer
D runtime environment
Question: 31
Which statement is true about Hadoop Distributed
File System (HDFS)?
Your answer
Question: 32
Which is a use-case for Text Analytics?
Your answer
A BigSheets client
B Microsoft Excel
C Eclipse
D Web Browser
Question: 34
Which technology does Big SQL utilize for access to
shared catalogs?
Your answer
A Hive metastore
B RDBMS
C MapReduce
D HCatalog
Question: 35
Which statement will make an AQL view have
content displayed?
Your answer
Question: 36
You work for a hosting company that has data
centers spread across North America. You are trying
to resolve a critical performance problem in which a
large number of web servers are performing far
below expectations. You know that the information
written to log files can help determine the cause of
the problem, but there is too much data to manage
easily. Which type of Big Data analysis is
appropriate for this use case?
Your answer
A Text Analytics
B Stream Computing
C Data Warehousing
D Temporal Analysis
Question: 37
Which utility provides a command-line interface for
Hive?
Your answer
A Thrift client
Your answer
B Hive shell
Question: 38
What is an accurate description of HBase?
Your answer
Question: 39
Which Hadoop-related technology provides a user-
friendly interface, which enables business users to
easily analyze Big Data?
Your answer
A BigSQL
B BigSheets
C Avro
D HBase
Question: 40
What drives the demand for Text Analytics?
Your answer
A Text Analytics is the most common way to derive value from Big Data.
Question: 41
In Hive, what is the difference between an external
table and a Hive managed table?
Your answer
D An external table refers to the data stored on the local file system.
Question: 42
Which statement about NoSQL is true?
Your answer
A It provides all the capabilities of an RDBMS plus the ability to manage Big Data.
B It is a database technology that does not use the traditional relational model.
Your answer
Question: 43
If you need to JOIN data from two workbooks, which
operation should be performed beforehand?
Your answer
A "Copy" to create a new sheet with the other workbook data in the current workbook
C "Load" to create a new sheet with the other workbook data in the current workbook
Question: 44
What is the "scan" command used for in HBase?
Your answer
C AQLBuilder
Question: 46
What is the most efficient way to load 700MB of data
when you create a new HBase table?
Your answer
A Pre-create regions by specifying splits in create table command and use the inser
B Pre-create regions by specifying splits in create table command and bulk loading
C Pre-create the column families when creating the table and use the put command
D Pre-create the column families when creating the table and bulk loading the data.
Question: 47
The following sequence of commands is executed:
create 'table_1','column_family1','column_family2'
put 'table_1','row1','column_family1:c11','r1v11'
put 'table_1','row2','column_family1:c12','r1v12'
put 'table_1','row2','column_family2:c21','r1v21'
put 'table_1','row3','column_family1:d11','r1v11'
put 'table_1','row2','column_family1:d12','r1v12'
put 'table_1','row2','column_family2:d21','r1v21'
In HBase, which value will the "count 'table_1'"
command return?
Your answer
A 4
B 3
C 6
D 2
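HBase's `count` command counts distinct row keys, not individual cells. A small Python sketch (a dict standing in for the table, not real HBase) shows why the sequence above leaves only three rows — row1, row2, and row3 — no matter how many cells each row holds:

```python
# Simulate HBase puts: each put stores a cell under its row key.
table = {}

def put(row_key, column, value):
    # Cells for the same row key accumulate under one row.
    table.setdefault(row_key, {})[column] = value

put('row1', 'column_family1:c11', 'r1v11')
put('row2', 'column_family1:c12', 'r1v12')
put('row2', 'column_family2:c21', 'r1v21')
put('row3', 'column_family1:d11', 'r1v11')
put('row2', 'column_family1:d12', 'r1v12')
put('row2', 'column_family2:d21', 'r1v21')

# Like HBase's count, report the number of row keys.
print(len(table))  # → 3
```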
Question: 48
Which Hive command is used to query a table?
Your answer
A TRANSFORM
B SELECT
C GET
D EXPAND
Question: 49
Why develop SQL-based query languages that can
access Hadoop data sets?
Your answer
C because data stored in a Hadoop cluster lends itself to structured SQL queries
Question: 50
Which key benefit does NoSQL provide?
Your answer
D It can cost-effectively manage data sets too large for traditional RDBMS.
Question: 51
What makes SQL access to Hadoop data difficult?
Your answer
A list tables
B describe tables
C show all
D show tables
Question: 53
In HBase, what is the "count" command used for?
Your answer
Question: 54
Which Hadoop-related technology supports analysis
of large datasets stored in HDFS using an SQL-like
query language?
Your answer
A HBase
Your answer
B Pig
C Jaql
D Hive
Question: 55
How can the applications published to BigInsights
Web Console be made available for users to
execute?
Your answer
Question: 56
Which component of Apache Hadoop is used for
scheduling and running workflow jobs?
Your answer
A Eclipse
B Oozie
C Jaql
D Task Launcher
Question: 57
What is one of the main components of Watson
Explorer (InfoSphere Data Explorer)?
Your answer
A validater
B replicater
C crawler
D compressor
Question: 58
IBM InfoSphere Streams is designed to accomplish
which Big Data function?
Your answer
Question: 59
Which IBM Big Data solution provides low-latency
analytics for processing data-in-motion?
Your answer
B InfoSphere Streams
C InfoSphere BigInsights
Question: 60
Which IBM tool enables BigInsights users to
develop, test and publish BigInsights applications?
Your answer
A Avro
B HBase
C Eclipse
D BigInsights Applications Catalog
Mock Big Data exam
Questions and answers
Question: 1
Which capability does IBM BigInsights add to enrich Hadoop?
A Jaql
B Fault tolerance through HDFS replication
C Adaptive MapReduce
D Parallel computing on commodity servers

Question: 2
What is one of the four characteristics of Big Data?
A value
B volume
C verifiability
D volatility

Question: 8
What is one of the two technologies that Hadoop uses as its foundation?
A HBase
B Apache
C Jaql
D MapReduce

Question: 9
What key feature does HDFS 2.0 provide that HDFS does not?
A a high throughput, shared file system
B high availability of the NameNode
C data access performed by an RDBMS
D random access to data in the cluster

Question: 10
What are two of the core operators that can be used in a Jaql query? (Select two.)
A LOAD
B JOIN
C TOP
D SELECT

Question: 11
Which type of language is Pig?
A SQL-like
B compiled language
C object oriented
D data flow
Question: 12
If you need to change the replication factor or increase the default storage block size, which file do you need to modify?
A hdfs.conf
B hadoop-configuration.xml
C hadoop.conf
D hdfs-site.xml

Question: 13
To run a MapReduce job on the BigInsights cluster, which statement about the input file(s) must be true?
A The file(s) must be stored on the local file system where the map reduce job was developed.
B The file(s) must be stored in HDFS or GPFS.
C The file(s) must be stored on the JobTracker.
D No matter where the input files are before, they will be automatically copied to where the job runs.

Question: 14
What is a characteristic of IBM GPFS that distinguishes it from other distributed file systems?
A operating system independence
B posix compliance
C no single point of failure
D blocks that are stored on different nodes

Question: 15
Which statement represents a difference between Pig and Hive?
A Pig is used for creating MapReduce programs.
B Pig has a shell interface for executing commands.
C Pig is not designed for random reads/writes or low-latency queries.
D Pig uses Load, Transform, and Store.

Question: 17
In which step of a MapReduce job is the output stored on the local disk?
A Reduce
B Shuffle
C Combine
D Map

Question: 19
Which element of the MapReduce architecture runs map and reduce jobs?
A Reducer
B JobScheduler
C TaskTracker
D JobTracker
Question: 21
When running a MapReduce job from Eclipse, which BigInsights execution models are available? (Select two.)
A Cluster
B Distributed
C Remote
D Debugging
E Local

Question: 22
Which statement is true regarding the number of mappers and reducers configured in a cluster?
A The number of reducers is always equal to the number of mappers.
B The number of mappers and reducers can be configured by modifying the mapred-site.xml file.
C The number of mappers and reducers is decided by the NameNode.
D The number of mappers must be equal to the number of nodes in a cluster.

Question: 24
Following the most common HDFS replica placement policy, when the replication factor is three, how many replicas will be located on the local rack?
A three
B two
C one
D none
Question: 39
Which Hadoop-related technology provides a user-friendly interface, which enables business users to easily analyze Big Data?
A BigSQL
B BigSheets
C Avro
D HBase

Question: 40
What drives the demand for Text Analytics?
A Text Analytics is the most common way to derive value from Big Data.
B MapReduce is unable to process unstructured text.
C Data warehouses contain potentially valuable information.
D Most of the world's data is in unstructured or semi-structured text.

Question: 41
In Hive, what is the difference between an external table and a Hive managed table?
A An external table refers to an existing location outside the warehouse directory.
B An external table refers to a table that cannot be dropped.
C An external table refers to the data from a remote database.
D An external table refers to the data stored on the local file system.

Question: 42
Which statement about NoSQL is true?
A It provides all the capabilities of an RDBMS plus the ability to manage Big Data.
B It is a database technology that does not use the traditional relational model.
C It is based on the highly scalable Google Compute Engine.
D It is an IBM project designed to enable DB2 to manage Big Data.

Question: 43
If you need to JOIN data from two workbooks, which operation should be performed beforehand?
A "Copy" to create a new sheet with the other workbook data in the current workbook
B "Group" to bring together the two workbooks
C "Load" to create a new sheet with the other workbook data in the current workbook
D "Add" to add the other workbook data to the current workbook

Question: 45
Which tool is used for developing a BigInsights Text Analytics extractor?
A Eclipse with BigInsights tools for Eclipse plugin
B BigInsights Console with AQL plugin
C AQLBuilder
D AQL command line

Question: 46
What is the most efficient way to load 700MB of data when you create a new HBase table?
A Pre-create regions by specifying splits in create table command and use the insert command to load data.
B Pre-create regions by specifying splits in create table command and bulk loading the data.
C Pre-create the column families when creating the table and use the put command to load the data.
D Pre-create the column families when creating the table and bulk loading the data.
Question: 47
The following sequence of commands is executed:
create 'table_1','column_family1','column_family2'
put 'table_1','row1','column_family1:c11','r1v11'
put 'table_1','row2','column_family1:c12','r1v12'
put 'table_1','row2','column_family2:c21','r1v21'
put 'table_1','row3','column_family1:d11','r1v11'
put 'table_1','row2','column_family1:d12','r1v12'
put 'table_1','row2','column_family2:d21','r1v21'
In HBase, which value will the "count 'table_1'" command return?
A 4
B 3
C 6
D 2

Question: 48
Which Hive command is used to query a table?
A TRANSFORM
B SELECT
C GET
D EXPAND

Question: 50
Which key benefit does NoSQL provide?
A It allows Hadoop to apply the schema-on-ingest model to unstructured Big Data.
B It allows an RDBMS to maintain referential integrity on a Hadoop data set.
C It allows customers to leverage high-end server platforms to manage Big Data.
D It can cost-effectively manage data sets too large for traditional RDBMS.

Question: 51
What makes SQL access to Hadoop data difficult?
A Hadoop data is highly structured.
B Data is in many formats.
C Data is located on a distributed file system.
D Hadoop requires pre-defined schema.

Question: 54
Which Hadoop-related technology supports analysis of large datasets stored in HDFS using an SQL-like query language?
A HBase
B Pig
C Jaql
D Hive
Question: 56
Which component of Apache Hadoop is used for scheduling and running workflow jobs?
A Eclipse
B Oozie
C Jaql
D Task Launcher

Question: 57
What is one of the main components of Watson Explorer (InfoSphere Data Explorer)?
A validater
B replicater
C crawler
D compressor

Question: 60
Which IBM tool enables BigInsights users to develop, test and publish BigInsights applications?
A Avro
B HBase
C Eclipse
D BigInsights Applications Catalog

Question: 5
Which description identifies the real value of Big Data and Analytics?
A enabling customers to efficiently index and access large volumes of data
B gaining new insight through the capabilities of the world's interconnected intelligence
C providing solutions to help customers manage and grow large database systems
D using modern technology to efficiently store the massive amounts of data generated by social networks

Question: 4
You can import preinstalled libraries if you are using which languages? (Select two.)
(Please select ALL that apply)
A R
B Python
C Bash
D Rexx
E Scala
Question: 5
Who can control a Watson Studio project's assets?
A Viewers
B Editors
C Collaborators
D Tenants

Question: 6
Which environmental variable needs to be set to properly start ZooKeeper?
A ZOOKEEPER_APP
B ZOOKEEPER_DATA
C ZOOKEEPER
D ZOOKEEPER_HOME

Question: 10
What ZK CLI command is used to list all the ZNodes at the top level of the ZooKeeper hierarchy, in the ZooKeeper command-line interface?
A get /
B create /
C listquota /
D ls /

Question: 11
What is the default data format Sqoop parses to export data to a database?
A JSON
B CSV
C XML
D SQL

Question: 13
Which statement describes "Big Data" as it is used in the modern business world?
A Indexed databases containing very large volumes of historical data used for compliance reporting purposes.
B Non-conventional methods used by businesses and organizations to capture, manage, process, and make sense of a large volume of data.
C Structured data stores containing very large data sets such as video and audio streams.
D The summarization of large indexed data stores to provide information about potential problems or opportunities.

Question: 17
Which two Spark libraries provide a native shell? (Choose two.)
(Please select ALL that apply)
A Python
B Scala
C C#
D Java
E C++
Question: 18
What is an authentication mechanism in Hortonworks Data Platform?
A IP address
B Preshared keys
C Kerberos
D Hardware token

Question: 19
What is Hortonworks DataPlane Services (DPS) used for?
A Manage, secure, and govern data stored across all storage environments.
B Transform data from CSV format into native HDFS data.
C Perform backup and recovery of data in the Hadoop ecosystem.
D Keep data up to date by periodically refreshing stale data.

Question: 21
What is the native programming language for Spark?
A Scala
B C++
C Java
D Python

Question: 22
Which Hortonworks Data Platform (HDP) component provides a common web user interface for applications running on a Hadoop cluster?
A YARN
B HDFS
C Ambari
D MapReduce

Question: 23
Which Spark RDD operation returns values after performing the evaluations?
A Transformations
B Actions
C Caching
D Evaluations

Question: 26
Which Spark RDD operation creates a directed acyclic graph through lazy evaluations?
A Distribution
B GraphX
C Transformations
D Actions

Question: 29
Which element of Hadoop is responsible for spreading data across the cluster?
A YARN
B MapReduce
C AMS
D HDFS
Question: 32
Under the MapReduce v1 architecture, which element of the system manages the map and reduce functions?
A SlaveNode
B JobTracker
C MasterNode
D StorageNode
E TaskTracker

Question: 33
Which component of the HDFS architecture manages storage attached to the nodes?
A NameNode
B StorageNode
C DataNode
D MasterNode

Question: 34
Which of the "Five V's" of Big Data describes the real purpose of deriving business insight from Big Data?
A Volume
B Value
C Variety
D Velocity
E Veracity

Question: 37
Which two of the following are row-based data encoding formats? (Choose two.)
(Please select ALL that apply)
A CSV
B Avro
C ETL
D Parquet
E RC and ORC

Question: 38
Which statement describes the action performed by HDFS when data is written to the Hadoop cluster?
A The data is spread out and replicated across the cluster.
B The data is replicated to at least 5 different computers.
C The MasterNodes write the data to disk.
D The FsImage is updated with the new data map.

Question: 39
Under the MapReduce v1 architecture, which element of MapReduce controls job execution on multiple slaves?
A MasterNode
B JobTracker
C SlaveNode
D TaskTracker
E StorageNode
Question: 40
Which component of the Spark Unified Stack provides processing of data arriving at the system in real-time?
A MLlib
B Spark SQL
C Spark Streaming
D Spark Live

Question: 41
Which two registries are used for compiler and runtime performance improvements in support of the Big SQL environment? (Choose two)
(Please select ALL that apply)
A DB2ATSENABLE
B DB2FODC
C DB2COMPOPT
D DB2RSHTIMEOUT
E DB2SORTAFTER_TQ

Question: 42
Which script is used to backup and restore the Big SQL database?
A bigsql_bar.py
B db2.sh
C bigsql.sh
D load.py

Question: 43
You need to create a table that is not managed by the Big SQL database manager. Which keyword would you use to create the table?
A STRING
B BOOLEAN
C SMALLINT
D EXTERNAL

Question: 44
Which two of the following data sources are currently supported by Big SQL? (Choose two)
(Please select ALL that apply)
A Oracle
B PostgreSQL
C Teradata
D MySQL
E MariaDB

Question: 45
Which port is the default for the Big SQL Scheduler to get administrator commands?
A 7055
B 7054
C 7052
D 7053
Question: 46
Which tool should you use to enable Kerberos security?
A Hortonworks
B Ambari
C Apache Ranger
D Hive

Question: 47
Which two options can be used to start and stop Big SQL? (Choose two)
(Please select ALL that apply)
A Scheduler
B DSM Console
C Command line
D Java SQL shell

Question: 48
Which command is used to populate a Big SQL table?
A CREATE
B QUERY
C SET
D LOAD

Question: 49
Which feature allows the bigsql user to securely access data in Hadoop on behalf of another user?
A Impersonation
B Privilege
C Rights
D Schema

Question: 51
The Big SQL head node has a set of processes running. What is the name of the service ID running these processes?
A Db2
B hdfs
C user1
D bigsql

Question: 52
Which file format contains human-readable data where the column values are separated by a comma?
A Parquet
B ORC
C Delimited
D Sequence

Question: 54
Which type of foundation does Big SQL build on?
A Jupyter
B Apache HIVE
C RStudio
D MapReduce
Question: 55
You need to monitor and manage data security across a Hadoop platform. Which tool would you use?
A SSL
B HDFS
C Hive
D Apache Ranger

Question: 56
What can be used to surround a multi-line string in a Python code cell by appearing before and after the multi-line string?
A """
B "
C
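In Python, triple quotes placed before and after the text delimit a multi-line string, as a quick check in any code cell shows:

```python
# Triple quotes (""") surround a string that spans several lines.
message = """first line
second line
third line"""

# Two embedded newlines separate the three lines.
print(message.count("\n"))  # → 2
```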
Question: 58
What Python statement is used to add a library to the current code cell?
A pull
B import
C load
D using
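The `import` statement is how a library is added to the current code cell, for example with the standard-library `math` module:

```python
# import makes a library's names available in the current cell/script.
import math

print(math.sqrt(16))  # → 4.0
```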
Question: 59
What Python package has support for linear algebra, optimization, mathematical integration, and statistics?
A NLTK
B Pandas
C NumPy
D SciPy
Which Big SQL datatype should be avoided because it causes significant performance degradation?
A. CHAR
* B. STRING
C. UNION
D. VARCHAR

You need to create multiple Big SQL tables with columns defined as CHAR. What needs to be set to enable CHAR columns?
* A. SET SYSHADOOP.COMPATIBILITY_MODE=1
B. CREATE TABLE chartab
C. SET HADOOPCOMPATIBLITY_MODE=True
D. ALTER CHAR DATATYPE TO byte

What is the primary core abstraction of Apache Spark?
A. GraphX
* B. Resilient Distributed Dataset (RDD)
C. Spark Streaming
D. Directed Acyclic Graph (DAG)

Which Text Analytics runtime component is used for languages such as Spanish and English by breaking a stream of text into phrases or words?
A. Named entity extractors
B. Other extractors
* C. Standard tokenizer
D. Multilingual tokenizer

Which two commands are used to load data into an existing Big SQL table from HDFS? (Choose two.)
(Please select ALL that apply)
* A. Load
B. Table
C. Select
* D. Insert
E. Create
Which command should you use to set the default schema in a Big SQL table and also create the schema if it does not exist?
A. default
B. create
C. format
* D. use

What is missing from the following statement when querying a remote table? CREATE _______ FOR remotetable1 …
A. TABLE
B. VIEW
* C. NICKNAME
D. INDEX

What are two major business advantages of using BigSheets? (Choose two.)
(Please select ALL that apply)
* A. built-in data readers for multiple formats
* B. spreadsheet-like querying and discovery interface
C. command-line-driven data analysis
D. feature rich programming environment

Where should you build extractors in the Information Extraction Web Tool?
A. Documents
* B. Canvas
C. Property pane
D. Regular expression

In which text analytics phase are extractors developed and tested?
A. Analysis
* B. Rule Development
C. Production
D. Performance Tuning

Which action is performed during the Reduce step of a MapReduce v1 processing cycle?
* A. Intermediate results are aggregated.
B. The TaskTracker distributes the job to the cluster.
C. The initial problem is broken into pieces.
D. The JobTrackers execute their assigned tasks.
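The Reduce step's job — aggregating the intermediate (key, value) pairs that the map step emitted — can be sketched with a tiny word-count in Python. The input pairs here are hypothetical sample data, not Hadoop code:

```python
from collections import defaultdict

# Intermediate results emitted by hypothetical map tasks: (word, 1) pairs.
mapped = [("big", 1), ("data", 1), ("big", 1), ("sql", 1), ("big", 1)]

# The shuffle groups pairs by key...
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# ...and the reduce step aggregates each group into a single result.
reduced = {key: sum(values) for key, values in groups.items()}
print(reduced)  # → {'big': 3, 'data': 1, 'sql': 1}
```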
What are two benefits of using the IBM Big SQL processing engine? (Choose two.)
(Please select ALL that apply)
A. Core functionality is written in Java for portability.
B. The system is built to be started and stopped on demand.
* C. Various data storage formats are supported.
* D. It provides access to Hadoop data using SQL.

An organization is developing a proof-of-concept for a big data system. Which phase of the big data adoption cycle is the company currently in?
* A. Engage
B. Execute
C. Explore
D. Educate

Which feature in a Big SQL federation is a library to access a particular type of data source?
A. server
B. table
C. view
* D. wrapper

What is a feature of Apache ZooKeeper?
A. generates shell programs for running components of Hadoop
B. monitors log files of cluster members
* C. maintains configuration information for a cluster
D. performance tunes a running cluster

What does the bucketing feature of Hive do?
* A. sub-partitioning/grouping of data by hash within partitions
B. allows data to be stored in arrays
C. splits data into collections based on ranges
D. distributes the data dynamically for faster processing
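Hive's bucketing hashes a column value modulo the bucket count to decide which bucket (file) a row lands in within its partition. A minimal sketch of that idea — the hash function here is Python's built-in, not Hive's, and the column name is made up:

```python
NUM_BUCKETS = 4

def bucket_for(user_id: str) -> int:
    # Bucketing assigns a row to bucket hash(col) mod NUM_BUCKETS
    # inside its partition, so equal keys always co-locate.
    return hash(user_id) % NUM_BUCKETS

# The same key always maps to the same bucket, enabling bucketed joins/sampling.
assert bucket_for("user42") == bucket_for("user42")
print(0 <= bucket_for("user42") < NUM_BUCKETS)  # → True
```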
What advantage does the Text Analytics Web UI give you?
* A. It generates the AQL syntax for you.
B. It allows only single data types.
C. It allows only one type of file extension.
D. It teaches you how to write AQL syntax.
Which AQL candidate rule combines tuples from two views with the same schema?
A. Blocks
B. Select
* C. Union
D. Sequence

Data collected within your organization has a short period of time when it is relevant. Which characteristic of a big data system does this represent?
* A. Velocity
B. Validation
C. Variety
D. Volume

Assuming the same data is stored in multiple data formats, which format will provide faster query execution and require the least amount of IO operations to process?
* A. Parquet
B. XML
C. flat file
D. JSON
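Column-oriented formats like Parquet win on IO because a query touching one column reads only that column's values, while a row format must scan every full record. A rough Python sketch of the difference, using toy in-memory layouts rather than real Parquet:

```python
# The same table stored in a row layout and a column layout.
rows = [("alice", 30, "NY"), ("bob", 25, "SF"), ("carol", 35, "LA")]
columns = {
    "name": ["alice", "bob", "carol"],
    "age":  [30, 25, 35],
    "city": ["NY", "SF", "LA"],
}

# For SELECT avg(age): the row layout touches every field of every record...
values_touched_row = sum(len(record) for record in rows)
# ...while the column layout reads only the "age" column.
values_touched_col = len(columns["age"])

print(values_touched_row, values_touched_col)  # → 9 3
```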
Which feature of Text Analytics allows you to rollback your extractors when necessary?
* A. Snapshots
B. Standard tokenizer
C. Scalar functions
D. Multilingual tokenizer

What defines a relation in an AQL extractor?
* A. a view
B. a row
C. a schema
D. a column

Which command must be run after compiling a Java program so it can run on the Hadoop cluster?
* A. jar cf name.jar *.class
B. hadoop classpath
C. jar tf name.jar
D. rm hadoop.class
What type of NoSQL datastore does HBase fall into?
A. document
B. key-value
* C. column
D. graph

Which data inconsistency may appear while using ZooKeeper?
A. excessively stale data views
* B. simultaneously inconsistent cross-client views
C. unreliable client updates across the cluster
D. out-of-order updates across clients

What is required to run an EXPLAIN statement in Big SQL?
A. the explainable-sql-statement clause
B. the SYSPROC.SYSINSTALLOBJECT procedure
* C. proper authorization
D. a rule

Which command must be run first to become the HDFS user?
* A. su - hdfs
B. hadoop fs
C. pwd
D. hdfs

Which Big SQL file format is human readable and supported by most tools, but is the least efficient file format?
* A. Delimited
B. Parquet
C. Sequence
D. Avro

What is the default install location for the IBM Open Data Platform on Linux?
A. /opt/ibm/iop
B. /var/iop
C. /usr/local/iop
* D. /usr/iop

You need to populate a Big SQL table to test an operation. Which INSERT statement is recommended for testing, only because it does not support parallel reads or writes?
A. INSERT INTO ... SELECT FROM ...
* B. INSERT INTO ... VALUES (...)
C. INSERT INTO ... SELECT …
D. INSERT INTO ... SELECT ... WHERE …

Which command is used to launch an interactive Apache Spark shell?
A. scala --spark
B. hadoop spark
C. spark
* D. spark-shell
Which statement will create a table with parquet files?
A. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) STORED AS PARQUET;
B. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) SAVE AS PARQUETFILE;
* C. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) STORED AS PARQUETFILE;
D. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) SAVE AS PARQUET;

What are extractors transformed into when they are executed?
A. Candidate generation statements
B. BigSheets function statements
* C. Annotated Query Language (AQL) statements
D. Online Analytical Programming (OLAP) statements

You need to set up the command-line interface JSqsh to connect to a bigsql database. What is the recommended method to set up the connection?
A. Run the $JSQSH_HOME/bin/JSQSH script.
B. Run the JSqsh driver wizard.
C. Modify database parameters in the .jsqsh/connections.xml file.
* D. Run the JSqsh connection wizard.

How will the following column mapping command be encoded? cf_data:full_names mapped by (last_name, First_name) separator ','
A. Hex
B. Character
C. Binary
* D. String

Which underlying data representation and access method does Big SQL use?
* A. Hive
B. TINYINT
C. MAP
D. SMALLINT

What does the MLlib component of Apache Spark support?
* A. scalable machine learning
B. graph computation
C. SQL and HiveQL
D. stream processing

When creating a new table in Big SQL, what additional keyword is used in the CREATE TABLE statement to create the table in HDFS?
A. dfs
* B. hadoop
C. replicated
D. cloud

What does the federation feature of Big SQL allow?
A. tuning server hardware performance
B. importing data into HDFS
C. rewriting statements for better execution performance
* D. querying multiple data sources in one statement

A Hadoop file listing is performed and one of the output lines is:
-rw-r--r-- 5 biadmin biadmin 871233 2015-09-12 09:33 data.txt
What does the 5 in the output represent?
A. permissions
* B. replication factor
C. login id of the file owner
D. data size
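In `hadoop fs -ls` output, the second field is the file's replication factor (5 here), not a permission bit or a size. Splitting the listing line in Python makes the field positions explicit:

```python
# One line of hadoop fs -ls output, copied from the question above.
line = "-rw-r--r-- 5 biadmin biadmin 871233 2015-09-12 09:33 data.txt"

fields = line.split()
permissions = fields[0]       # -rw-r--r--
replication = int(fields[1])  # 5  <- replication factor
owner = fields[2]             # biadmin
size = int(fields[4])         # 871233 (bytes)

print(replication)  # → 5
```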
What privilege is required to execute an EXPLAIN statement with INSERT privileges in Big SQL?
A. SYSMON authority
B. SECADM authority
* C. SQLADM authority
D. SYSCTRL authority

How can you fix duplicate results generated by an extractor from the same text because the text matches more than one dictionary entry?
A. edit output with overlapping matches
B. remove union statement
* C. remove with a consolidation rule
D. edit properties of the sequence

Which statement best describes Spark?
A. An instance of a federated database.
* B. A computing engine for a large-scale data set.
C. A logical view on top of Hadoop data.
D. An open source database query tool.

Which two tasks can an Apache Ambari admin do that a regular Apache Ambari user cannot do? (Choose two.)
(Please select ALL that apply)
A. browse job information
B. view service status
* C. modify configurations
* D. run service checks

What is the default replication factor for HDFS on a production cluster?
A. 5
B. 1
C. 10
* D. 3
In the ZooKeeper environment, what does atomicity guarantee? * A. Updates completely succeed or fail.
B. Updates are applied in the order created.
C. If an update succeeds, then it persists.
D. Every client sees the same view.
Which basic feature rule of AQL helps find an exact match to a single word or phrase? A. Dictionary
B. Part of Speech
* C. Literals
D. Splits
You have a very large Hadoop file system. You need to work on the data without migrating the data out or changing the data format. Which IBM tool should you use?
A. MapReduce
* B. Big SQL
C. Data Server Manager
D. Pig
Which core component of the Hadoop framework is highly scalable and a common tool? A. Sqoop
B. Pig
* C. MapReduce
D. Hive
How can you reduce the memory usage of the ANALYZE command in Big SQL? A. Run everything in one batch.
B. Turn on distribution statistics.
* C. Run the command separately on different batches of columns.
D. Include all the columns in the batch.
What should you do in Text Analytics to fix an extractor that produces unwanted results? A. Re-create the extractors.
B. Remove results with a consolidation rule.
* C. Create a new filter.
D. Edit the properties of the sequence.
QUESTIONS / ANSWERS / ANSWER KEY
A The list of deployments
B A list of your saved bookmarks
C The email address of the collaborator
You need to add a collaborator to your project. What do you need? D Your project ID c
A URL
B Scala
C File
Before you create a Jupyter notebook in Watson Studio, which two items are necessary? D Spark Instance
(Please select the TWO that apply) E Project de
A Database
B Wrapper
C Object Storage
Where does the unstructured data of a project reside in Watson Studio? D Tables c
A Watson Studio Desktop
B Watson Studio Cloud
C Watson Studio Business
Which Watson Studio offering used to be available through something known as IBM Bluemix? D Watson Studio Local b
A Data Assets
B Projects
C Collaborators
What is the architecture of Watson Studio centered on? D Analytic Assets b
A INSERT
B GRANT
C REVOKE
Which two commands would you use to give or remove certain privileges to/from a user? D LOAD
(Please select the TWO that apply) E SELECT bc
A 4
B 2
C 1
How many Big SQL management nodes do you need at minimum? D 3 c
A /apps/hive/warehouse/data
B /apps/hive/warehouse/bigsql
C /apps/hive/warehouse/
What is the default directory in HDFS where tables are stored? D /apps/hive/warehouse/schema c
A ./java mybigdata
B ./jsqsh mybigdata
C ./java tables
Using the Java SQL Shell, which command will connect to a database called mybigdata? D ./jsqsh go mybigdata b
A 777
B 755
C 700
Which directory permissions need to be set to allow all users to create their own schema? D 666 a
A Files
B Schemas
C Hives
What are Big SQL database tables organized into? D Directories b
A hdfs dfs -chmod 770 /hive/warehouse
B hdfs dfs -chmod 755 /hive/warehouse
C hdfs dfs -chmod 700 /hive/warehouse
You have a distributed file system (DFS) and need to set permissions on the /hive/warehouse directory to allow access to ONLY the bigsql user. Which command would you run? D hdfs dfs -chmod 666 /hive/warehouse c
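Octal permission modes such as 777, 755, and 700 from the questions above decode into owner/group/other rwx bits. A minimal sketch of that decoding (the helper name is ours, not an HDFS API):

```python
def mode_bits(octal_mode):
    """Decode a 3-digit octal mode into (owner, group, other) rwx strings."""
    out = []
    for shift in (6, 3, 0):  # owner, group, other
        bits = (octal_mode >> shift) & 0b111
        out.append(("r" if bits & 4 else "-") +
                   ("w" if bits & 2 else "-") +
                   ("x" if bits & 1 else "-"))
    return tuple(out)

print(mode_bits(0o777))  # ('rwx', 'rwx', 'rwx') - all users can create entries
print(mode_bits(0o700))  # ('rwx', '---', '---') - only the owning user
```

This is why 777 lets every user create a schema directory, while 700 restricts the directory to its owner (e.g. the bigsql user).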
A It grants or revokes certain user privileges.
B It grants or revokes certain directory privileges.
C It limits the rows or columns returned based on certain criteria.
Which definition best describes RCAC? D It limits access by using views and stored procedures. c
A A data type of a column describing its value.
B The defined format and rules around a delimited file.
C A container for any record format.
Which statement best describes a Big SQL database table? D A directory with zero or more data files. d
A Big SQL can exploit advanced features
B Data interchange outside Hadoop
C Supported by multiple I/O engines
What is an advantage of the ORC file format? D Efficient compression d
A GRANT
B umask
C Kerberos
You need to determine the permission setting for a new schema directory. Which tool would you use? D HDFS b
A Scheduler
B DSM
C Jupyter
Which tool would you use to create a connection to your Big SQL database? D Ambari b
A bigsql.alltables.io.doAs
B DB2COMPOPT
C DB2_ATS_ENABLE
You need to enable impersonation. Which two properties in the bigsql-conf.xml file need to be marked true?
D $BIGSQL_HOME/conf
(Please select the TWO that apply) E bigsql.impersonation.create.table.grant.public ae
A CREATE AS parquet
B STORED AS parquetfile
C STORED AS parquet
You are creating a new table and need to format it with parquet. Which partial SQL statement would create the table in parquet format? D CREATE AS parquetfile b
A TRANSLATE FUNCTION
B CREATE FUNCTION
C ALTER MODULE ADD FUNCTION
Which command creates a user-defined schema function? D ALTER MODULE PUBLISH FUNCTION b
A YARN
B Spark
C Pig
Which Apache Hadoop application provides an SQL-like interface to allow abstraction of data on semi-structured data in a Hadoop datastore? D Hive d
A A messaging system for real-time data pipelines.
B A wizard for installing Hadoop services on host servers.
C Moves information to/from structured databases.
Which description characterizes a function provided by Apache Ambari? D Moves large amounts of streaming event data. b
A Writes to a leader server will always succeed.
B All servers keep a copy of the shared data in memory.
C There can be more than one leader server at a time.
Which statement accurately describes how ZooKeeper works? D Clients connect to multiple servers at the same time. b
A MemcacheD
B CouchDB
C Riak
Which NoSQL datastore type began as an implementation of Google's BigTable that can store any type of data and scale to many petabytes? D HBase d
A It is a powerful platform for managing large volumes of structured data.
B It is designed specifically for IBM Big Data customers.
C It is a Hadoop distribution based on a centralized architecture with YARN at its core.
Which statement is true about Hortonworks Data Platform (HDP)? D It is engineered and developed by IBM's BigInsights team. c
A Pig
B Hive
C Python
What is the name of the Hadoop-related Apache project that utilizes an in-memory architecture to run applications faster than MapReduce? D Spark d
A It runs on Hadoop clusters with RAM drives configured on each DataNode.
B It supports HDFS, MS-SQL, and Oracle.
C It is much faster than MapReduce for complex applications on disk.
Which statement about Apache Spark is true? D It features APIs for C++ and .NET. c
A It determines the size and distribution of data split in the Map phase.
B It aggregates all input data before it goes through the Map phase.
C It reduces the amount of data that is sent to the Reducer task nodes.
Which statement is true about the Combiner phase of the MapReduce architecture? D It is performed after the Reducer phase to produce the final output. c
A MapReduce v1 APIs cannot be used with YARN.
B MapReduce v1 APIs provide a flexible execution environment to run MapReduce.
C MapReduce v1 APIs are implemented by applications which are largely independent of the execution environment.
Which statement is true about MapReduce v1 APIs? D MapReduce v1 APIs define how MapReduce jobs are executed. c
A Hive
B Cloudbreak
C Big SQL
D MapReduce
Hadoop 2 consists of which three open-source sub-projects maintained by the Apache Software Foundation?
E HDFS
(Please select the THREE that apply) F YARN def
A Authorization Provider
B Ambari Alert Framework
C Postgres RDBMS
Which component of the Apache Ambari architecture integrates with an organization's LDAP or Active Directory service? D REST API a
A Authenticating and auditing user access.
B Loading bulk data into an Hadoop cluster.
What are two services provided by ZooKeeper? C Maintaining configuration information.
(Please select the TWO that apply) D Providing distributed synchronization. cd
A Sesame
B Neo4j
C MongoDB
What is an example of a Key-value type of NoSQL datastore? D REDIS d
A MLlib
B RDD
C Mesos
Which Spark Core function provides the main element of Spark API? D YARN b
A org.apache.mr
B org.apache.hadoop.mr
C org.apache.hadoop.mapred
Which is the java class prefix for the MapReduce v1 APIs? D org.apache.mapreduce c
A SSD
B JBOD
C RAID
Which hardware feature on an Hadoop datanode is recommended for cost efficient performance? D LVM b
A ApplicationMaster
B JobMaster
C TaskManager
Under the YARN/MRv2 framework, the JobTracker functions are split into which two daemons? D ScheduleManager
(Please select the TWO that apply) E ResourceManager ae
A The number of rows to commit per transaction.
B The number of rows to send to each mapper.
C The table name to export from the database.
What does the split-by parameter tell Sqoop? D The column to use as the primary key. d
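Roughly, Sqoop takes the MIN and MAX of the `--split-by` column and divides that range among the mappers, one sub-range each. An illustrative sketch of that partitioning logic (ours, not Sqoop's actual implementation):

```python
def split_ranges(lo, hi, num_mappers):
    """Divide [lo, hi] into num_mappers contiguous sub-ranges,
    mimicking how Sqoop partitions a numeric --split-by column."""
    step = (hi - lo + 1) / num_mappers
    ranges = []
    for i in range(num_mappers):
        start = lo + round(i * step)
        end = lo + round((i + 1) * step) - 1
        ranges.append((start, end))
    ranges[-1] = (ranges[-1][0], hi)  # ensure the last range reaches hi
    return ranges

print(split_ranges(1, 100, 4))  # → [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Each mapper then imports only the rows whose split column falls in its sub-range, which is why the split column should be evenly distributed (a primary key is the usual choice).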
A Ambari
B Google File System
C HBase
Hadoop uses which two Google technologies as its foundation? D YARN
(Please select the TWO that apply) E MapReduce be
A Requires extremely rapid processing.
B Data is processed in batch.
Which two are attributes of streaming data? C Simple, numeric data.
(Please select the TWO that apply) D Sent in high volume. ad
A API and perimeter security.
B Management of Kerberos in the cluster.
What two security functions does Apache Knox provide? C Proxying services.
(Please select the TWO that apply) D Database field access auditing. ac
A REDIS
B HBase
C Cassandra
What is an example of a NoSQL datastore of the "Document Store" type? D MongoDB d
A Hive
B Sqoop
C Pig
Which Apache Hadoop application provides a high-level programming language for data transformation on unstructured data? D Zookeeper c
A Big Data
B Big Match
C Big Replicate
D Big SQL
What are three IBM value-add components to the Hortonworks Data Platform (HDP)? E Big YARN
(Please select the THREE that apply) F Big Index bcd
A Accumulo
B HBase
C Oozie
Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2, MySQL, or other databases? D Sqoop d
A Data Protection
B Speed
C Resiliency
Which three are a part of the Five Pillars of Security? D Audit
(Please select the THREE that apply) E Administration ade
A MLlib
B Mesos
C Spark SQL
Which component of the Spark Unified Stack allows developers to intermix structured database queries with Spark's programming language? D Java c
A Hadoop YARN
B Apache Mesos
C Nomad
Apache Spark can run on which two of the following cluster managers? D Linux Cluster Manager
(Please select the TWO that apply) E oneSIS ab
A Suitable for transaction processing.
B Libraries that support SQL queries.
C APIs for Scala, Python, C++, and .NET.
Which feature makes Apache Spark much easier to use than MapReduce? D Applications run in-memory. b
A Run Sqoop using the vi editor.
B Use the --import-command line argument.
What are two ways the command-line parameters for a Sqoop invocation can be simplified? C Include the --options-file command line argument.
(Please select the TWO that apply) D Place the commands in a file. cd
A NodeChildrenChanged
B NodeDeleted
Which two are valid watches for ZNodes in ZooKeeper? C NodeExpired
(Please select the TWO that apply) D NodeRefreshed ab
A ResourceManager
B JobMaster
C ScheduleManager
Under the YARN/MRv2 framework, which daemon arbitrates the execution of tasks among all the applications in the system? D ApplicationMaster a
A NiFi
B Hortonworks Data Flow
C Druid
What is the preferred replacement for Flume? D Storm b
A Use the -mapper 1 parameter.
B Use the --limit mapper=1 parameter.
C Use the -m 1 parameter.
How can a Sqoop invocation be constrained to only run one mapper? D Use the --single parameter. c
A Reduce
B Map
C Combiner
Under the MapReduce v1 programming model, which optional phase is executed simultaneously with the Shuffle phase? D Split c
A Acquisition
B Manipulation
C Exploration
What is the first step in a data science pipeline? D Analytics a
A Holding the output of a computation.
B Configuring data connections.
C Documenting the computational process.
What is a markdown cell used for in a data science notebook? D Writing code to transform data. c
A Common desktop app.
B Database interface.
C Linux SSH session.
What does the user interface for Jupyter look like to a user? D App in web browser. d
A To display a simple bar chart of data on the screen.
B To collect video for use in streaming data applications.
C To perform certain data transformation quickly.
Why might a data scientist need a particular kind of GPU (graphics processing unit)? D To input commands to a data science notebook. c
A %dirmagic
B %lsmagic
C %list-magic
What command is used to list the "magic" commands in Jupyter? D %list-all-magic b
A The list of deployments
B A list of your saved bookmarks
You need to add a collaborator to your project. What do you need? C The email address of the collaborator
Your answer D Your project ID c
A record locking
B batch processing
C machine learning
D transaction processing
Apache Spark provides a single, unifying platform for which three of the following types of operations?
E graph operations
(Please select the THREE that apply) F ACID transactions bce
A Scala
B Python
C Java
D .NET
Which three programming languages are directly supported by Apache Spark? E C#
(Please select the THREE that apply) F C++ abc
A Map -> Split -> Reduce -> Combine
Under the MapReduce v1 programming model, which shows the proper order of the full set of MapReduce phases?
D Split -> Map -> Combine -> Reduce b
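The full phase order asked about above (Split, Map, Combine, Shuffle, Reduce) can be illustrated with a toy word count. This is a single-process sketch, not Hadoop itself:

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in the input split."""
    return [(w, 1) for w in chunk.split()]

def combine(pairs):
    """Combine: local pre-aggregation on each mapper's output."""
    acc = defaultdict(int)
    for k, v in pairs:
        acc[k] += v
    return list(acc.items())

def shuffle(all_pairs):
    """Shuffle: group values by key across all mappers."""
    groups = defaultdict(list)
    for k, v in all_pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values per key."""
    return {k: sum(vs) for k, vs in groups.items()}

chunks = ["big data big", "data big"]                           # Split
combined = [p for c in chunks for p in combine(map_phase(c))]   # Map + Combine
print(reduce_phase(shuffle(combined)))                          # → {'big': 3, 'data': 2}
```

The Combiner here shrinks each mapper's output before the Shuffle, which is exactly its role in MapReduce: reducing the data sent to the Reducer nodes.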
A ResourceManager
B JobMaster
C ApplicationMaster
Under the YARN/MRv2 framework, which daemon is tasked with negotiating with the NodeManager(s) to execute and monitor tasks? D TaskManager c
A Collector
B Source
C Stream
What is the final agent in a Flume chain named? D Agent a
A Availability
B Authorization
What are two security features Apache Ranger provides? C Authentication
(Please select the TWO that apply) D Auditing bd
A Worker nodes store results on their own local file systems.
B Data is aggregated by worker nodes.
C Worker nodes process pieces in parallel.
Under the MapReduce v1 programming model, what happens in a "Reduce" step? D Input is split into pieces. b
A MapReduce
B Spark
C HBase
Which Apache Hadoop component can potentially replace an RDBMS as a large Hadoop datastore and is particularly good for "sparse data"? D Ambari c
A RCFile
B SequenceFiles
C Flat
Which data encoding format supports exact storage of all data in binary representations such as VARBINARY columns? D Parquet b
A One time export and import of a database.
B An application evaluating sensor data in real-time.
C A system that stores many records in a database.
Which statement describes an example of an application using streaming data? D A web application that supports 10,000 users. b
A Scalability
B Resource utilization
C TaskTrackers can be a bottleneck to MapReduce jobs
What are two primary limitations of MapReduce v1? D Number of TaskTrackers limited to 1,000
(Please select the TWO that apply) E Workloads limited to MapReduce ab
A RAM
B network
C CPU
Which component of an Hadoop system is the primary cause of poor performance? D disk latency d
A Partial failure of the nodes during execution.
B Finding a particular node within the cluster.
What are two common issues in distributed systems? C Reduced performance when compared to a single server.
(Please select the TWO that apply) D Distributed systems are harder to scale up. ab
A Ambari Metrics System
B Ambari Alert Framework
C Ambari Server
Which component of the Apache Ambari architecture provides statistical data to the dashboard about the performance of a Hadoop cluster? D Ambari Wizard a
A Impersonation
B Grant/Revoke privileges
C Fluid query
Which Big SQL feature allows users to join a Hadoop data set to data in external databases? D Integration c
A Data source
B User mapping
C Nickname
When connecting to an external database in a federation, you need to use the correct database driver and protocol. What is this federation component called in Big SQL? D Wrapper d
A Parsing and loading data into a notebook.
B Autoconfiguring data connections using a registry.
C Extending the core language with shortcuts.
What is a "magic" command used for in Jupyter? D Running common statistical analyses. c
A RAID-0
B Online Transactional Processing
C Parallel Processing
Which computing technology provides Hadoop's high performance? D Online Analytical Processing c
A ScheduleManager
B ResourceManager
C ApplicationMaster
Under the YARN/MRv2 framework, the Scheduler and ApplicationsManager are components of which daemon? D TaskManager b
A large number of small data files
B solid state disks
C immediate failover of failed disks
D high-speed networking between nodes
Which two factors in a Hadoop cluster increase performance most significantly? E data redundancy on management nodes
(Please select the TWO that apply) F parallel reading of large data files df
A MapReduce
B YARN
C HDFS
Which component of the Hortonworks Data Platform (HDP) is the architectural center of Hadoop and provides resource management and a central platform for Hadoop applications? D HBase b
A Ambari Alert Framework
B Ambari Wizard
C Ambari Metrics System
If a Hadoop node goes down, which Ambari component will notify the Administrator? D REST API a
A Code
B Kernel
C Output
Which type of cell can be used to document and comment on a process in a Jupyter notebook? D Markdown d
A Notebooks can be used by multiple people at the same time.
B Users must authenticate before using a notebook.
C Notebooks can be connected to big data engines such as Spark.
Which is an advantage that Zeppelin holds over Jupyter? D Zeppelin is able to use the R language. a
A Jaql
B Fault tolerance through HDFS replication
C Adaptive MapReduce
Which capability does IBM BigInsights add to enrich Hadoop? D Parallel computing on commodity servers c
A value
B volume
C verifiability
What is one of the four characteristics of Big Data? D volatility b
A Hadoop Common
B Hadoop HBase
C MapReduce
Which Hadoop-related project provides common utilities and libraries that support other Hadoop sub-projects? D BigTable a
A Federated Discovery and Navigation
B Text Analysis
C Stream Computing
Which type of Big Data analysis involves the processing of extremely large volumes of constantly moving data that is impractical to store? D MapReduce c
A 64-bit architecture
B disk latency
C MIPS
Which primary computing bottleneck of modern computers is addressed by Hadoop? D limited disk capacity b
A stream computing
B data warehousing
C analytics
Which Big Data function improves the decision-making capabilities of organizations by enabling the organizations to interpret and evaluate structured and unstructured data in search of valuable business information? D distributed file system c
A HBase
B Apache
C Jaql
What is one of the two technologies that Hadoop uses as its foundation? D MapReduce d
A a high throughput, shared file system
B high availability of the NameNode
C data access performed by an RDBMS
What key feature does HDFS 2.0 provide that HDFS does not? D random access to data in the cluster b
A LOAD
B JOIN
C TOP
What are two of the core operators that can be used in a Jaql query? (Select two.) D SELECT bc
A SQL-like
B compiled language
C object oriented
Which type of language is Pig? D data flow d
A hdfs.conf
B hadoop-configuration.xml
C hadoop.conf
If you need to change the replication factor or increase the default storage block size, which file do you need to modify? D hdfs-site.xml d
A The file(s) must be stored on the local file system where the map reduce job was developed.
B The file(s) must be stored in HDFS or GPFS.
C The file(s) must be stored on the JobTracker.
To run a MapReduce job on the BigInsights cluster, which statement about the input file(s) must be true? D No matter where the input files are before, they will be automatically copied to where the job runs. b
A operating system independence
B POSIX compliance
C no single point of failure
What is a characteristic of IBM GPFS that distinguishes it from other distributed file systems? D blocks that are stored on different nodes b
A Pig is used for creating MapReduce programs.
B Pig has a shell interface for executing commands.
C Pig is not designed for random reads/writes or low-latency queries.
Which statement represents a difference between Pig and Hive? D Pig uses Load, Transform, and Store. d
A hdfs -dir mydata
B hadoop fs -mkdir mydata
C hadoop fs -dir mydata
Which command helps you create a directory called mydata on HDFS? D mkdir mydata b
A Reduce
B Shuffle
C Combine
In which step of a MapReduce job is the output stored on the local disk? D Map d
A Worker nodes process individual data segments in parallel.
B Worker nodes store results in the local file system.
C Input data is split into smaller pieces.
Under the MapReduce programming model, which task is performed by the Reduce step? D Data is aggregated by worker nodes. d
A Reducer
B JobScheduler
C TaskTracker
Which element of the MapReduce architecture runs map and reduce jobs? D JobTracker c
A spread data across a cluster of computers
B provide structure to unstructured or semi-structured data
C increase storage capacity through advanced compression algorithms
What is one of the two driving principles of MapReduce? D provide a platform for highly efficient transaction processing a
A Cluster
B Distributed
C Remote
D Debugging
When running a MapReduce job from Eclipse, which BigInsights execution models are available? (Select two.) E Local ae
A The number of reducers is always equal to the number of mappers.
B The number of mappers and reducers can be configured by modifying the mapred-site.xml file.
C The number of mappers and reducers is decided by the NameNode.
Which statement is true regarding the number of mappers and reducers configured in a cluster? D The number of mappers must be equal to the number of nodes in a cluster. b
A hadoop size
B hdfs -du
C hdfs fs size
Which command displays the sizes of files and directories contained in the given directory, or the length of a file, in case it is just a file? D hadoop fs -du d
A three
B two
C one
Following the most common HDFS replica placement policy, when the replication factor is three, how many replicas will be located on the local rack? D none c
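Behind the question above: HDFS's default policy puts the first replica on the writer's own node, and the second and third replicas on two nodes of a single remote rack, leaving exactly one replica on the local rack. A toy model of that placement (node and rack names are invented):

```python
def place_replicas(local_node, local_rack, remote_rack_nodes):
    """Model HDFS's default 3-replica placement:
    replica 1 -> the writer's node,
    replicas 2 and 3 -> two nodes on one remote rack."""
    return [
        (local_node, local_rack),
        remote_rack_nodes[0],
        remote_rack_nodes[1],
    ]

replicas = place_replicas("node1", "rackA",
                          [("node7", "rackB"), ("node8", "rackB")])
on_local_rack = sum(1 for _, rack in replicas if rack == "rackA")
print(on_local_rack)  # → 1
```

Counting the replicas on rackA gives 1, matching answer C (one) in the question.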
A copies Job Resources to the shared file system
B coordinates the job execution
C executes the map and reduce functions
In the MapReduce processing model, what is the main function performed by the JobTracker? D assigns tasks to each cluster node b
A Both are data flow languages.
B Both require schema.
C Both use Jaql query language.
How are Pig and Jaql query languages similar? D Both are developed primarily by IBM. a
A to manage storage attached to nodes
B to coordinate MapReduce jobs
C to regulate client access to files
Under the HDFS architecture, what is one purpose of the NameNode? D to periodically report status to DataNode c
A hadoop fs list
B hdfs root
C hadoop fs -ls /
Which command should be used to list the contents of the root directory in HDFS? D hdfs list / c
A runs map and reduce tasks
B keeps the work physically close to the data
C reports status of DataNodes
What is one function of the JobTracker in MapReduce? D manages storage b
A built-in UDFs and indexing
B platform-specific SQL libraries
C an RDBMS such as DB2 or MySQL
In addition to the high-level language Pig Latin, what is a primary component of the Apache Pig platform?
D runtime environment d
A Data is accessed through MapReduce.
B Data is designed for random access read/write.
C Data can be processed over long distances without a decrease in performance.
Which statement is true about Hadoop Distributed File System (HDFS)? D Data can be created, updated and deleted. a
A managing customer information in a CRM database
B sentiment analytics from social media blogs
C product cost analysis from accounting systems
Which is a use-case for Text Analytics? D health insurance cost/benefit analysis from payroll data b
A BigSheets client
B Microsoft Excel
C Eclipse
Which tool is used to access BigSheets? D Web Browser d
A Hive metastore
B RDBMS
C MapReduce
Which technology does Big SQL utilize for access to shared catalogs? D HCatalog a
A display view <view_name>
B return view <view_name>
C output view <view_name>
Which statement will make an AQL view have content displayed? D export view <view_name> c
A Text Analytics
B Stream Computing
C Data Warehousing
You work for a hosting company that has data centers spread across North America. You are trying to resolve a critical performance problem in which a large number of web servers are performing far below expectations. You know that the infor D Temporal Analysis a
A Thrift client
B Hive shell
C Hive SQL client
Which utility provides a command-line interface for Hive? D Hive Eclipse plugin b
A It is a data flow language for structured data based on Ansi-SQL.
B It is a distributed file system that replicates data across a cluster.
C It is an open source implementation of Google's BigTable.
What is an accurate description of HBase? D It is a database schema for unstructured Big Data. c
A BigSQL
B BigSheets
C Avro
Which Hadoop-related technology provides a user-friendly interface, which enables business users to easily analyze Big Data? D HBase b
A Text Analytics is the most common way to derive value from Big Data.
B MapReduce is unable to process unstructured text.
C Data warehouses contain potentially valuable information.
What drives the demand for Text Analytics? D Most of the world's data is in unstructured or semi-structured text. d
A An external table refers to an existing location outside the warehouse directory.
B An external table refers to a table that cannot be dropped.
C An external table refers to the data from a remote database.
In Hive, what is the difference between an external table and a Hive managed table? D An external table refers to the data stored on the local file system. a
A It provides all the capabilities of an RDBMS plus the ability to manage Big Data.
B It is a database technology that does not use the traditional relational model.
C It is based on the highly scalable Google Compute Engine.
Which statement about NoSQL is true? D It is an IBM project designed to enable DB2 to manage Big Data. b
A "Copy" to create a new sheet with the other workbook data in the current workbook
B "Group" to bring together the two workbooks
C "Load" to create a new sheet with the other workbook data in the current workbook
If you need to JOIN data from two workbooks, which operation should be performed beforehand? D "Add" to add the other workbook data to the current workbook c
A to get detailed information about the table
B to view data in an Hbase table
C to report any inconsistencies in the database
What is the "scan" command used for in HBase? D to list all tables in Hbase b
A Eclipse with BigInsights tools for Eclipse plugin
B BigInsights Console with AQL plugin
C AQLBuilder
Which tool is used for developing a BigInsights Text Analytics extractor? D AQL command line a
A Pre-create regions by specifying splits in create table command and use the insert command to load data.
B Pre-create regions by specifying splits in create table command and bulk loading the data.
C Pre-create the column families when creating the table and use the put command to load the data.
What is the most efficient way to load 700MB of data when you create a new HBase table? D Pre-create the column families when creating the table and bulk loading the data. b
The following sequence of commands is executed:
create 'table_1','column_family1','column_family2'
put 'table_1','row1','column_family1:c11','r1v11'
put 'table_1','row2','column_family1:c12','r1v12'
put 'table_1','row2','column_family2:c21','r1v21'
put 'table_1','row3','column_family1:d11','r1v11'
put 'table_1','row2','column_family1:d12','r1v12'
put 'table_1','row2','column_family2:d21','r1v21'
A 4
B 3
C 6
In HBase, which value will the "count 'table_1'" command return? D 2 b
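The answer follows because HBase's `count` returns the number of distinct row keys, not the number of `put` operations: puts to an existing row key add or overwrite cells in that same row. A sketch of that semantics in plain Python (not the HBase client API):

```python
from collections import defaultdict

table = defaultdict(dict)  # row key -> {column: value}

def put(row, column, value):
    """Model an HBase put: same row key means the same row gains cells."""
    table[row][column] = value

puts = [
    ("row1", "column_family1:c11", "r1v11"),
    ("row2", "column_family1:c12", "r1v12"),
    ("row2", "column_family2:c21", "r1v21"),
    ("row3", "column_family1:d11", "r1v11"),
    ("row2", "column_family1:d12", "r1v12"),
    ("row2", "column_family2:d21", "r1v21"),
]
for row, col, val in puts:
    put(row, col, val)

print(len(table))  # → 3, matching "count 'table_1'"
```

Six puts touch only three distinct row keys (row1, row2, row3), so `count` returns 3.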
A TRANSFORM
B SELECT
C GET
Which Hive command is used to query a table? D EXPAND b
A because SQL enhances query performance
B because the MapReduce Java API is sometimes difficult to use
C because data stored in a Hadoop cluster lends itself to structured SQL queries
Why develop SQL-based query languages that can access Hadoop data sets? D because the data stored in Hadoop is always structured b
A It allows Hadoop to apply the schema-on-ingest model to unstructured Big Data.
B It allows an RDBMS to maintain referential integrity on a Hadoop data set.
C It allows customers to leverage high-end server platforms to manage Big Data.
Which key benefit does NoSQL provide? D It can cost-effectively manage data sets too large for traditional RDBMS. d
A Hadoop data is highly structured.
B Data is in many formats.
C Data is located on a distributed file system.
What makes SQL access to Hadoop data difficult? D Hadoop requires pre-defined schema. b
A list tables
B describe tables
C show all
Which command can be used in Hive to list the tables available in a database/schema? D show tables d
A to count the number of columns of a table
B to count the number of column families of a table
C to count the number of rows in a table
In HBase, what is the "count" command used for? D to count the number of regions of a table c
A HBase
B Pig
C Jaql
Which Hadoop-related technology supports analysis of large datasets stored in HDFS using an SQL-like query language? D Hive d
A They need to be marked as "Shared."
B They need to be copied under the user home directory.
C They need to be deployed with proper privileges.
How can the applications published to BigInsights Web Console be made available for users to execute?
D They need to be linked with the master application. c
A Eclipse
B Oozie
C Jaql
Which component of Apache Hadoop is used for scheduling and running workflow jobs? D Task Launcher b
A validater
B replicater
C crawler
What is one of the main components of Watson Explorer (InfoSphere Data Explorer)? D compressor c
A analyze and react to data in motion before it is stored
B find and analyze historical stream data stored on disk
C analyze and summarize product sentiments posted to social media
IBM InfoSphere Streams is designed to accomplish which Big Data function? D execute ad-hoc queries against a Hadoop-based data warehouse a
A InfoSphere Information Server
B InfoSphere Streams
C InfoSphere BigInsights
Which IBM Big Data solution provides low-latency analytics for processing data-in-motion? D PureData for Analytics b
A Avro
B HBase
C Eclipse
Which IBM tool enables BigInsights users to develop, test and publish BigInsights applications? D BigInsights Applications Catalog c
A enabling customers to efficiently index and access large volumes of data
B gaining new insight through the capabilities of the world's interconnected intelligence
C providing solutions to help customers manage and grow large database systems
Which description identifies the real value of Big Data and Analytics? D using modern technology to efficiently store the massive amounts of data generated by social networks b
A Parallel Processing
B Online Analytical Processing
C Online Transactional Processing
Which computing technology provides Hadoop's high performance? D RAID-0 a
Mock Test
Question 1
You need to determine the permission setting for a
new schema directory. Which tool would you use?
Your answer
A. Kerberos
B. HDFS
C. umask
D. GRANT
Question 2
Using the Java SQL Shell, which command will
connect to a database called mybigdata?
Your answer
A. ./java tables
B. ./jsqsh mybigdata
C. ./jsqsh go mybigdata
D. ./java mybigdata
Question 3
Which tool would you use to create a connection to
your Big SQL database?
Your answer
A. Ambari
B. Scheduler
C. DSM
D. Jupyter
Question 6
You need to enable impersonation. Which two
properties in the bigsql-conf.xml file need to be
marked true?
Your answer
A. bigsql.alltables.io.doAs
B. bigsql.impersonation.create.table.grant.public
C. DB2_ATS_ENABLE
D. DB2COMPOPT
E. $BIGSQL_HOME/conf
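As a sketch, Hadoop-style configuration files express such settings as name/value pairs; the fragment below assumes bigsql-conf.xml follows that convention, with the two impersonation properties from the options set to true:

```xml
<!-- bigsql-conf.xml sketch: enable impersonation -->
<property>
  <name>bigsql.alltables.io.doAs</name>
  <value>true</value>
</property>
<property>
  <name>bigsql.impersonation.create.table.grant.public</name>
  <value>true</value>
</property>
```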
Question 7
What are Big SQL database tables organized into?
Your answer
A. Directories
B. Files
C. Hives
D. Schemas
Question 8
Which two commands would you use to give or
remove certain privileges to/from a user?
Your answer
A. GRANT
B. REVOKE
C. INSERT
D. SELECT
E. LOAD
Question 9
You have a distributed file system (DFS) and need to
set permissions on the /hive/warehouse
directory to allow access to ONLY the bigsql user.
Which command would you run?
Your answer
Question 11
Which Big SQL feature allows users to join a
Hadoop data set to data in external databases?
Your answer
A. Integration
B. Fluid query
C. Impersonation
D. Grant/Revoke privileges
Question 12
Which directory permissions need to be set to allow
all users to create their own schema?
Your answer
A. 700
B. 755
C. 666
D. 777
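To see why these octal modes differ, here is a small decoding sketch that renders a mode as owner/group/other rwx flags; mode 777 is the only one above in which every user has write access (and so can create entries such as their own schema directory):

```python
def rwx(mode):
    """Render a 9-bit octal mode as owner/group/other rwx flags."""
    return "".join(
        flag if mode & (1 << (8 - i)) else "-"
        for i, flag in enumerate("rwxrwxrwx")
    )

print(rwx(0o777))  # rwxrwxrwx: everyone may write
print(rwx(0o755))  # rwxr-xr-x: only the owner may write
print(rwx(0o700))  # rwx------: owner only
print(rwx(0o666))  # rw-rw-rw-: no execute/search bit, so directories are not traversable
```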
Question 13
What is the default directory in HDFS where tables
are stored?
Your answer
A. /apps/hive/warehouse/
B. /apps/hive/warehouse/data
C. /apps/hive/warehouse/bigsql
D. /apps/hive/warehouse/schema
Question 14
How many Big SQL management nodes do you need
at minimum?
Your answer
A. 1
B. 2
C. 4
D. 3
Question 15
When connecting to an external database in a
federation, you need to use the correct database
driver and protocol. What is this federation
component called in Big SQL?
Your answer
A. Wrapper
B. User mapping
C. Data source
D. Nickname
Question 16
Which Apache Hadoop application provides an SQL-
like interface to allow abstraction of data on semi-
structured data in a Hadoop datastore?
Your answer
A. Pig
B. Spark
C. Hive
D. YARN
Question 17
Apache Spark provides a single, unifying platform
for which three of the following types of operations?
Your answer
A. record locking
B. machine learning
C. batch processing
D. ACID transactions
E. graph operations
F. transaction processing
Question 18
Under the YARN/MRv2 framework, which daemon is
tasked with negotiating with the NodeManager(s) to
execute and monitor tasks?
Your answer
A. JobMaster
B. ApplicationMaster
C. ResourceManager
D. TaskManager
Question 19
Which component of the Hortonworks Data Platform
(HDP) is the architectural center of Hadoop and
provides resource management and a central
platform for Hadoop applications?
Your answer
A. MapReduce
B. HBase
C. YARN
D. HDFS
Question 20
What is the final agent in a Flume chain named?
Your answer
A. Agent
B. Stream
C. Collector
D. Source
Question 21
Which Apache Hadoop application provides a high-
level programming language for data transformation
on unstructured data?
Your answer
A. Sqoop
B. Zookeeper
C. Hive
D. Pig
Question 22
Which component of the Spark Unified Stack allows
developers to intermix structured database queries
with Spark's programming language?
Your answer
A. Java
B. MLlib
C. Mesos
D. Spark SQL
Question 23
Under the MapReduce v1 programming model, what
happens in a "Reduce" step?
Your answer
Question 24
Which statement is true about Hortonworks Data
Platform (HDP)?
Your answer
A. Nomad
B. Hadoop YARN
C. oneSIS
D. Apache Mesos
Question 26
Which statement is true about the Combiner phase
of the MapReduce architecture?
Your answer
A. It determines the size and distribution of data split in the Map phase.
B. It reduces the amount of data that is sent to the Reducer task nodes.
D. It aggregates all input data before it goes through the Map phase.
Question 27
Which three are a part of the Five Pillars of
Security?
Your answer
A. Administration
B. Audit
C. Speed
D. Data Protection
E. Resiliency
Question 28
Which feature makes Apache Spark much easier to
use than MapReduce?
Your answer
A. ResourceManager
B. TaskManager
C. JobMaster
D. ApplicationMaster
E. ScheduleManager
Question 30
Which computing technology provides Hadoop's
high performance?
Your answer
A. RAID-0
B. Parallel Processing
Question 31
What are two security features Apache Ranger
provides?
Your answer
A. Availability
B. Authorization
C. Authentication
D. Auditing
Question 32
Which two are attributes of streaming data?
Your answer
Question 33
Which statement describes an example of an
application using streaming data?
Your answer
Question 34
Which statement is true about MapReduce v1 APIs?
Your answer
Question 35
What two security functions does Apache Knox
provide?
Your answer
C. Proxying services.
Question 36
Hadoop 2 consists of which three open-source sub-
projects maintained by the Apache Software
Foundation?
Your answer
A. HDFS
B. Hive
C. Big SQL
D. Cloudbreak
E. YARN
F. MapReduce
Question 37
Which Hadoop ecosystem tool can import data into
a Hadoop cluster from a DB2, MySQL, or other
databases?
Your answer
A. Sqoop
B. HBase
C. Accumulo
D. Oozie
Question 38
Which statement accurately describes how
ZooKeeper works?
Your answer
Question 39
What does the split-by parameter tell Sqoop?
Your answer
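A hypothetical Sqoop invocation (the JDBC URL, table, and column names below are invented for illustration) shows where --split-by fits: it names the column whose value range Sqoop partitions across the parallel mappers.

```python
# Build an illustrative Sqoop command line (all identifiers hypothetical).
# --split-by tells Sqoop which column to use when dividing rows among mappers.
cmd = " ".join([
    "sqoop import",
    "--connect jdbc:db2://dbhost:50000/SALES",  # hypothetical JDBC URL
    "--table ORDERS",                           # hypothetical source table
    "--split-by ORDER_ID",                      # column used to partition the import
    "-m 4",                                     # four parallel mappers
])
print(cmd)
```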
Question 40
Which two are valid watches for ZNodes in
ZooKeeper?
Your answer
A. NodeChildrenChanged
B. NodeRefreshed
C. NodeDeleted
D. NodeExpired
Question 41
Which component of the Apache Ambari
architecture integrates with an organization's LDAP
or Active Directory service?
Your answer
A. REST API
B. Authorization Provider
D. Postgres RDBMS
Question 42
Which hardware feature on a Hadoop datanode is
recommended for cost efficient performance?
Your answer
A. SSD
B. RAID
C. JBOD
D. LVM
Question 43
Which data encoding format supports exact storage
of all data in binary representations such as
VARBINARY columns?
Your answer
A. SequenceFiles
B. Flat
C. Parquet
D. RCFile
Question 44
What is the name of the Hadoop-related Apache
project that utilizes an in-memory architecture to run
applications faster than MapReduce?
Your answer
A. Pig
B. Spark
C. Python
D. Hive
Question 45
Which statement about Apache Spark is true?
Your answer
Question 46
Which description characterizes a function provided
by Apache Ambari?
Your answer
Question 47
What is the preferred replacement for Flume?
Your answer
A. Druid
C. Storm
D. NiFi
Question 48
What are two primary limitations of MapReduce v1?
Your answer
A. Scalability
C. Resource utilization
Question 49
What are two services provided by ZooKeeper?
Your answer
Question 50
What are three IBM value-add components to the
Hortonworks Data Platform (HDP)?
Your answer
A. Big Match
B. Big YARN
C. Big SQL
D. Big Replicate
E. Big Data
F. Big Index
Question 51
What does the user interface for Jupyter look like to
a user?
Your answer
A. Database interface.
Question 52
What is the first step in a data science pipeline?
Your answer
A. Analytics
B. Manipulation
C. Exploration
D. Acquisition
Question 53
What command is used to list the "magic"
commands in Jupyter?
Your answer
A. %dirmagic
B. %list-all-magic
C. %list-magic
D. %lsmagic
Question 54
What is a "magic" command used for in Jupyter?
Your answer
Question 55
Which is an advantage that Zeppelin holds over
Jupyter?
Your answer
Question 56
Where does the unstructured data of a project
reside in Watson Studio?
Your answer
A. Database
B. Wrapper
C. Tables
D. Object Storage
Question 57
Before you create a Jupyter notebook in Watson
Studio, which two items are necessary?
Your answer
A. Scala
B. URL
C. Project
D. Spark Instance
E. File
Question 58
Which type of cell can be used to document and
comment on a process in a Jupyter notebook?
Your answer
A. Kernel
B. Output
C. Code
D. Markdown
Question 59
Which Watson Studio offering used to be available
through something known as IBM Bluemix?
Your answer
Question 60
What is the architecture of Watson Studio centered
on?
Your answer
A. Collaborators
B. Data Assets
C. Projects
D. Analytic Assets
Big Data Engineer v2
IBM Certification
2018
1/ What are the 4Vs of Big Data? (Please select the FOUR that apply)
• Veracity
• Velocity
• Variety
• Volume
2/ What are the three types of Big Data? (Please select the THREE that apply)
• Semi-structured
• Structured
• Unstructured
3/ Select all the components of HDP which provide data access capabilities
• Pig
• MapReduce
• Hive
4/ Select the components that provide the capability to move data from
relational database into Hadoop.
• Sqoop
• Kafka
• Flume
6/ True or False: The following components are value-add from IBM: Big
Replicate, Big SQL, BigIntegrate, BigQuality, Big Match
• TRUE
7/ True or False: Data Science capabilities can be achieved using only HDP.
FALSE (Big Data Ecosystem UNIT 2)
p.45 // Hortonworks Data Platform.
8/ True or False: Ambari is backed by RESTful APIs for developers to easily
integrate with their own applications.
• True
10/ Which page from the Ambari UI allows you to check the versions of the
software installed on your cluster?
• The Admin > Manage Ambari page
11/ True or False? Creating users through the Ambari UI will also create the
user on the HDFS.
• FALSE
12/ True or False? You can use the CURL commands to issue commands to
Ambari.
• TRUE
13/ True or False: Hadoop systems are designed for transaction processing.
• FALSE
15/ True or False: One of the driving principles of Hadoop is that the data is
brought to the program.
FALSE (Big Data Ecosystem UNIT 4)
⇒ programs are brought to the data, not the data to the program
16/ True or False: At least 2 Name Nodes are required for a standalone
Hadoop cluster.
FALSE (Big Data Ecosystem UNIT 4)
⇒ One Name Node is required
17/ True or False: The phases in a MR job are Map, Shuffle, Reduce and
Combiner
• TRUE
18/ Centralized handling of job control flow is one of the limitations of MR
v1.
• TRUE
20/ What are the benefits of using Spark? (Please select the THREE that
apply)
• Generality
• Speed
• Ease of use
21/ What are the languages supported by Spark? (Please select the THREE
that apply)
• Python
• Java
• Scala
23/ What would you need to do in a Spark application that you would not
need to do in a Spark shell to start using Spark?
• Import the necessary libraries to load the SparkContext
24/ True or False: NoSQL database is designed for those that do not want to
use SQL.
FALSE (Big Data Ecosystem UNIT 7)
⇒ Note: HBase and other NoSQL distributed data stores are subject to the CAP
Theorem which states that distributed NoSQL data stores can only achieve 2
out of the 3 properties: consistency, availability and partition tolerance.
25/ Which database is a columnar storage database?
• HBase
31/ True or False: Sqoop is used to transfer data between Hadoop and
relational databases.
• True
34/ Through what HDP component are Kerberos, Knox, and Ranger managed?
• Ambari
36/ One of the governance issues that Hortonworks DataPlane Service (DPS)
addresses is visibility over all of an organization's data across all of their
environments (on-prem, cloud, hybrid), while making it easy to maintain
consistent security and governance
• True
37/ True or false: The typical sources of streaming data are Sensors, "Data
exhaust" and high-rate transaction data.
• True
39/ True or False: NiFi is a disk-based, microbatch ETL tool that provides flow
management
• True
40/ True or False: MiNiFi is a complementary data collection tool that feeds
collected data to NiFi
• True
41/ What main features does IBM Streams provide as a Streaming Data
Platform? (Please select the THREE that apply)
• Analysis and visualization
• Rich data connections
• Development support
42/ What are the most important computer languages for Data Analytics?
(Please select the THREE that apply)
• Python
• R
• Scala
43/ True or False: GPUs are special-purpose processors that traditionally can
be used to power graphical displays, but for Data Analytics lend themselves to
faster algorithm execution because of the large number of independent
processing cores.
• True
44/ True or False: Jupyter stores its workbooks in files with the .ipynb suffix.
These files can not be stored locally or on a hub server.
FALSE (Introduction to Data Science UNIT 1)
Course 3
50/ Using the LOAD operation is the recommended method for getting data
into your Big SQL table for best performance.
• True
54/ True or False: You can only Kerberize a Big SQL server before it is
installed. False (Big SQL UNIT 4)
55/ True or False: Authentication with Big SQL only occurs at the Big SQL
layer or the client's application layer. False (Big SQL UNIT 4)
56/ True or False: Ranger and impersonation work well together.
False (Big SQL UNIT 4)
57/ True or False: RCAC can hide rows and columns.
• True
58/ True or False: Nicknames can be used for wrappers and servers.
False (Big SQL UNIT 5)
⇒ Nicknames are created to refer to the tables in the remote data
source.
59/ True or False: Server objects define the properties and values of the
connection.
• True
60/ True or False: The purpose of a wrapper is to provide a library of routines
that does not communicate with the data source.
False (Big SQL UNIT 5)
61/ True or False: User mappings are used to authenticate to the remote data
source.
• True
Course 4
63/ True or False: Watson Studio is designed only for Data Scientists, other
personas would not know how to use it.
False (Watson Studio UNIT 1)
⇒ The collaborative platform supports all users, whether they are data
scientists, data engineers, or application developers.
64/ True or False: Community provides access to articles, tutorials, and even
data sets that you can use.
• True
65/ True or False: You can import visualization libraries into Watson Studio.
• True