1.
Data modeling tells you
i. How your data is structured
ii. What operations can be done on the data
iii. What constraints apply to the data
iv. Where the data is stored
Single choice.
(0.5 Points)
i, ii, iii
ii, iii, iv
i, ii, iv
i, iii, iv
2.
State True/False:
i. A query language is declarative
ii. A database programming language is a procedural programming language
Single choice.
(0.5 Points)
i-true, ii-true
i-true, ii-false
i-false, ii-true
i-false, ii-false
3.
SQL query which prints the records for students whose name starts with ‘De’ is
Single choice.
(0.5 Points)
Select * from students where name = 'De'
Select * from students where name like 'De'
Select * from students where name = '%De'
Select * from students where name like '%De' (best option)
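Note: the difference between these patterns is easy to check. The sketch below runs them with Python's built-in sqlite3 module against a few made-up names (the sample rows and the helper `names_matching` are assumptions for illustration). It also tries the pattern 'De%', which is not one of the listed options but is the usual way to express "starts with". SQLite's LIKE ignores ASCII case by default, so other databases may match slightly differently.

```python
import sqlite3

# Hypothetical sample data; the table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.executemany(
    "INSERT INTO students (name) VALUES (?)",
    [("Deepak",), ("Dennis",), ("Jade",), ("De",)],
)

def names_matching(condition):
    """Return the sorted names selected by the given WHERE condition."""
    rows = conn.execute(f"SELECT name FROM students WHERE {condition}")
    return sorted(row[0] for row in rows)

exact = names_matching("name = 'De'")          # exact string comparison
like_plain = names_matching("name LIKE 'De'")  # no wildcard, so only the name "De" itself
eq_literal = names_matching("name = '%De'")    # '=' treats % as a literal character
suffix = names_matching("name LIKE '%De'")     # names ending in "De"
prefix = names_matching("name LIKE 'De%'")     # names starting with "De"

print(exact, like_plain, eq_literal, suffix, prefix)
```

With this data, only the 'De%' pattern returns Deepak and Dennis. The '%De' pattern instead picks up names that end in "De".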
4.
What is a subquery?
Single choice.
(0.5 Points)
A query statement within another query
An alternate query that acts as a substitute for a given query
A query that requires two tables in order to calculate the values
A shorter query than normal
5.
In MongoDB, ____ operator matches any of the values specified in an array.
Single choice.
(0.5 Points)
$ne
$nin
$in
$eq
6.
What are the three layers of the Hadoop ecosystem?
i. Data Management and Storage
ii. Data Manipulation and Integration
iii. Coordination and Workflow Management
iv. Data Integration and Processing
v. Data Creation and Storage
Single choice.
(0.5 Points)
ii, iii, iv
i, iii, iv
i, ii, v
ii, iv, v
7.
Which of the following statements is FALSE with respect to the big data
processing engines supported by the Apache foundation?
Single choice.
(0.5 Points)
The Beam system is a relatively new system for batch and stream processing with a data
flow programming model
Flink has its own execution engine called Nephele
Spark defined input stream interface abstractions called spouts, and computation
abstractions called bolts (storm has defined…)
Spark was built using an in-memory structure called Resilient Distributed Datasets
8.
____________ software collects and indexes machine data at a very large
scale irrespective of where it is generated.
Single choice.
(0.5 Points)
TurboTax
Splunk (not sure)
OpenXC
None of these
9.
What is the equivalent MongoDB query for the given SQL query- “select * from
ABC”?
Single choice.
(0.5 Points)
db.ABC.find( )
select.ABC.find( )
db.select.ABC( )
ABC.db.select( )
10.
db.collection.find(<query filter>, <projection>).<cursor modifier>
Which part of the above statement is equivalent to WHERE clause in SQL?
Single choice.
(0.5 Points)
<query filter>
<Projection>
<collection>
<cursor modifier>
11.
Which of the following is TRUE w.r.t Query Language?
i. Specifies the data items we need.
ii. Database programming language
iii. It is declarative
Single choice.
(0.5 Points)
i, ii only
ii, iii only
i, iii only
i, ii and iii
12.
State True(T) or False(F).
i. MongoDB is a collection of documents.
ii. MongoDB does not have adequate support to perform recursive queries
over nested substructures.
Single choice.
(0.5 Points)
i-T, ii-T
i-T, ii-F
i-F, ii-T
i-F, ii-F
13.
The head(5) command on a Pandas data frame is used to
Single choice.
(0.5 Points)
View first five rows
View last five rows
View first five columns
View last five attributes
14.
Considering the following schema, what does the given query return?
Schema: Items (name, manf)
Likes (user, item)
Query: SELECT *
FROM Items
WHERE name NOT IN
(SELECT item
FROM Likes
WHERE user='Joe');
Single choice.
(0.5 Points)
Selects the name and manufacturer of each item that Joe doesn’t like
Selects the name of each item liked by Joe
Selects the name and manufacturer of each item liked by everyone except Joe
Selects the manufacturer of each item liked by Joe
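Note: the NOT IN subquery can be tried on a toy copy of the schema. The sketch below uses Python's built-in sqlite3 module. The rows are invented for illustration, and an ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "user" is quoted because it is a reserved word in some SQL dialects.
conn.executescript("""
    CREATE TABLE Items (name TEXT, manf TEXT);
    CREATE TABLE Likes ("user" TEXT, item TEXT);
    INSERT INTO Items VALUES ('Cola', 'Acme'), ('Tea', 'Leafy'), ('Juice', 'Acme');
    INSERT INTO Likes VALUES ('Joe', 'Cola'), ('Ann', 'Tea'), ('Ann', 'Cola');
""")

# The inner query lists the items Joe likes; the outer query keeps every
# Items row (name and manufacturer) whose name is not in that list.
not_liked_by_joe = conn.execute("""
    SELECT *
    FROM Items
    WHERE name NOT IN
        (SELECT item
         FROM Likes
         WHERE "user" = 'Joe')
    ORDER BY name
""").fetchall()

print(not_liked_by_joe)
```

Joe likes only Cola in this sample, so the result holds the name and manufacturer of Juice and Tea.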
15.
Which of the following are distinct layers of Hadoop?
I. Data management and storage
II. Query Management
III. Data processing
IV. All the above
Single choice.
(0.5 Points)
IV
I & III
I & II
I
16.
In Hadoop, different varieties of data get retrieved, integrated, and analyzed in
the:
Single choice.
(0.5 Points)
Data management and storage Layer
Query Management Layer
Data processing Layer
All of the above
17.
The goal of data fusion is to:
I. find the values of Data Items from multiple sources
II. derive information that has greater benefit than what would have been
derived from each of the contributing parts
III. combine all data in a source
IV. find the true worth of a data set
Single choice.
(0.5 Points)
I & II (not sure)
II
III
IV
18.
MongoDB query to find a document whose 2nd element in tags is “summer”
Single choice.
(0.5 Points)
db.inventory.find(tags.1:"summer")
db.inventory.find(tags.2:"summer")
db.inventory(tags.1:"summer")
db.inventory(tags.2:"summer")
19.
The job of a data integration system is:
Single choice.
(0.5 Points)
Accumulate all data in one system
Transform the data from the source schema to the schema of the receiving system
Record Customer Interactions
Customer Analytics
20.
Data compression refers to a way of:
I. Compressing the data file
II. Creating an encoded representation of data.
III. Retaining only relevant data
IV. Creating a form smaller than the original representation.
Single choice.
(0.5 Points)
I & II
I & III
III & IV
II & IV
21.
In Hadoop, the YARN Engine is used for:
Single choice.
(0.5 Points)
Batch and Stream Processing
Data Processing
Resource negotiation and scheduling
All of the above
22.
For applications like online gaming and hazards management it is very important
to have a:
I. High latency system
II. Low latency system
III. Batch processing ability
IV. Highly scalable execution
Single choice.
(0.5 Points)
I & II
I & III
III & IV
II & IV
23.
The integration and processing layer includes which of the following tools for
bringing a query interface on top of the storage layer?
I. Spark SQL
II. Vertica
III. Hive
IV. Solr
Single choice.
(0.5 Points)
I & II
II & IV
I & III
III & IV
24.
What does the following line of code do in postgres?
SELECT count(userid) FROM (SELECT buyclicks.userId, teamLevel, price
FROM buyclicks JOIN gameclicks on buyclicks.userId = gameclicks.userId)
temp WHERE price=3 and teamLevel=5;
Single choice.
(0.5 Points)
Counts the users who exist in both the gameclicks and buyclicks files
This is an invalid line of code, the subquery is not formatted properly
Finds the total number of user ids (repeats allowed) in buy-clicks that have bought items
with prices worth $3 and were in a team with level 5 at some point in time
Displays the users who have bought items worth $3 and have had a team with level 5
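Note: the "repeats allowed" part is easier to see on a small example. The sketch below runs the same query shape with Python's built-in sqlite3 module instead of Postgres. The table layouts and rows are assumptions (only the columns the query uses are modelled), and the subquery alias is renamed from temp to joined to avoid any clash with SQLite's TEMP keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buyclicks (userId INTEGER, price REAL);
    CREATE TABLE gameclicks (userId INTEGER, teamLevel INTEGER);
    -- user 1 buys two $3 items, user 2 buys one $3 item, user 3 buys a $5 item
    INSERT INTO buyclicks VALUES (1, 3), (1, 3), (2, 3), (3, 5);
    -- user 1 has one click at team level 5; user 2 never reaches level 5
    INSERT INTO gameclicks VALUES (1, 5), (1, 4), (2, 4), (3, 5);
""")

# The join pairs every purchase with every game click of the same user,
# so one user can contribute several rows to the count.
(count,) = conn.execute("""
    SELECT count(userid) FROM (
        SELECT buyclicks.userId, teamLevel, price
        FROM buyclicks JOIN gameclicks ON buyclicks.userId = gameclicks.userId
    ) AS joined
    WHERE price = 3 AND teamLevel = 5
""").fetchone()

print(count)
```

Only user 1 satisfies both conditions here, yet the count is 2, because each of that user's two $3 purchases joins with the level-5 click.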
25.
________ is the primary form of data in Information Retrieval systems
Single choice.
(0.5 Points)
Image
Text
XML data
HTML
26.
What is the main problem with big data information integration?
Single choice.
(0.5 Points)
Many sources
Mediated Schema
Pay-as-you-go model
Probabilistic schema mapping
27.
With SQL, which of the following queries returns the number of records in the
"Product" table?
Single choice.
(0.5 Points)
Select * from Product
Select count (*) from Product
Select count from Product
Select distinct (count) from Product
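Note: a quick way to compare returning the rows with counting them, sketched with Python's built-in sqlite3 module. The Product columns and rows are made up for illustration. The sketch also shows that count applied to a column skips NULL values, while count(*) counts every record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [("pen", 1.5), ("ink", 4.0), ("pad", None)],  # "pad" has no price (NULL)
)

all_rows = conn.execute("SELECT * FROM Product").fetchall()           # the records themselves
(record_count,) = conn.execute("SELECT count(*) FROM Product").fetchone()     # number of records
(priced_count,) = conn.execute("SELECT count(price) FROM Product").fetchone() # non-NULL prices only

print(len(all_rows), record_count, priced_count)
```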
28.
Which of the following statements using MongoDB will result in counting the
number of unique jobs of Customers?
Single choice.
(0.5 Points)
db.Customers.count (jobs:{$in: false})
db.Customers(jobs: {$exists: true}).count
db.Customers.count (jobs: {$exists: true})
db.count.Customers exists (jobs)
29.
Any big data integration system should:
I. Not integrate all sources of data
II. Have addressed the record linkage problem
III. Integrate as per the application/business demand
IV. All the above.
Single choice.
(0.5 Points)
IV
III
II & III
I & III
30.
State True(T) or False(F) w.r.t Big Data Management Systems (BDMS).
i. Mainly designed for parallel and distributed processing.
ii. Always guarantees consistency for every update.
Single choice.
(0.5 Points)
i- T, ii- T
i- T, ii- F
i- F, ii- T
i- F, ii- F
31.
Which is the command used to see the databases in MongoDB?
Single choice.
(0.5 Points)
select dbs
show dbs
create dbs
use dbs
32.
Which of the following is an aggregate function?
Single choice.
(0.5 Points)
Select
Project
Count
Join
33.
In Aerospike, which of these dictates namespace behavior, such as how data is
stored, whether replicas exist, and the expiry time for a record?
Single choice.
(0.5 Points)
Indexes
Policies
Bins
None of these
34.
Which of these MongoDB commands is used to look for values where a
particular field is greater than 10?
Single choice.
(0.5 Points)
{$gt=10}
{$gt :=10}
{$gt :10}
{$gt ,10}
35.
____ operator in MongoDB matches none of the values specified in an array.
Single choice.
(0.5 Points)
$nin
$ne
$in
$not
36.
What does it mean to have a _id:0 within our query statement?
Single choice.
(0.5 Points)
Tell MongoDB not to return a document id.
Grab the first object in the results
Does not have an effect, simple convention left for compatibility issues.
Grab as many objects as possible
37.
What is a data item?
Single choice.
(0.5 Points)
Data found in a mediated schema.
Data found in a customer transaction.
The real worth of a data value.
Data that represents an aspect of a real world entity.
38.
________________ statement in SQL ensures that the result will not have any
duplicates.
Single choice.
(0.5 Points)
SELECT UNIQUE
SELECT DISTINCT
SELECT *
SELECT ALL
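Note: the effect of the duplicate-removing keyword can be seen next to SELECT ALL in a small sketch using Python's built-in sqlite3 module. The Orders table and its rows are assumptions for illustration, and ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (city TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?)", [("Pune",), ("Delhi",), ("Pune",)])

# SELECT ALL (the default behavior) keeps duplicate rows.
all_cities = [row[0] for row in conn.execute("SELECT ALL city FROM Orders ORDER BY city")]
# SELECT DISTINCT returns each value once.
distinct_cities = [row[0] for row in conn.execute("SELECT DISTINCT city FROM Orders ORDER BY city")]

print(all_cities, distinct_cities)
```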
39.
The “Find friend of a friend” feature on social networks makes use of _________
data models.
Single choice.
(0.5 Points)
Relational
Semi Structured
Graph
Text
40.
The decision tree algorithm is one technique for ___________.
Single choice.
(0.5 Points)
Connectivity Analysis
Classification
Clustering
Path Analysis