
Big Data Analytics Solved Paper

The document outlines key differences between NoSQL and NewSQL databases, highlighting aspects such as data models, consistency, and scalability. It also discusses MapReduce in NoSQL, RDBMS challenges, supervised vs unsupervised learning, and the importance of dimensionality reduction in big data. Additionally, it covers the role of graph databases in representing complex relationships and types of graph analytics.

Big Data Analytics - Final Exam Solved Paper

Q.01 (A)
Differences between NoSQL & NewSQL:
- Data Model: NoSQL = flexible schema; NewSQL = relational schema with ACID.
- Consistency: NoSQL is often eventually consistent; NewSQL provides strong consistency.
- Query Language: NoSQL uses varied APIs; NewSQL uses SQL.
- Scalability: Both scale horizontally; NewSQL keeps SQL advantages.
- Use Cases: NoSQL for un/semi-structured data; NewSQL for transactional workloads.
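
The schema and transaction differences above can be sketched in a few lines. This is only an illustration under stated assumptions: the records, the `accounts` table, and the `transfer` helper are hypothetical, and SQLite stands in for "relational schema + SQL + ACID". It is not a distributed NewSQL engine.

```python
import sqlite3

# Document-style records: each one may carry different fields (flexible schema).
documents = [
    {"_id": 1, "name": "Asha", "tags": ["premium"]},
    {"_id": 2, "name": "Ravi", "address": {"city": "Pune"}},
]

# Relational side: fixed schema, SQL, and an atomic multi-row transaction.
# SQLite is only a stand-in here; it is not a distributed NewSQL engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)       # both rows change together
failed = transfer(conn, 1, 2, 500)  # violates the CHECK, so the credit is undone too
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The second transfer credits one account before the debit fails, yet neither change survives. That all-or-nothing behavior is the atomicity that NewSQL systems aim to keep while scaling out.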

Limitations of NoSQL that led to NewSQL:
- Limited or no ACID transactions, especially across multiple records.
- Complex joins and analytics are hard to express.
- Fragmented tooling and query syntax.

Q.01 (B)
MapReduce in NoSQL:
- Map: process chunks → key/value pairs.
- Shuffle/Sort: group by key.
- Reduce: aggregate values.
- Data locality and parallelism.
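
The three phases above can be sketched as a single-process word count. This is a minimal illustration, not how any particular NoSQL engine exposes MapReduce: the function names and the input chunks are hypothetical, and a real framework would run the map tasks in parallel on the nodes that hold each chunk.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_phase(pairs):
    """Shuffle/Sort: group all emitted values by key, keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))

def reduce_phase(grouped):
    """Reduce: aggregate the list of values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

chunks = ["big data big", "data analytics"]  # hypothetical input splits
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle_phase(pairs))
# counts == {"analytics": 1, "big": 2, "data": 2}
```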

Advantages vs RDBMS:
- Horizontal scaling.
- Works with un/semi-structured data.
- Fault tolerant.
Disadvantages:
- High latency (batch-oriented, not suited to interactive queries).
- Inefficient for iterative algorithms.
- Complex joins are awkward to express.

Q.02
RDBMS Challenges:
- Vertical scaling limits.
- Rigid schema.
- Join overhead at scale.
- Storage & throughput bottlenecks.

How HDFS addresses these:
- Horizontal scaling.
- Data replication & fault tolerance.
- High throughput for big files.
- Data locality.
- Stores data in any format.
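
The effect of block splitting and replication can be worked through with simple arithmetic. The sketch below assumes the commonly cited HDFS defaults of a 128 MB block size and a replication factor of 3; both are configurable per cluster, and the helper name `hdfs_footprint` is hypothetical.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (number of blocks, total block replicas, raw storage in MB)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The final block only occupies its actual size, so storage is size x replication.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb

blocks, replicas, raw_mb = hdfs_footprint(1000)
# A 1000 MB file -> 8 blocks, 24 block replicas, 3000 MB of raw storage.
```

Each of the 8 blocks can be read by a different node, which is where the throughput and data-locality benefits come from. The 3x storage cost is the price of fault tolerance.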

Q.03 (A)
Supervised vs Unsupervised:
- Labels: Supervised has labels; Unsupervised has no labels.
- Goal: Prediction vs pattern discovery.
- Evaluation: Accuracy on held-out labeled data vs cluster-quality metrics (e.g., silhouette score).
- Algorithms: SVM, decision trees vs k-means, PCA.

Examples:
- Supervised: Spam detection.
- Unsupervised: Customer segmentation.
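
The contrast can be made concrete on a toy 1-D dataset. This is a minimal sketch with hypothetical data: a nearest-centroid classifier stands in for the supervised case (it needs the labels), and a small k-means loop stands in for the unsupervised case (it sees only the values). Neither is one of the exact algorithms named above except k-means.

```python
def nearest_centroid_fit(xs, labels):
    """Supervised: use the labels to compute one mean per class."""
    sums, counts = {}, {}
    for x, label in zip(xs, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def nearest_centroid_predict(centroids, x):
    """Predict the class whose mean is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def kmeans_1d(xs, centers, iterations=10):
    """Unsupervised: k-means on 1-D data; no labels are involved."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for x in xs:
            nearest = min(range(len(centers)), key=lambda i: abs(centers[i] - x))
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical feature values
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

model = nearest_centroid_fit(xs, labels)           # learns from labeled examples
prediction = nearest_centroid_predict(model, 9.0)  # "spam"
centers, clusters = kmeans_1d(xs, centers=[0.0, 5.0])  # discovers two groups unaided
```

Both approaches end up with the same two group means here, but only the supervised model can name the groups, because only it was given labels.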

Q.03 (B)
SVM Working:
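- Finds the separating hyperplane with the maximum margin.
- The hyperplane is determined by the support vectors (the training points closest to it).
- Kernel trick for nonlinear separation.
- Soft margin trades margin width against misclassification errors.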
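
The margin, support vectors, and soft-margin objective can be checked by hand on a tiny example. The sketch below assumes hypothetical 2-D points and a hand-picked hyperplane rather than one found by an optimizer, and it covers only the linear case (no kernel). The objective used is the standard soft-margin form, 0.5·||w||² + C·Σ max(0, 1 − y(w·x + b)).

```python
import math

def decision(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def soft_margin_objective(w, b, points, labels, C=1.0):
    """0.5*||w||^2 + C * sum of hinge losses max(0, 1 - y*(w.x + b))."""
    hinge = sum(max(0.0, 1.0 - y * decision(w, b, x)) for x, y in zip(points, labels))
    return 0.5 * sum(wi * wi for wi in w) + C * hinge

# Hypothetical 2-D data with labels in {-1, +1} and a hand-picked hyperplane x1 = 2.
points = [(1.0, 1.0), (0.0, 2.0), (3.0, 1.0), (4.0, 3.0)]
labels = [-1, -1, 1, 1]
w, b = (1.0, 0.0), -2.0

# Distance between the two margin boundaries is 2 / ||w||.
margin_width = 2.0 / math.sqrt(sum(wi * wi for wi in w))
# Support vectors lie exactly on a margin boundary: y * (w.x + b) == 1.
support_vectors = [x for x, y in zip(points, labels)
                   if math.isclose(y * decision(w, b, x), 1.0)]
objective = soft_margin_objective(w, b, points, labels)
```

Here every point is classified with a functional margin of at least 1, so the hinge term is zero and the objective reduces to 0.5·||w||². A larger `C` would penalize any margin violations more heavily.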

Best use cases:
- High-dimensional data.
- Small/medium datasets.

Q.04 (A)
Reinforcement Learning in Big Data:
- Learns from interaction & rewards.
- Handles sequential decisions & delayed rewards.
- Applications: recommendations, trading, resource allocation.
- Deep RL (neural-network function approximation) handles large state and action spaces.
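
Learning from interaction with a delayed reward can be sketched with tabular Q-learning on a toy problem. Everything here is an assumption chosen for illustration: a four-state line where only reaching the last state pays a reward, a purely random exploration policy, and the values of `ALPHA` and `GAMMA`. Real big-data applications would typically replace the table with a learned function approximator.

```python
import random

N_STATES, GOAL = 4, 3    # states 0..3 on a line; reaching state 3 ends the episode
LEFT, RIGHT = 0, 1
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (illustrative values)

def step(state, action):
    """Deterministic environment: move one cell; reward 1 only on reaching GOAL."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    return nxt, (1.0 if nxt == GOAL else 0.0)

rng = random.Random(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]

for _ in range(500):  # episodes
    state = 0
    while state != GOAL:
        action = rng.choice((LEFT, RIGHT))  # pure exploration; Q-learning is off-policy
        nxt, reward = step(state, action)
        # The terminal state's Q-values stay 0, so no special case is needed.
        target = reward + GAMMA * max(Q[nxt])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = nxt

# Greedy policy for the non-terminal states.
policy = [max((LEFT, RIGHT), key=lambda a: Q[s][a]) for s in range(GOAL)]
```

The reward is only ever observed on the final step, yet the discounted update propagates it backwards, so the greedy policy learns to move right from every state.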

Q.04 (B)
Dimensionality Reduction Importance:
- Mitigates the curse of dimensionality.
- Removes noise & redundancy.
- Speeds up computation.
- Enables visualization.
- Methods: PCA, feature selection, t-SNE, autoencoders.
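
As an illustration of PCA, the first principal component can be found with power iteration on the covariance matrix. This is a minimal pure-Python sketch under stated assumptions: the data are hypothetical, the function name is made up, only the top component is extracted, and the input must have nonzero variance. A production pipeline would normally use a linear-algebra library instead.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (unit direction, 1-D projections) for the top principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the top component
    for _ in range(iterations):
        # Power iteration: repeatedly multiply by cov and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projections

# Hypothetical 2-D points lying exactly on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, reduced = first_principal_component(data)
```

Because the points are perfectly correlated, the direction comes out proportional to (1, 2), and the single coordinate in `reduced` captures all of the variance: two features collapse into one with no information lost.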

Q.05
Graph DBs in Big Data:
- Model entities as nodes and relationships as edges, so complex relationships are stored directly.
- Efficient traversal & multi-hop queries.
- Find hidden patterns (fraud rings, communities).

Types of Graph Analytics:
1. Centrality (PageRank).
2. Community detection.
3. Shortest paths.
4. Connected components.
5. Triangle counting/motifs.
6. Graph embeddings.
7. Subgraph matching.
8. Temporal graph analytics.
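
Two of the analytics listed above, shortest paths and connected components, can be sketched with breadth-first search over an adjacency list. The graph below is hypothetical, edges are treated as undirected and unweighted, and a graph database or graph engine would run the same idea at far larger scale.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Unweighted shortest path via breadth-first search; None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

def connected_components(graph):
    """Group the nodes of an undirected graph into connected components."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components

# Hypothetical undirected graph as an adjacency list.
graph = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D"],
    "X": ["Y"], "Y": ["X"],
}
path = shortest_path(graph, "A", "E")        # ["A", "B", "D", "E"]
components = connected_components(graph)     # [["A", "B", "C", "D", "E"], ["X", "Y"]]
```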
