Big Data Analytics - Final Exam Solved Paper
Q.01 (A)
Differences between NoSQL & NewSQL:
- Data Model: NoSQL = flexible schema; NewSQL = relational schema with ACID.
- Consistency: NoSQL is often eventually consistent; NewSQL provides strong consistency.
- Query Language: NoSQL uses varied APIs; NewSQL uses SQL.
- Scalability: Both scale horizontally; NewSQL does so while keeping SQL and ACID guarantees.
- Use Cases: NoSQL for un/semi-structured data; NewSQL for transactional workloads.
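Limitations of NoSQL that motivated NewSQL:
- Limited or no ACID transactions across multiple records in many systems.
- Complex joins and ad hoc analytics are hard to express or unsupported.
- Fragmented tooling and query syntax (each store has its own API).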
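
The ACID guarantee that NewSQL keeps can be illustrated with a small sketch. Python's built-in sqlite3 is used here only as a stand-in for a SQL engine with transactions; it is not a NewSQL system, and the accounts table and the transfer helper are hypothetical. The point shown is atomicity: a transfer that fails halfway leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two rows as one atomic unit (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failed debit would leave a partial update
            # behind if the two statements were not in one transaction.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint failed; both updates were undone

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

failed = transfer(conn, "alice", "bob", 500)   # alice cannot afford this
after_failed = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
print(failed, after_failed, succeeded, after_success)
```

After the failed transfer both balances are unchanged, which is the behaviour that single-record NoSQL writes cannot guarantee across two records.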
Q.01 (B)
MapReduce in NoSQL:
- Map: process chunks → key/value pairs.
- Shuffle/Sort: group by key.
- Reduce: aggregate values.
- Data locality and parallelism: map tasks run in parallel on the nodes that hold the data.
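
The Map, Shuffle/Sort, and Reduce steps above can be sketched in a single process as a word count. This is only an illustration of the data flow, not a distributed implementation; the function names and the sample chunks are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_phase(pairs):
    """Shuffle/Sort: group all values that share a key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """Reduce: aggregate each key's list of values into one result."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(chunks):
    # On a real cluster each chunk would be mapped on the node that stores it.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle_phase(pairs))

chunks = ["big data big", "data analytics"]
counts = word_count(chunks)
print(counts)  # {'analytics': 1, 'big': 2, 'data': 2}
```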
Advantages vs RDBMS:
- Horizontal scaling.
- Works with un/semi-structured data.
- Fault tolerant.
Disadvantages:
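- High latency (batch-oriented, so unsuited to interactive queries).
- Inefficient for iterative algorithms, since each pass rereads and rewrites data on disk.
- Joins are complex to express and costly to run.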
Q.02
RDBMS Challenges:
- Vertical scaling limits.
- Rigid schema.
- Join overhead at scale.
- Storage & throughput bottlenecks.
How HDFS addresses these:
- Horizontal scaling.
- Data replication & fault tolerance.
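- High throughput for large files (sequential, streaming reads).
- Data locality (computation is moved to the node that holds the data).
- Stores data in any format (structured, semi-structured, or unstructured).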
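
The cost of replication can be made concrete with a little arithmetic. The sketch below assumes a 128 MB block size and a replication factor of 3, which are common HDFS defaults but are configurable; the function name is hypothetical.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (blocks, block replicas, raw storage in MB) for one file.

    block_size_mb and replication are assumed values, not read from a cluster.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The last block only occupies the space it needs, so raw storage is
    # the file size times the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb

blocks, replicas, raw_mb = hdfs_footprint(1000)
print(blocks, replicas, raw_mb)  # 8 24 3000
```

A 1000 MB file is split into 8 blocks, and each block is stored on 3 different nodes, so losing one node does not lose data and several nodes can read different blocks in parallel.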
Q.03 (A)
Supervised vs Unsupervised:
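- Labels: Supervised learning uses labeled data; unsupervised learning uses unlabeled data.
- Goal: Prediction of a known target vs discovery of hidden patterns.
- Evaluation: Accuracy, precision/recall vs cluster-quality metrics (e.g., silhouette score).
- Algorithms: SVM, decision trees vs k-means, PCA.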
Examples:
- Supervised: Spam detection.
- Unsupervised: Customer segmentation.
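
The difference can be sketched on one toy feature. The data below is made up for illustration (say, a count of suspicious words per message), and the nearest-centroid classifier and two-cluster k-means are deliberately minimal stand-ins for the algorithms listed above.

```python
def fit_centroids(points, labels):
    """Supervised: use the given labels to compute one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Predict the label whose class mean is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))

def two_means(points, iterations=10):
    """Unsupervised: split the points into two groups without any labels."""
    centers = [min(points), max(points)]  # simple deterministic initialisation
    assignment = []
    for _ in range(iterations):
        assignment = [
            0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in points
        ]
        for c in (0, 1):
            members = [x for x, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assignment, centers

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

centroids = fit_centroids(points, labels)   # needs the labels
prediction = predict(centroids, 7.0)
assignment, centers = two_means(points)     # never sees the labels
print(prediction, assignment, centers)
```

The supervised model returns a named class for a new point, while the clustering only returns anonymous group numbers that a person must interpret afterwards.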
Q.03 (B)
SVM Working:
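- Finds the separating hyperplane with the maximum margin between classes.
- The hyperplane is determined only by the support vectors, the training points closest to it.
- Kernel trick maps data implicitly to a higher-dimensional space for nonlinear separation.
- Soft margin (parameter C) trades margin width against misclassification errors.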
Best use cases:
- High-dimensional data.
- Small/medium datasets.
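
A linear soft-margin SVM can be sketched as subgradient descent on the hinge-loss objective. This is a minimal illustration under assumptions: production solvers typically work on the dual problem and support kernels, and the toy points, learning rate, and epoch count here are chosen only for the example.

```python
def train_linear_svm(points, labels, c=1.0, lr=0.001, epochs=20000):
    """Minimise 0.5*||w||^2 + c * sum(hinge loss) by subgradient descent."""
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5*||w||^2 term
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= c * y * x[j]
                grad_b -= c * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Classify by the side of the hyperplane w.x + b = 0 that x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data with labels in {-1, +1}.
points = [(1, 1), (1, 2), (2, 1), (4, 4), (4, 5), (5, 4)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
print(w, b, predictions)
```

A larger c penalises margin violations more heavily (narrower margin, fewer training errors); a smaller c tolerates more violations in exchange for a wider margin.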
Q.04 (A)
Reinforcement Learning in Big Data:
- Learns from interaction & rewards.
- Handles sequential decisions & delayed rewards.
- Applications: recommendations, trading, resource allocation.
- Deep RL scales to large state spaces and datasets by using neural networks as function approximators.
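
Learning from interaction and delayed rewards can be sketched with tabular Q-learning. The environment is a made-up five-state corridor where only reaching the last state pays a reward; the hyperparameters and helper names are assumptions for the example, and Deep RL would replace the table with a neural network.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Q-learning update: move towards reward + discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q_table = train()
# Greedy policy for the non-terminal states (1 means "move right").
policy = [0 if row[0] > row[1] else 1 for row in q_table[:-1]]
print(policy)
```

The reward is only seen at the goal, yet the discounted update propagates its value back to earlier states, which is how the agent handles delayed rewards.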
Q.04 (B)
Dimensionality Reduction Importance:
- Mitigates the curse of dimensionality.
- Removes noise & redundancy.
- Speeds up computation.
- Enables visualization.
- Methods: PCA, feature selection, t-SNE, autoencoders.
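
PCA, the first method listed, can be sketched without libraries: centre the data, build the covariance matrix, and find its leading eigenvector. Power iteration is used here as a simple substitute for a full eigendecomposition, and the five 2-D points are made up for the example.

```python
def pca_first_component(rows, iterations=100):
    """Return (direction, explained variance ratio, 1-D projection)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeated multiplication converges to the top eigenvector.
    v = [1.0 / d ** 0.5] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    explained = eigenvalue / sum(cov[i][i] for i in range(d))
    projected = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, explained, projected

# Two strongly correlated features: one dimension carries almost all variance.
rows = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1), (6.0, 6.0)]
component, explained, projected = pca_first_component(rows)
print(component, round(explained, 3), projected)
```

Because the two features are nearly redundant, a single projected coordinate keeps almost all of the variance, which is the redundancy removal described above.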
Q.05
Graph DBs in Big Data:
- Model entities as nodes and relationships as edges, so complex relationships are stored directly.
- Efficient traversal & multi-hop queries.
- Find hidden patterns (fraud rings, communities).
Types of Graph Analytics:
1. Centrality (e.g., PageRank).
2. Community detection.
3. Shortest paths.
4. Connected components.
5. Triangle counting/motifs.
6. Graph embeddings.
7. Subgraph matching.
8. Temporal graph analytics.
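
Three of the analytics above (shortest paths, connected components, triangle counting) can be sketched on a small in-memory undirected graph. The edge list and function names are assumptions for the example; a graph database or distributed engine would run the same ideas at scale.

```python
from collections import defaultdict, deque
from itertools import combinations

def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph

def shortest_path(graph, source, target):
    """Breadth-first search: fewest hops between two nodes, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

def connected_components(graph):
    """Group nodes that can reach each other."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components

def count_triangles(graph):
    """Each triangle is found once from each of its three corners."""
    found = 0
    for node in graph:
        for u, v in combinations(sorted(graph[node]), 2):
            if v in graph[u]:
                found += 1
    return found // 3

edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")]
graph = build_graph(edges)
path = shortest_path(graph, "a", "d")
components = connected_components(graph)
triangles = count_triangles(graph)
print(path, components, triangles)
# ['a', 'c', 'd'] [['a', 'b', 'c', 'd'], ['e', 'f']] 1
```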