Big Data Analytics - Exam Short Notes

Q.01 (A) NoSQL vs NewSQL
- NoSQL: flexible schema, often eventual consistency, API-based access, massive scalability.
- NewSQL: SQL + ACID, relational schema, horizontal scaling.
- NoSQL limits: usually no full ACID, hard joins, fragmented tooling → NewSQL created.

Q.01 (B) MapReduce
- Map: split data, emit key/value pairs.
- Reduce: aggregate values per key.
- Pros: scalable, fault tolerant, works on any data.
- Cons: high latency, not for interactive queries.

Q.02 RDBMS Challenges & HDFS Solutions
- Challenges: scaling limits, rigid schema, join overhead, storage bottlenecks.
- HDFS: horizontal scaling, replication, high throughput, stores any format.

Q.03 (A) Learning Types
- Supervised: labels, prediction, e.g. spam filter.
- Unsupervised: no labels, pattern discovery, e.g. clustering.

Q.03 (B) SVM
- Finds max-margin hyperplane, uses kernels, good for high-dimensional small data.

Q.04 (A) RL in Big Data
- Learns via reward feedback, handles sequential decisions, uses Deep RL for big data.

Q.04 (B) Dimensionality Reduction
- Reduces features, removes noise, speeds computation, visualizes data.

Q.05 Graph DB & Analytics
- Graph DB: nodes & edges for relationships, efficient traversal.
- Analytics types: centrality, communities, shortest paths, components, motifs, embeddings.
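
Sketch for Q.05: three of the analytics types listed above (centrality, shortest paths, components) on a tiny in-memory graph. This is only an illustration in plain Python, not how a graph database does it internally. The graph `GRAPH`, its node names, and all function names are made up for the example; the graph is assumed to be undirected and unweighted.

```python
from collections import deque

# Hypothetical toy graph as undirected adjacency lists (names are invented).
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
    "E": ["F"],
    "F": ["E"],
}


def degree_centrality(graph):
    """Centrality: fraction of the other nodes each node is directly linked to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def shortest_path(graph, start, goal):
    """Shortest path by breadth-first search; returns None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def connected_components(graph):
    """Components: groups of nodes that can reach each other."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components
```

Each function only follows edges from node to neighbour, which is the traversal pattern the notes describe as efficient in a graph DB. Communities, motifs, and embeddings need heavier algorithms and are left out of this sketch.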
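
Sketch for Q.01 (B): the Map and Reduce steps as a single-process word count. This is a minimal illustration under stated assumptions, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are invented here, and the grouping step that a real framework performs across machines is simulated with a dictionary.

```python
from collections import defaultdict


def map_phase(document):
    """Map: split one input record into (key, value) pairs."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group all values by key (done by the framework between Map and Reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values that share one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["big data", "big graph"])` counts "big" twice and the other words once. On a cluster each map call and each reduce call can run on a different node, which is where the scalability comes from; the disk-based hand-off between phases is a main source of the high latency.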