Data Analytics Exam Solutions
Q1. Types of Data
Data in analytics falls into three main types: structured, semi-structured, and unstructured.
- Structured Data: Follows a fixed format, such as rows and columns. Examples: Excel sheets,
relational databases.
- Unstructured Data: Free-form data such as text, images, and videos. Examples: Social media posts,
emails, audio recordings.
- Semi-structured Data: Partially organized, such as XML and JSON files.
Conclusion: Combining these data types can yield richer insights than any one type alone.
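To make the distinction concrete, the sketch below reads one hypothetical customer record from a structured CSV table and from a semi-structured JSON document, using only Python's standard library. The field names and values are invented for illustration.

```python
import csv
import io
import json

# Structured: every row follows the same fixed columns.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: records share a loose shape, but fields may nest or be absent.
json_text = '{"id": 1, "name": "Asha", "orders": [{"item": "book", "qty": 2}]}'
record = json.loads(json_text)

print(rows[0]["city"])              # value read from a fixed column
print(record["orders"][0]["item"])  # value read from a nested field
```

Unstructured data such as free text or images has no comparable parser; it usually needs feature extraction before analysis.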
Q2. Phases of Data Analytics Lifecycle
1. Discovery: Define the problem and objectives.
2. Data Preparation: Clean and preprocess the data.
3. Model Planning: Select algorithms and techniques.
4. Model Building: Train and test models.
5. Results Communication: Visualize and share insights.
6. Operationalize: Deploy the final model.
Conclusion: The lifecycle ensures a structured approach to effective data analysis.
Q3. Decision Trees: Working and Importance
A Decision Tree is a supervised machine learning algorithm used for classification and prediction.
- Working: It starts at the root node, where the data is split based on attribute values. Each internal
node applies a further split, and leaf nodes hold the final decisions or classifications.
- Importance: Decision trees are intuitive and explainable, which makes them useful in real-world
decision-making.
Applications: Fraud detection, medical diagnosis, and loan approval.
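The splitting step can be sketched as follows. The notes do not name a split criterion, so this example assumes Gini impurity, one common choice. The helper names `gini` and `best_split` and the loan data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best

# Hypothetical loan data: [income in thousands, existing loans] -> decision
rows = [[25, 2], [30, 1], [60, 0], [80, 1]]
labels = ["reject", "reject", "approve", "approve"]
print(best_split(rows, labels))
```

A full tree repeats this search on each side of the chosen split until the nodes are pure or a depth limit is reached.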
Q4. Steps in Bayesian Data Analysis
Bayesian data analysis is a statistical approach that quantifies uncertainty:
1. Define Prior Beliefs: State assumptions about the parameters before seeing the data.
2. Likelihood Function: Compute the probability of the observed data under each parameter value.
3. Compute Posterior: Combine prior and likelihood using Bayes' theorem to get updated probabilities.
4. Validate Model: Evaluate the model.
Conclusion: Bayesian methods suit problems with real-world uncertainty, where beliefs must be updated as
new data arrives.
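Steps 1 to 3 can be illustrated with a Beta prior and binomial data, a conjugate pair for which the posterior has a closed form. This is a minimal sketch, not a general Bayesian workflow. The Beta(2, 2) prior and the observed counts are assumptions chosen for illustration.

```python
def update_beta(prior_alpha, prior_beta, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return prior_alpha + successes, prior_beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Step 1: prior belief about a success rate, Beta(2, 2), weakly centred on 0.5.
alpha, beta = 2, 2

# Step 2: the data enters through the binomial likelihood: 7 successes, 3 failures.
# Step 3: for this conjugate pair, the posterior is again a Beta distribution.
alpha, beta = update_beta(alpha, beta, successes=7, failures=3)

print(alpha, beta, beta_mean(alpha, beta))
```

The posterior mean sits between the prior mean (0.5) and the observed rate (0.7), which shows how the prior is updated rather than discarded.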
Q5. K-Means Clustering
K-Means is an unsupervised learning algorithm that groups similar data points into clusters.
- Working: It assigns each data point to the nearest of k randomly initialized centroids, then iteratively
recomputes each centroid as the mean of its assigned points until the assignments stop changing.
- Example: Customer segmentation in e-commerce.
Applications: Market segmentation, anomaly detection, and image compression.
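The assign-and-update loop described above can be sketched in plain Python. The function name `kmeans`, the fixed seed, and the customer data are assumptions for illustration. Real projects would normally use a library implementation.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids are stable, so assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (monthly spend, monthly visits), both rescaled
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, clusters = kmeans(customers, k=2)
print(sorted(centroids))
```

Because the starting centroids are random, different seeds can give different results on harder data, which is why K-Means is often run several times.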
Q6. Comparison: RDBMS, NoSQL, and Hadoop Systems
RDBMS, NoSQL, and Hadoop systems are used in different scenarios:
- RDBMS: For structured data. Example: MySQL.
- NoSQL: For flexible schemas and unstructured data. Example: MongoDB.
- Hadoop: For distributed storage and big data analytics. Example: HDFS, Hadoop's storage layer.
Comparison: Hadoop is distributed and scalable, RDBMS is transactional, and NoSQL is flexible.
Q7. Multivariate Analysis Techniques with Use Cases
Multivariate analysis examines the relationships among multiple variables:
- PCA: Dimensionality reduction.
- Clustering: Data grouping.
- Factor Analysis: Identifying hidden factors.
Applications: Customer segmentation in marketing, risk assessment in finance.
Q8. Components of Hadoop and MapReduce
Hadoop is used for distributed processing of big data:
- Components: HDFS, YARN, MapReduce, and Hadoop Common.
- MapReduce Workflow: Input splitting, mapping, shuffling, and reducing.
Applications: E-commerce recommendations, genomics, and fraud detection.
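The workflow steps can be imitated on one machine with a word count, the usual introductory MapReduce example. This sketch only simulates the phases in ordinary Python. It does not use the Hadoop API, and the function names and input splits are hypothetical.

```python
from collections import defaultdict

def map_phase(split):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by key across mappers."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Input splitting: each string stands in for one block of a larger file.
splits = ["big data big insights", "data drives insights"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

In a real cluster the mappers and reducers run on different nodes, and the shuffle moves data between them over the network.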
Q9. Role of Visualization Tools
Data visualization tools convert raw data into graphs and charts:
- Tools: Tableau, Power BI, matplotlib.
- Applications: Disease tracking in healthcare, trend analysis in finance, and customer insights in marketing.
Importance: They simplify data so that insights are easier to derive.
Q10. Hive Architecture and Features
Hive is a SQL-like tool that runs on top of Hadoop:
- Architecture: Components include the Metastore, Driver, Compiler, and HDFS.
- Features: Scalable, extensible, and queried with SQL-like statements (HiveQL).
Applications: Transaction analysis and risk management.
Q11. Supervised vs. Unsupervised Learning
Supervised and unsupervised learning are used in different scenarios:
- Supervised Learning: Works with labeled data. Example: Spam detection.
- Unsupervised Learning: Works with unlabeled data. Example: Customer segmentation.
Conclusion: The choice depends on the problem and the type of data available.
Q12. Advantages of PCY Algorithm Over Apriori
The PCY algorithm improves on Apriori in memory use and efficiency:
- PCY uses hashing and bitmaps, which are memory-efficient.
- Apriori counts every candidate pair of frequent items in its second pass, whereas PCY uses its bitmap of
frequent hash buckets to prune candidate pairs before counting them.
Applications: Frequent itemset mining in large-scale datasets.
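The hashing and bitmap idea can be sketched for frequent pairs as follows. This is a simplified two-pass illustration, not a tuned implementation. The toy hash in `bucket_of`, the bucket count, and the baskets are assumptions.

```python
from collections import Counter
from itertools import combinations

def bucket_of(pair, num_buckets):
    """Toy deterministic hash for a pair of item names (illustrative only)."""
    return sum(ord(ch) for item in pair for ch in item) % num_buckets

def pcy_frequent_pairs(baskets, min_support, num_buckets=11):
    # Pass 1: count single items and hash every pair into a bucket counter.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    # Between passes: compress bucket counts into a bitmap of frequent buckets.
    bitmap = [count >= min_support for count in bucket_counts]
    frequent_items = {item for item, count in item_counts.items() if count >= min_support}

    # Pass 2: count only pairs of frequent items that hash to a frequent bucket.
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        for pair in combinations(items, 2):
            if bitmap[bucket_of(pair, num_buckets)]:
                pair_counts[pair] += 1
    return {pair: count for pair, count in pair_counts.items() if count >= min_support}

baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["bread", "eggs"],
    ["milk", "eggs", "bread"],
    ["milk", "tea"],
]
print(pcy_frequent_pairs(baskets, min_support=3))
```

A bucket's count is always at least the count of any pair hashed into it, so the bitmap can discard pairs but never loses a truly frequent one.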
Q13. Bernoulli Sampling and SON Algorithm
Bernoulli sampling and the SON algorithm are used in stream data analysis:
- Bernoulli Sampling: Random sampling in which each element is kept independently with a fixed
probability. Example: Social media data.
- SON Algorithm: Efficient for finding frequent patterns in distributed systems.
Applications: Fraud detection and web log analysis.
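Bernoulli sampling is short enough to sketch directly. The generator below keeps each element of a stream independently with probability p. The name `bernoulli_sample` and the simulated stream are hypothetical.

```python
import random

def bernoulli_sample(stream, p, seed=None):
    """Yield each stream element independently with probability p."""
    rng = random.Random(seed)
    for item in stream:
        if rng.random() < p:
            yield item

# Simulated stream of 10,000 event ids, sampled at 10%.
sample = list(bernoulli_sample(range(10_000), p=0.1, seed=42))
print(len(sample))  # close to 1,000; the exact size depends on the random draws
```

The sample size is itself random, which is the main practical difference from fixed-size methods such as reservoir sampling.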
Q14. Predictive vs. Prescriptive Analytics
Predictive and prescriptive analytics both support decision-making:
- Predictive: Predicts future trends. Example: Sales forecasting.
- Prescriptive: Recommends the best actions. Example: Dynamic pricing.
Conclusion: Combining predictive insights with prescriptive actions is powerful.
Q15. Hierarchical Clustering
Hierarchical clustering organizes data points into a tree-like structure:
- Types: Agglomerative (bottom-up) and Divisive (top-down).
- Applications: Genomics, marketing segmentation.
Advantages: Visual representation using dendrograms.
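The agglomerative (bottom-up) variant can be sketched on one-dimensional points. The notes do not specify a linkage rule, so this example assumes single linkage. The function name and the data are hypothetical.

```python
def agglomerative(points, num_clusters):
    """Bottom-up clustering of 1-D points using single linkage."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    merges = []  # merge history, the information a dendrogram draws
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters], merges

clusters, merges = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], num_clusters=3)
print(clusters)
for left, right, distance in merges:
    print(left, "+", right, "at distance", round(distance, 2))
```

Reading the merge history from first to last gives the dendrogram from its leaves upward. Stopping at a different `num_clusters` corresponds to cutting the tree at a different height.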
Q16. Streaming Data Processing vs. Traditional Data Processing
Data processing approaches are either real-time or batch-based:
- Streaming: Continuous data. Example: Stock market updates.
- Traditional: Batch processing. Example: Monthly reports.
Comparison: Streaming works in real time, whereas traditional processing suits periodic analysis.
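The contrast can be shown with an average computed both ways. This is a small sketch with hypothetical price ticks. The batch version needs the whole dataset, while the streaming version has an answer after every value.

```python
def batch_mean(values):
    """Batch style: wait for the full dataset, then compute once."""
    return sum(values) / len(values)

class RunningMean:
    """Streaming style: update the result as each value arrives."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental update, so past values need not be stored.
        self.mean += (value - self.mean) / self.count
        return self.mean

prices = [100.0, 102.0, 101.0, 105.0]  # hypothetical stock ticks
stream = RunningMean()
for price in prices:
    latest = stream.update(price)  # an up-to-date answer after every tick

print(latest, batch_mean(prices))
```

Both approaches agree once all the data has arrived. The streaming version simply makes the intermediate answers available earlier and uses constant memory.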
Q17. Prediction Error in Regression and Classification
Prediction error measures how accurate a model is:
- Regression: Error metrics include MAE and MSE; R-squared complements them as a goodness-of-fit measure.
- Classification: Metrics include the confusion matrix, precision, and recall.
Example: Misclassification rate for a classifier and forecast error for a sales model.
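The metrics named above can be computed by hand on small inputs. The sales figures and spam labels below are invented for illustration, and the helper names are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def precision_recall(actual, predicted, positive=1):
    """Precision and recall for one class, from confusion-matrix counts."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Regression: hypothetical sales forecast
sales_actual = [100, 150, 200]
sales_predicted = [110, 140, 205]
print(mae(sales_actual, sales_predicted), mse(sales_actual, sales_predicted))

# Classification: hypothetical spam labels (1 = spam)
spam_actual = [1, 0, 1, 1, 0]
spam_predicted = [1, 1, 1, 0, 0]
print(precision_recall(spam_actual, spam_predicted))
```

MSE penalizes large errors more heavily than MAE because the errors are squared before averaging.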
Q18. Steps in Data Analysis Process
Data analysis is a systematic process:
- Steps: Define objectives, collect and clean the data, then model and visualize it.
- Applications: Business optimization and trend analysis.
Conclusion: It converts insights into actionable recommendations.