
Introduction To NoSQL

Uploaded by shivaraj BG
Copyright © All Rights Reserved

NoSQL Big Data Management, MongoDB and Cassandra

1. Introduction
2. NoSQL Data Store
• NoSQL
• Schema-less Models
• Increasing flexibility for Data Manipulation
3. NoSQL Data Architecture Patterns
• Key-value store
• Document store
• Tabular data
• Object data store
• Graph Database
• Variations of NoSQL Architectural patterns
4. NoSQL to Manage Big Data
• Using NoSQL to Manage Big Data
5. Shared-Nothing Architecture for Big Data Tasks
• Choosing the distribution models
• Ways of Handling Big Data Problems
6. MongoDB Databases
7. Cassandra Databases
This chapter presents NoSQL data architecture patterns, the management of Big Data, data
distribution models, the handling of Big Data problems using NoSQL, and two representative
databases: MongoDB for document stores and Cassandra for columnar stores.

Learning Objectives:
1. Gain a conceptual understanding of NoSQL data stores, Big Data solutions, schema-less
models, and the increased flexibility for data manipulation.
2. Learn the NoSQL data architecture patterns, namely key-value pairs, tabular,
column family, BigTable, record columnar (RC), optimized row columnar (ORC) and
Parquet, document, object, and graph data stores, and the variations in architectural
patterns.
3. Gain a conceptual understanding of NoSQL data store management, its applications, and
the handling of problems in Big Data.
4. Solve Big Data analytics problems using a shared-nothing architecture, choose a distribution
model between the master-slave and peer-to-peer models, and learn the four
ways in which NoSQL handles Big Data problems.
5. Apply MongoDB databases and query commands.
6. Use Cassandra databases, data models, and clients, and integrate them with Hadoop.
Learning Outcomes:
1. NoSQL (Not Only SQL) databases are a new category of data stores and represent an
altogether new approach to thinking about data stores.
2. NoSQL data models relax one or more of the ACID properties. Instead, they are designed
around the trade-offs of the CAP theorem and the BASE model (Basically Available, Soft state, Eventually consistent).
3. NoSQL databases offer greater flexibility for data manipulation than SQL databases.
4. NoSQL data does not need a fixed schema. The data model may drop support for joins in
a Big Data environment.
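The sketch below illustrates outcome 4. It uses plain Python dictionaries as a stand-in for a document store such as MongoDB, and the collections, field names, and values are hypothetical. It shows two consequences of dropping a fixed schema. First, two documents in the same collection can carry different fields. Second, an order can embed its line items instead of depending on a join with a separate items table.

```python
# Plain dicts stand in for documents in a document store (hypothetical data).
# Two "customer" documents in one collection with different fields:
# no fixed schema is enforced.
customers = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "phone": "+91-0000000000", "loyalty_tier": "gold"},
]

# Instead of joining an orders table with an order_items table,
# the items are embedded (denormalized) inside the order document.
order = {
    "_id": 101,
    "customer_id": 2,
    "items": [
        {"sku": "A1", "qty": 2, "price": 10.0},
        {"sku": "B7", "qty": 1, "price": 25.5},
    ],
}


def order_total(doc: dict) -> float:
    """Compute the order total from the embedded items; no join is needed."""
    return sum(item["qty"] * item["price"] for item in doc["items"])


def field_names(docs: list[dict]) -> list[set[str]]:
    """Return each document's own set of field names."""
    return [set(d) for d in docs]


print(order_total(order))  # 45.5
print(field_names(customers))
```

Embedding lets a read touch a single document, at the cost of some duplication. Whether to embed related data or reference it is a per-application design choice.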
Introduction
• Big Data uses distributed systems.
• A distributed system consists of multiple data nodes and distributed software
components.
• The tasks are executed in parallel.
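The split-then-combine pattern behind parallel execution can be sketched in Python. The sketch rests on several assumptions. The partitions are hypothetical. A thread pool inside one process stands in for separate data nodes, whereas frameworks such as Hadoop or Spark run the partial work on different machines. Python threads also do not speed up CPU-bound work, so the sketch shows the structure of the computation, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions of one dataset, as if each were held on a separate data node.
partitions = [
    [120, 45, 300],
    [80, 95],
    [210, 15, 60, 40],
]


def partial_sum(values):
    """Work done locally on one node: aggregate only its own partition."""
    return sum(values)


def parallel_total(parts):
    """Process the partitions concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        partials = list(pool.map(partial_sum, parts))
    return sum(partials)


print(parallel_total(partitions))  # 965
```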
The following are the features of a distributed-computing architecture:
1. Increased reliability and fault tolerance.
2. Flexibility.
3. Sharding: different parts of the data are stored on different sets of data nodes, clusters,
or servers.
4. Speed.
5. Scalability.
6. Resource sharing.
7. Openness: an open system makes services accessible to all nodes.
8. Performance.
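Sharding (feature 3) needs a rule that decides which node holds each record. The sketch below assumes hash-modulo-N placement over three hypothetical nodes. This is one common scheme, not the only one. It uses a stable hash (MD5) because Python's built-in `hash()` for strings changes between interpreter runs.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical data nodes


def shard_for(key: str, nodes: list[str] = NODES) -> str:
    """Map a record key to a node with hash-modulo-N placement.

    MD5 is used only as a stable, well-distributed hash, not for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes=NODES):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[shard_for(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    layout = distribute([f"customer:{i}" for i in range(10)])
    for node, keys in layout.items():
        print(node, keys)
```

A drawback of modulo-N placement is that adding or removing a node remaps most keys. Consistent hashing, the approach behind Cassandra's token ring, limits that data movement.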
The following are the demerits of distributed computing:
1. Difficulty troubleshooting in a large network infrastructure.
2. Additional software requirements.
3. Security risks for data and resources.

1. Solutions for Troubleshooting Issues in a Large Network Infrastructure
• Effective Monitoring and Management
• Centralized Logging and Diagnostics
• Automation and Orchestration
2. Solutions for Additional Software Requirements
• Containerization
• Virtualization
• Microservices Architecture
3. Solutions for Security Risks to Data and Resources
• Encryption
• Access Controls, Security Policies and Training, Security Tools
• Security Audits
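Access controls are often expressed as role-based access control (RBAC) with deny-by-default. The sketch below is minimal, and the users, roles, resources, and actions are all hypothetical. Real deployments define such policies in a dedicated tool (for example, Apache Ranger, which appears later in this document) rather than in application code, and that tool's policy model differs from this sketch.

```python
# Hypothetical role -> permitted (resource, action) pairs; "*" is a wildcard.
ROLE_PERMISSIONS = {
    "analyst": {("sales", "read")},
    "data_engineer": {("sales", "read"), ("sales", "write"), ("inventory", "write")},
    "admin": {("*", "*")},
}

# Hypothetical user -> roles mapping.
USER_ROLES = {
    "meera": {"analyst"},
    "karan": {"data_engineer"},
    "root": {"admin"},
}


def is_allowed(user: str, resource: str, action: str) -> bool:
    """Grant access only if one of the user's roles permits the action.

    Unknown users have no roles, so access is denied by default.
    """
    for role in USER_ROLES.get(user, set()):
        for res, act in ROLE_PERMISSIONS.get(role, set()):
            if res in (resource, "*") and act in (action, "*"):
                return True
    return False


print(is_allowed("meera", "sales", "read"))   # True
print(is_allowed("meera", "sales", "write"))  # False
```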
Software used for NoSQL big data management
1. NoSQL Databases:
• MongoDB: A popular document-oriented NoSQL database.
• Cassandra: A distributed column-family NoSQL database.
• Couchbase: A key-value and document-oriented NoSQL database.
• HBase: A distributed and scalable column-family store for Big Data.
• Neo4j: A graph database for managing highly interconnected data.
2. Big Data Processing Frameworks:
• Apache Hadoop: A framework for distributed storage and batch processing.
• Apache Spark: A fast and versatile data processing engine for real-time and batch
processing.
• Apache Flink: A stream processing framework for real-time analytics.
• Apache Kafka: A distributed data streaming platform for real-time data ingestion and
processing.
3. Data Ingestion and ETL Tools:
• Apache NiFi: An open-source data integration tool for automating data flows.
• Talend: A data integration and transformation tool for Big Data.
• Apache Flume: A distributed data collection and aggregation system.
4. Data Warehousing and Analytics:
• Amazon Redshift: A cloud-based data warehousing solution.
• Google BigQuery: A serverless, highly scalable data warehouse.
• Snowflake: A cloud-based data warehousing platform.
5. Data Visualization and BI Tools:
• Tableau: A popular data visualization tool.
• Power BI: Microsoft's business intelligence and data visualization tool.
• QlikView/Qlik Sense: Business intelligence and data discovery software.
6. Machine Learning and AI Frameworks:
• TensorFlow: An open-source machine learning framework.
• PyTorch: An open-source deep learning framework.
• scikit-learn: A machine learning library for Python.
7. Data Security and Governance:
• Apache Ranger: A framework for centralized security and governance for Big Data.
• Apache Sentry: A system for role-based access control in Big Data environments.
8. Monitoring and Management Tools:
• Cloudera Manager: A management and monitoring tool for Hadoop clusters.
• Hortonworks Data Platform (HDP): An open-source platform for Big Data
management.
• Datadog: A cloud-based monitoring and analytics platform for real-time data insights.
9. Containerization and Orchestration:
• Docker: A platform for containerization of applications.
• Kubernetes: An open-source container orchestration system.
10. Data Storage:
• Hadoop Distributed File System (HDFS): A distributed file system for storing Big
Data.
• Amazon S3: A scalable cloud-based object storage service.
• Google Cloud Storage: Google's object storage solution.

NoSQL Big Data Management


Use Case: Retail Analytics with NoSQL Big Data Management
Scenario: A retail company wants to analyze sales data, customer behavior, and inventory
management in real time to optimize its operations and enhance the customer experience.
Data Ingestion: Data is ingested from various sources, such as point-of-sale (POS) systems,
e-commerce platforms, and inventory databases.
• Tools: Apache NiFi, and Apache Kafka for real-time data streaming.
Data Storage: Data is stored in a NoSQL database for flexible and scalable data management.
• Database: MongoDB for document storage.
Real-Time Data Processing: Real-time data processing is performed to monitor sales and
inventory.
• Framework: Apache Spark for real-time analytics.
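Real-time sales monitoring often reduces to maintaining an aggregate over a moving time window. The sketch below is plain Python, not Spark code, and makes several assumptions. Events arrive in timestamp order as (timestamp in seconds, sale amount) pairs. The class name and the 60-second window are hypothetical. Stream processors such as Spark also handle late and out-of-order events, which this sketch does not.

```python
from collections import deque


class SlidingWindowSales:
    """Running sales total over the window (t - window_seconds, t].

    Assumes events arrive in timestamp order.
    """

    def __init__(self, window_seconds: int):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, amount) pairs currently inside the window
        self.total = 0.0

    def add(self, timestamp: int, amount: float) -> float:
        """Record a sale and return the total for the current window."""
        self.events.append((timestamp, amount))
        self.total += amount
        # Evict sales that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            _, old_amount = self.events.popleft()
            self.total -= old_amount
        return self.total


window = SlidingWindowSales(window_seconds=60)
print(window.add(0, 100.0))   # 100.0
print(window.add(30, 50.0))   # 150.0
print(window.add(70, 20.0))   # 70.0 (the sale at t=0 has left the window)
```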
Machine Learning and Predictive Analytics: Machine learning models are applied to predict
customer preferences and optimize inventory.
• Framework: TensorFlow for model development.
Data Warehouse: Aggregated and processed data is loaded into a data warehouse for historical
analysis and reporting.
• Data Warehouse: Amazon Redshift for historical data storage.
Data Visualization and Reporting: Data is visualized and reported to provide insights to
decision-makers.
• Tools: Tableau for creating dashboards and reports.
Data Security and Governance: Data access controls and governance policies are enforced.
• Framework: Apache Ranger for access control.
Monitoring and Management: The entire system is monitored and managed to ensure stability
and performance.
• Tools: Cloudera Manager and DataDog for monitoring.
Containerization and Orchestration: The entire system can be containerized for scalability and
ease of management.
• Tools: Docker for containerization, Kubernetes for orchestration.
Data Archiving: Older data is archived for compliance and historical analysis.
• Storage: Amazon S3 for cost-effective data archiving.
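Archiving starts with deciding which records are old enough to move. The sketch below splits records by age. The one-year retention period and the record fields are hypothetical policy choices. The actual transfer to an object store such as Amazon S3 is deliberately omitted.

```python
from datetime import datetime, timedelta


def split_for_archive(records, now, retention_days=365):
    """Partition records into (hot, archive) lists by their created_at age.

    retention_days is a hypothetical policy value; records older than the
    cutoff would be moved to cheaper storage.
    """
    cutoff = now - timedelta(days=retention_days)
    hot, archive = [], []
    for record in records:
        if record["created_at"] < cutoff:
            archive.append(record)
        else:
            hot.append(record)
    return hot, archive


sales = [
    {"order_id": 1, "created_at": datetime(2022, 6, 1)},
    {"order_id": 2, "created_at": datetime(2023, 12, 1)},
]
hot, archive = split_for_archive(sales, now=datetime(2024, 1, 1))
print([r["order_id"] for r in archive])  # [1]
```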
Outcome:
• The retail company can monitor real-time sales data, predict inventory needs, optimize
pricing, and offer personalized recommendations to customers.
• Decision-makers can access interactive dashboards and historical reports to make
informed decisions.
• Data is securely managed, and the system can scale horizontally as data volumes
increase.
