Technical Framework for Big Data Analytics
Olujimi Osinaike
Nexford University
Introduction
Amazon is a massive online retailer and cloud services provider, offering a vast range of
products while also renting out computing power. Amazon deals with an enormous amount of data
from both its e-commerce platform and its cloud services arm, Amazon Web Services (AWS).
Data Architecture Implementation
Amazon's data architecture is like a well-organized library, but instead of books, it's filled with
data.
Big data analytics frameworks often include layers for data ingestion, processing, and
visualization to handle the volume and velocity of big data (Gandomi & Haider, 2015).
Data Lake (Amazon S3): This is the main storage area, a giant pool where all the raw data is
dumped. It's like the library's storage room, holding everything from customer purchase history
to website clickstream data. S3 (Simple Storage Service) is used because it's highly scalable and
cost-effective for storing massive amounts of data.
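For illustration, here is a minimal sketch in Python (using the AWS SDK, boto3) of how a single raw clickstream event might land in an S3 data lake; the bucket name and key layout are hypothetical placeholders, not Amazon's actual structure.

import json
from datetime import datetime, timezone
import boto3

BUCKET = "example-data-lake-raw"  # hypothetical bucket name for this sketch

def store_raw_event(event: dict) -> str:
    """Write one raw event to the data lake as a JSON object."""
    s3 = boto3.client("s3")
    now = datetime.now(timezone.utc)
    # Partitioning raw data by date lets downstream jobs scan only what they need.
    key = f"clickstream/year={now:%Y}/month={now:%m}/day={now:%d}/{now.timestamp()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event).encode("utf-8"))
    return key

store_raw_event({"customer_id": "c-123", "page": "/product/42", "action": "view"})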
Specialized Databases: They use different types of databases for different purposes, like having
different sections in the library.
o DynamoDB: This is a NoSQL database, perfect for fast access to specific pieces of
information. Imagine it as the card catalog, allowing quick lookups of customer profiles,
product details, and order information. It's designed for high performance and scalability
(a minimal lookup sketch follows after this list).
o Redshift: This is a data warehouse, designed for analyzing large datasets. Think of it as
the research section of the library, where analysts can run complex queries to understand
trends, customer behavior, and business performance. It's optimized for analytical
workloads.
o Other Databases: Amazon also uses other database types, such as relational databases
(e.g., PostgreSQL, MySQL), for specific applications and data needs.
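As a sketch of the fast key-based lookup described for DynamoDB above, the snippet below fetches one customer profile by its primary key with boto3; the table name and attribute names are assumptions made for the example.

import boto3

def get_customer_profile(customer_id: str):
    """Fetch one customer profile by primary key -- a fast, single-item lookup."""
    # "CustomerProfiles" and the "customer_id" key are hypothetical names for this sketch.
    table = boto3.resource("dynamodb").Table("CustomerProfiles")
    response = table.get_item(Key={"customer_id": customer_id})
    return response.get("Item")  # None if the profile does not exist

print(get_customer_profile("c-123"))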
Data Pipelines: Data pipelines are like the delivery trucks that move data from one place
to another. They use tools like AWS Glue and Apache Kafka to ingest, process, and
transform data before it's stored in the data lake or databases.
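To make the ingestion step concrete, here is a hedged sketch of publishing clickstream events to an Apache Kafka topic with the kafka-python client; the broker address and topic name are assumptions, and downstream consumers (e.g., Glue or Spark jobs) would transform the stream before it lands in S3.

import json
from kafka import KafkaProducer  # kafka-python client

# Broker address and topic name are illustrative placeholders.
producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {"customer_id": "c-123", "page": "/product/42", "action": "add_to_cart"}
producer.send("clickstream-events", value=event)  # asynchronous send
producer.flush()  # block until buffered events are delivered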
A well-structured technical framework must support distributed storage and real-time processing
using platforms like Hadoop and Spark (Hashem et al., 2015).
Support for the Data Value Chain: Data is the engine that drives Amazon's entire business. It's
how they make money and stay ahead of the competition.
Personalized Recommendations: When you see "Customers who bought this item also
bought...", that's data in action. Amazon analyzes your past purchases, browsing history,
and other data to suggest products you might like (a simplified sketch follows after this list).
Targeted Advertising: Amazon uses data to show you ads that are relevant to your
interests. This makes the ads more effective and helps Amazon earn more revenue.
Supply Chain Optimization: Amazon uses data to predict demand, manage inventory, and
optimize its logistics network. This helps them ensure they have the right products in
stock and can deliver them to customers quickly.
Fraud Detection: Amazon uses data to identify and prevent fraudulent activities,
protecting both the company and its customers.
Pricing Optimization: Amazon uses data to dynamically adjust prices based on demand,
competitor pricing, and other factors.
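The "customers who bought this also bought" idea from the first bullet above can be illustrated with a tiny co-purchase count. This is a toy sketch of the general technique (item-to-item co-occurrence), not Amazon's actual algorithm, and the order data is invented.

from collections import Counter, defaultdict
from itertools import combinations

# Toy order data: each set holds the items bought together in one order.
orders = [
    {"kindle", "case", "charger"},
    {"kindle", "case"},
    {"kindle", "charger"},
    {"case", "screen_protector"},
]

# Count how often every pair of items appears in the same order.
co_counts = defaultdict(Counter)
for order in orders:
    for a, b in combinations(sorted(order), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def also_bought(item, top_n=2):
    """Return the items most frequently purchased alongside `item`."""
    return [other for other, _ in co_counts[item].most_common(top_n)]

print(also_bought("kindle"))  # e.g. ['case', 'charger']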
Distributed Data Processing Models: To handle the massive volume of data, Amazon uses
distributed processing, which is like having a team of workers instead of one person.
EMR (Elastic MapReduce): This is a managed Hadoop and Spark service. Hadoop and Spark are
open-source frameworks designed for processing large datasets in a distributed manner. EMR
allows Amazon to easily spin up clusters of computers to process data in parallel.
Spark: Spark is a fast, in-memory data processing engine that's often used with EMR. It's great
for iterative algorithms and real-time data processing.
Other AWS Services: Amazon also uses other AWS services like Kinesis (for real-time data
streaming) and Lambda (for serverless computing) to process data.
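As a hedged sketch of the real-time streaming path, the snippet below pushes a single event into a Kinesis data stream with boto3; the stream name is a placeholder, and in practice a Lambda function or Spark job would read and process these records downstream.

import json
import boto3

kinesis = boto3.client("kinesis")

event = {"order_id": "o-789", "amount": 59.99, "currency": "USD"}

# Stream name is hypothetical; the partition key decides which shard receives the record.
kinesis.put_record(
    StreamName="example-order-events",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["order_id"],
)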
How it Works: Data is broken down into smaller chunks and processed simultaneously across
multiple computers. The results are then aggregated to provide insights.
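A minimal PySpark sketch of that split-process-aggregate pattern is shown below: each executor works on its own chunk of the order data in parallel, and the groupBy step combines the partial results. The input path and column names are assumptions for the example; on EMR the same code would run across a whole cluster rather than a single machine.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("order-aggregation-sketch").getOrCreate()

# Hypothetical input: order records spread across many partitions/files.
orders = spark.read.json("s3://example-data-lake-raw/orders/")

# Each partition is aggregated in parallel; partial results are merged at the end.
daily_revenue = (
    orders.groupBy("order_date")
          .agg(F.sum("amount").alias("revenue"), F.count("*").alias("order_count"))
)

daily_revenue.show()
spark.stop()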
Data Challenges Across the Value Chain: Dealing with big data isn't always easy. Amazon faces
several challenges.
Volume: The sheer amount of data is overwhelming. They need to store, process, and
analyze petabytes of data every day.
Velocity: Data is coming in at a rapid pace. They need to process data in real-time or near
real-time to make timely decisions.
Variety: Data comes in many different formats (structured, semi-structured,
unstructured). They need to be able to handle all types of data.
Veracity: Ensuring data quality and accuracy is crucial. They need to clean, validate, and
transform data to ensure its reliability (a small validation sketch follows after this list).
Security: Protecting sensitive customer data is paramount. They need to implement robust
security measures to prevent data breaches.
Scalability: As the business grows, the data processing infrastructure needs to scale to
handle the increasing volume of data.
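To illustrate the veracity point, here is a small pandas sketch that flags obvious quality problems (missing values, duplicate order IDs, negative amounts) before a batch moves further down the pipeline; the column names and records are invented for the example.

import pandas as pd

# Toy batch of order records containing typical quality problems.
orders = pd.DataFrame({
    "order_id": ["o-1", "o-2", "o-2", "o-3"],
    "amount":   [19.99, None, 35.00, -5.00],
})

issues = {
    "missing_amount": int(orders["amount"].isna().sum()),
    "duplicate_order_ids": int(orders["order_id"].duplicated().sum()),
    "negative_amounts": int((orders["amount"] < 0).sum()),
}

print(issues)  # {'missing_amount': 1, 'duplicate_order_ids': 1, 'negative_amounts': 1}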
Challenges and Recommendations for Their Data Modeling: Data modeling is like creating the
blueprints for how data is organized.
Challenges:
o Complexity: The relationships between different data points can be complex, making it
difficult to design effective data models.
o Evolving Business Needs: Business requirements change over time, which can require
frequent updates to data models.
o Data Silos: Data may be stored in different systems, making it difficult to integrate and
analyze.
Recommendations:
o Flexible and Scalable Models: Use data models that can easily adapt to changing business
needs and scale to handle increasing data volumes. Consider using a data lake approach
with a schema-on-read strategy, allowing for flexibility (see the sketch after this list).
o Data Governance: Implement strong data governance practices to ensure data quality,
consistency, and security.
o Data Cataloging: Use a data catalog to document and manage data assets, making it
easier for users to find and understand data.
o Continuous Model Refinement: Regularly review and refine data models to ensure they
meet business needs and optimize performance.
o Focus on Data Lineage: Track the origin and transformation of data to improve data
quality and facilitate troubleshooting.
o Embrace Automation: Automate data modeling tasks, such as data discovery, data
profiling, and model generation, to improve efficiency and reduce errors.
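A brief sketch of the schema-on-read idea from the first recommendation above: raw JSON stays in the data lake without a fixed schema, and the structure is inferred (or supplied) only when the data is read for analysis. The S3 path is a placeholder.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# The raw events were written with no fixed schema; Spark infers one at read time.
events = spark.read.json("s3://example-data-lake-raw/clickstream/")

events.printSchema()  # schema is discovered from the data itself
events.filter(events.action == "add_to_cart").show(5)

spark.stop()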
Scalability, fault tolerance, and low latency are key technical requirements for an effective big
data analytics infrastructure (Zikopoulos & Eaton, 2011).
Reference List
Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics.
International Journal of Information Management, 35(2), 137–144.
Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The
rise of “big data” on cloud computing: Review and open research issues. Information Systems,
47, 98–115.
Zikopoulos, P. C., & Eaton, C. (2011). Understanding big data: Analytics for enterprise class
Hadoop and streaming data. McGraw-Hill Osborne Media.