
Technical Framework for Big Data Analytics

OLUJIMI OSINAIKE, NEXFORD UNIVERSITY
Introduction

Amazon is a massive online retailer and cloud services provider, offering a diverse range of products while also renting out computing power. Amazon deals with an enormous amount of data from both its e-commerce platform and its cloud services arm, Amazon Web Services (AWS).

Data Architecture Implementation

Amazon's data architecture is like a well-organized library, but instead of books, it's filled with

data.

Big data analytics frameworks often include layers for data ingestion, processing, and

visualization to handle the volume and velocity of big data (Gandomi & Haider, 2015).

Data Lake (Amazon S3): This is the main storage area, a giant pool where all the raw data is

dumped. It's like the library's storage room, holding everything from customer purchase history

to website clickstream data. S3 (Simple Storage Service) is used because it is highly scalable and cost-effective for storing massive amounts of data.
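
For illustration, here is a minimal Python sketch (using the boto3 SDK) of how a raw clickstream event might be dropped into an S3 data lake. The bucket name and key layout are hypothetical, not Amazon's internal conventions.

```python
import json
import boto3  # AWS SDK for Python

# Hypothetical bucket and key layout; real data lakes typically
# partition raw data by source and date.
s3 = boto3.client("s3")

event = {"user_id": "u-123", "page": "/dp/B08XYZ", "ts": "2024-01-15T12:00:00Z"}

s3.put_object(
    Bucket="example-data-lake",                        # hypothetical bucket
    Key="raw/clickstream/2024/01/15/event-0001.json",  # date-partitioned key
    Body=json.dumps(event).encode("utf-8"),
)
```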

Specialized Databases: They use different types of databases for different purposes, like having

different sections in the library.

o DynamoDB: This is a NoSQL database, perfect for fast access to specific pieces of

information. Imagine it as the card catalog, allowing quick lookups of customer profiles,

product details, and order information. It's designed for high performance and scalability (see the lookup sketch after this list of databases).
o Redshift: This is a data warehouse, designed for analyzing large datasets. Think of it as

the research section of the library, where analysts can run complex queries to understand

trends, customer behavior, and business performance. It's optimized for analytical

workloads.

o Other Databases: Amazon also uses other databases like relational databases (e.g.,

PostgreSQL, MySQL) for specific applications and data needs.
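
To make the card-catalog analogy concrete, the following sketch shows a single-item DynamoDB lookup with boto3. The table name and key attribute are assumptions for illustration, not Amazon's actual schema.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("CustomerProfiles")  # hypothetical table name

# Key-based reads like this are what DynamoDB is optimized for:
# low-latency lookups by primary key at high scale.
response = table.get_item(Key={"customer_id": "u-123"})
profile = response.get("Item")  # None if the customer does not exist
print(profile)
```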

 Data Pipelines: Data pipelines are like the delivery trucks that move data from one place

to another. They use tools like AWS Glue and Apache Kafka to ingest, process, and

transform data before it's stored in the data lake or databases, as sketched below.
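
Below is a minimal sketch of the ingestion leg of such a pipeline using the kafka-python client. The broker address and topic name are assumptions; a real deployment would likely point at a managed cluster such as Amazon MSK, with a downstream job (e.g., in AWS Glue) transforming the events before storage.

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

# Hypothetical broker and topic names.
producer = KafkaProducer(
    bootstrap_servers="broker.example.internal:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Each order event is published to the pipeline; downstream consumers
# transform it before it lands in S3 or Redshift.
producer.send("orders", {"order_id": "o-789", "total": 42.50})
producer.flush()  # block until the event is actually delivered
```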

A well-structured technical framework must support distributed storage and real-time processing

using platforms like Hadoop and Spark (Hashem et al., 2015).

Support of Their Data Value Chain: Data is the engine that drives Amazon's entire business. It's

how they make money and stay ahead of the competition.

 Personalized Recommendations: When you see "Customers who bought this item also

bought..." that's data in action. Amazon analyzes your past purchases, browsing history,

and other data to suggest products you might like (a toy sketch of this idea follows this list).

 Targeted Advertising: Amazon uses data to show you ads that are relevant to your

interests. This makes the ads more effective and helps Amazon earn more revenue.

 Supply Chain Optimization: Amazon uses data to predict demand, manage inventory, and

optimize its logistics network. This helps them ensure they have the right products in

stock and can deliver them to customers quickly.


 Fraud Detection: Amazon uses data to identify and prevent fraudulent activities,

protecting both the company and its customers.

 Pricing Optimization: Amazon uses data to dynamically adjust prices based on demand,

competitor pricing, and other factors.
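
As a toy illustration of the recommendation idea above (and emphatically not Amazon's actual algorithm, which is far more sophisticated), a simple item co-occurrence count already captures the "also bought" intuition:

```python
from collections import Counter
from itertools import combinations

# Toy purchase histories; each inner list is one customer's basket.
baskets = [
    ["echo_dot", "smart_plug", "batteries"],
    ["echo_dot", "smart_plug"],
    ["echo_dot", "batteries"],
]

# Count how often each pair of items is bought together.
co_counts: Counter = Counter()
for basket in baskets:
    for a, b in combinations(sorted(set(basket)), 2):
        co_counts[(a, b)] += 1

def also_bought(item: str, k: int = 3) -> list[str]:
    """Return up to k items most often co-purchased with `item`."""
    scores = Counter()
    for (a, b), n in co_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(k)]

print(also_bought("echo_dot"))  # ['batteries', 'smart_plug']
```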

Distributed Data Processing Models: To handle the massive volume of data, Amazon uses

distributed processing, which is like having a team of workers instead of one person.

EMR (Elastic MapReduce): This is a managed Hadoop and Spark service. Hadoop and Spark are

open-source frameworks designed for processing large datasets in a distributed manner. EMR

allows Amazon to easily spin up clusters of computers to process data in parallel.

Spark: Spark is a fast, in-memory data processing engine that's often used with EMR. It's great

for iterative algorithms and real-time data processing.
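
Here is a minimal PySpark sketch of this kind of distributed aggregation, runnable locally or on an EMR cluster; the input path and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("order-totals").getOrCreate()

# Hypothetical path into the S3 data lake; Spark splits the files into
# partitions and processes them in parallel across the cluster.
orders = spark.read.json("s3://example-data-lake/raw/orders/")

# Aggregate revenue per product; each executor aggregates its own
# partitions, and the partial results are combined at the end.
revenue = orders.groupBy("product_id").agg(F.sum("total").alias("revenue"))

revenue.show(10)
```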

Other AWS Services: Amazon also uses other AWS services like Kinesis (for real-time data

streaming) and Lambda (for serverless computing) to process data.
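
On the real-time side, here is a hedged boto3 sketch of pushing one event into a Kinesis stream (the stream name is an assumption); a Lambda function subscribed to the stream could then process each record within seconds of arrival.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical stream name; records with the same partition key
# are routed to the same shard, preserving per-user ordering.
kinesis.put_record(
    StreamName="clickstream-events",
    Data=json.dumps({"user_id": "u-123", "page": "/dp/B08XYZ"}),
    PartitionKey="u-123",
)
```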

How it Works: Data is broken down into smaller chunks and processed simultaneously across

multiple computers. The results are then aggregated to provide insights.

Data Challenges Across the Value Chain: Dealing with big data isn't always easy. Amazon faces

several challenges.

 Volume: The sheer amount of data is overwhelming. They need to store, process, and

analyze petabytes of data every day.


 Velocity: Data is coming in at a rapid pace. They need to process data in real-time or near

real-time to make timely decisions.

 Variety: Data comes in many different formats (structured, semi-structured,

unstructured). They need to be able to handle all types of data.

 Veracity: Ensuring data quality and accuracy is crucial. They need to clean, validate, and

transform data to ensure its reliability.

 Security: Protecting sensitive customer data is paramount. They need to implement robust

security measures to prevent data breaches.

 Scalability: As the business grows, the data processing infrastructure needs to scale to

handle the increasing volume of data.

Challenges and Recommendations for Their Data Modeling: Data modeling is like creating the

blueprints for how data is organized.

 Challenges:

o Complexity: The relationships between different data points can be complex, making it

difficult to design effective data models.

o Evolving Business Needs: Business requirements change over time, which can require

frequent updates to data models.

o Data Silos: Data may be stored in different systems, making it difficult to integrate and

analyze.

 Recommendations:
o Flexible and Scalable Models: Use data models that can easily adapt to changing business

needs and scale to handle increasing data volumes. Consider using a data lake approach

with a schema-on-read strategy, allowing for flexibility (see the sketch after this list).

o Data Governance: Implement strong data governance practices to ensure data quality,

consistency, and security.

o Data Cataloging: Use a data catalog to document and manage data assets, making it

easier for users to find and understand data.

o Continuous Model Refinement: Regularly review and refine data models to ensure they

meet business needs and optimize performance.

o Focus on Data Lineage: Track the origin and transformation of data to improve data

quality and facilitate troubleshooting.

o Embrace Automation: Automate data modeling tasks, such as data discovery, data

profiling, and model generation, to improve efficiency and reduce errors.
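
As a minimal illustration of the schema-on-read recommendation in the list above, the following PySpark sketch (paths and field names hypothetical) infers structure from raw JSON only when the data is read, rather than enforcing a schema at write time.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

# Schema-on-write would require agreeing on a schema before any data
# lands; schema-on-read defers that decision: raw JSON sits in the lake
# and the schema is inferred (or supplied) only at query time.
events = spark.read.json("s3://example-data-lake/raw/clickstream/")
events.printSchema()  # structure discovered at read time

# New fields added by producers simply show up on the next read,
# so the model adapts to changing business needs without migrations.
recent = events.filter(events["ts"] >= "2024-01-01")
recent.show(5)
```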

Scalability, fault tolerance, and low latency are key technical requirements for an effective big

data analytics infrastructure (Zikopoulos & Eaton, 2011).


Reference List

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics.

International Journal of Information Management, 35(2), 137–144.

Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The

rise of “big data” on cloud computing: Review and open research issues. Information Systems,

47, 98–115.

Zikopoulos, P. C., & Eaton, C. (2011). Understanding big data: Analytics for enterprise class

Hadoop and streaming data. McGraw-Hill Osborne Media.
