NiFi Project

The project aims to design an Apache NiFi workflow to ingest historical cryptocurrency price data from the Binance API for multiple coins, transforming it into Parquet format and storing it in HDFS. Key requirements include API integration, data retrieval, extraction, transformation to Parquet, and organized HDFS storage. Additionally, Spark scripts will be utilized for data handling, analysis, visualization, and correlation between different cryptocurrencies.


Project Objective:

Design and implement a robust Apache NiFi workflow to ingest historical cryptocurrency price data
(OHLCV: Open, High, Low, Close, Volume) from the Binance API for multiple coins (BTC, ETH,
ADA, TRX, XRP, and BNB), transform the raw data into Parquet format, and store it efficiently in an
HDFS environment, with a separate folder for each cryptocurrency.

Columns of interest:

• timestamp - A timestamp for the minute covered by the row.
• Asset_ID - An ID code for the cryptoasset.
• Count - The number of trades that took place during this minute.
• Open - The USD price at the beginning of the minute.
• High - The highest USD price during the minute.
• Low - The lowest USD price during the minute.
• Close - The USD price at the end of the minute.
• Volume - The number of cryptoasset units traded during the minute.

Key Workflow Requirements:

API Integration: Utilize Binance's REST API (/api/v3/klines or similar) to fetch historical candlestick
data.

Understand Binance's rate limiting policies and authentication needs to design appropriate flow
throttling.
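
A minimal Python sketch of the request that NiFi's InvokeHTTP processor would issue, assuming
the public /api/v3/klines endpoint (historical klines need no API key, but rate limits still apply):

    import requests

    BASE_URL = "https://api.binance.com/api/v3/klines"

    # Fetch up to 1000 one-day candles for the BTC/USDT pair.
    params = {
        "symbol": "BTCUSDT",   # Binance expects a trading pair, not a bare coin symbol
        "interval": "1d",      # e.g. "1m", "4h", "1d"
        "limit": 1000,         # Binance caps each klines response at 1000 rows
    }
    resp = requests.get(BASE_URL, params=params, timeout=10)
    resp.raise_for_status()
    klines = resp.json()
    # Each row is a positional array: [openTime, open, high, low, close, volume,
    # closeTime, quoteVolume, trades, takerBuyBase, takerBuyQuote, ignore]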

Data Retrieval:

Implement batching logic to overcome the 1000-record limit per Binance API call, retrieving large
historical timeframes in iterative requests.

Construct API calls dynamically based on the target cryptocurrency, desired timeframe, and
candle interval (e.g., 1-day or 4-hour candles).
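
A sketch of that batching logic, assuming the response layout shown above: each request advances
startTime just past the last candle returned until the target end time is reached.

    import time
    import requests

    def fetch_history(symbol, interval, start_ms, end_ms):
        """Page through /api/v3/klines in 1000-row batches (hypothetical helper)."""
        url = "https://api.binance.com/api/v3/klines"
        rows = []
        cursor = start_ms
        while cursor < end_ms:
            params = {"symbol": symbol, "interval": interval,
                      "startTime": cursor, "endTime": end_ms, "limit": 1000}
            batch = requests.get(url, params=params, timeout=10).json()
            if not batch:
                break
            rows.extend(batch)
            cursor = batch[-1][6] + 1  # closeTime of the last candle, plus 1 ms
            time.sleep(0.25)           # crude throttle to stay under rate limits
        return rows

In NiFi itself, the same loop is typically built from InvokeHTTP plus an UpdateAttribute counter
fed back into the flow.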

Data Extraction:

Use JSONPath expressions to isolate the necessary fields (timestamps, OHLCV) from Binance's API
responses.

Validate the extracted data for correctness and handle potential inconsistencies.
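
Because the klines payload is an array of positional arrays, the JSONPath expressions are
index-based (e.g. $[*][0] selects every timestamp). A Python sketch of the same index-to-name
mapping, using the columns of interest above (Asset_ID is assumed to come from flow context,
since the API response does not carry it):

    def to_record(row, asset_id):
        """Map one positional kline row to the named columns of interest."""
        return {
            "timestamp": int(row[0]),   # open time, ms since epoch
            "Asset_ID": asset_id,       # e.g. "BTC"; supplied by the flow, not the API
            "Count": int(row[8]),       # number of trades in the interval
            "Open": float(row[1]),
            "High": float(row[2]),
            "Low": float(row[3]),
            "Close": float(row[4]),
            "Volume": float(row[5]),
        }

In NiFi, EvaluateJsonPath or JoltTransformJSON can perform this same mapping.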

Transformation to Parquet:

Define a suitable Avro schema for representing the candlestick data, ensuring it includes
timestamps and proper data types for price and volume columns.
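
One candidate schema, with field names assumed to mirror the columns of interest above; the same
JSON definition can be registered in NiFi's AvroSchemaRegistry.

    import fastavro

    candle_schema = {
        "type": "record",
        "name": "Candle",
        "namespace": "binance",
        "fields": [
            {"name": "timestamp",
             "type": {"type": "long", "logicalType": "timestamp-millis"}},
            {"name": "Asset_ID", "type": "string"},
            {"name": "Count", "type": "long"},
            {"name": "Open", "type": "double"},
            {"name": "High", "type": "double"},
            {"name": "Low", "type": "double"},
            {"name": "Close", "type": "double"},
            {"name": "Volume", "type": "double"},
        ],
    }
    parsed_schema = fastavro.parse_schema(candle_schema)  # validates the schema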

Employ NiFi processors (or potentially a custom processor) to convert the extracted data into Avro
format conforming to the schema.

Generate Parquet files optimized for efficiency with columnar storage and proper compression.
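
In NiFi this step is typically a ConvertRecord or PutParquet processor; as a standalone sketch of
the equivalent conversion with pyarrow (an illustration, not part of the NiFi flow):

    import pyarrow as pa
    import pyarrow.parquet as pq

    # records produced by to_record() in the extraction sketch above
    records = [to_record(r, "BTC") for r in klines]
    table = pa.Table.from_pylist(records)
    # Snappy compression keeps files compact while remaining splittable for Spark.
    pq.write_table(table, "BTC_candles.parquet", compression="snappy")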

HDFS Storage:

Create organized folder structures within HDFS to segregate data by cryptocurrency (e.g.,
/datalake/binance/BTC/, /datalake/binance/ETH/), potentially nested with date components for
easier querying.

Design appropriate file naming conventions for Parquet files within each folder.
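
A sketch of one possible convention, assuming date-partitioned subfolders; both the layout and
the file-name pattern are illustrative choices, not fixed requirements.

    from datetime import datetime, timezone

    def hdfs_target(coin: str, first_open_ms: int) -> str:
        """Hypothetical convention: /datalake/binance/<COIN>/<yyyy>/<MM>/<file>."""
        dt = datetime.fromtimestamp(first_open_ms / 1000, tz=timezone.utc)
        return f"/datalake/binance/{coin}/{dt:%Y}/{dt:%m}/{coin}_{dt:%Y%m%d}.parquet"

    # hdfs_target("BTC", 1609459200000) -> "/datalake/binance/BTC/2021/01/BTC_20210101.parquet"

In NiFi, the same convention can be expressed with Expression Language in the Directory property
of PutHDFS.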

Spark:

• Simple Spark scripts to create a DataFrame that combines the different coins for
analysis (see the sketch after this list).
• Data handling and processing
• EDA (exploratory data analysis)
• Visualization
• Correlation between coin pairs
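
A minimal PySpark sketch of the combined DataFrame and one pairwise correlation, assuming the
HDFS layout and column names above:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("crypto-analysis").getOrCreate()

    coins = ["BTC", "ETH", "ADA", "TRX", "XRP", "BNB"]
    combined = None
    for coin in coins:
        df = (spark.read.parquet(f"hdfs:///datalake/binance/{coin}")  # assumed layout
              .select("timestamp", F.col("Close").alias(f"{coin}_close")))
        combined = df if combined is None else combined.join(df, "timestamp", "inner")

    # Pearson correlation between two coins' closing prices.
    print(combined.stat.corr("BTC_close", "ETH_close"))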
