Crop Recommendation System Using Soil and Weather Content
Bachelor of Technology
in
Computer Science & Engineering / Information Technology
Submitted by
Kshitij Vats (201361), Arpit Aggarwal (201474)
Under the guidance & supervision of
Dr. Pankaj Dhiman
This is to certify that the work presented in the project report titled “Crop
Recommendation System Using Soil and Weather Content”, in partial fulfilment of the
requirements for the award of the degree of Bachelor of Technology in Computer Science &
Engineering and Information Technology and submitted to the Department of Computer Science
& Engineering, Jaypee University of Information Technology, Waknaghat, is an authentic
record of work carried out by Kshitij Vats (201361) and Arpit Aggarwal (201474) during the
period from August 2023 to May 2024 under the supervision of Dr. Pankaj Dhiman, Department
of Computer Science and Engineering, Jaypee University of Information Technology, Waknaghat.
Candidate’s Declaration
I hereby declare that the work presented in this report entitled “Crop Recommendation
System Using Soil and Weather Content”, in partial fulfilment of the requirements for the
award of the degree of Bachelor of Technology in Computer Science & Engineering /
Information Technology, submitted in the Department of Computer Science & Engineering and
Information Technology, Jaypee University of Information Technology, Waknaghat, is an
authentic record of my own work carried out over the period from August 2023 to May 2024
under the supervision of Dr. Pankaj Dhiman, Department of Computer Science and Engineering,
Jaypee University of Information Technology, Waknaghat.
The matter embodied in the report has not been submitted for the award of any other degree or
diploma.
This is to certify that the above statement made by the candidate is true to the best of my
knowledge.
ACKNOWLEDGEMENT
I want to thank God for always giving me the blessings and direction that have guided my
academic career.
Special thanks go to my supervisor, Assoc. Prof. Dr. Pankaj Dhiman of the CSE Department,
Jaypee University of Information Technology, Waknaghat. The contribution of Dr. Dhiman to
the success of this work cannot be overstated. His insightful comments, critique and deep
knowledge of machine learning were invaluable to me. He remained patient, offered
intellectual guidance and gave continued moral support as I worked my way through this
research project. I would like to thank Dr. Dhiman for spending many hours reading several
versions, providing helpful advice and insisting that the work be of the best possible
quality.
I also express my gratitude to my parents, colleagues and educators, whose patience, trust
and assistance never wavered during this study. The success of this work can be attributed
to their belief in me.
TABLE OF CONTENTS
Certificate
Candidate’s Declaration
Acknowledgement
Table of Contents
List of Figures
List of Abbreviations
Abstract
1.3 Objectives
1.5 Organization
References
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT
This project develops a predictive crop selection scheme that uses artificial neural networks
together with weather and soil information in order to improve farm yields.
Machine learning algorithms, both regression and classification, are used for predictive
modelling, drawing on detailed data about soil composition and on historical and current
weather conditions. Various agricultural databases are used in these models to establish the
crops best suited to different environments.
The system helps farmers make decisions easily by adapting to varying soil and weather
conditions. It provides customised guidelines aimed at promoting sustainable practices.
In turn, the project contributes to crop yield, resource utilization and farm management in
general. The reliability and efficiency of the system in optimizing crop choice from the
interaction between weather and soil content are assessed through rigorous testing and
comparison with other schemes.
Ultimately, this Crop Recommendation System employs state-of-the-art machine learning
technology to create a flexible and responsive system for making the optimum choice of crop
based on weather and soil content, and as such constitutes a useful element towards
modernization.
CHAPTER-1 PROJECT INTRODUCTION
Sound crop recommendation is important because it improves agricultural productivity
and supports sustainable farming. This research project proposes a Crop
Recommendation System that uses machine learning algorithms to give specific crop
recommendations based on soil and climatic information.
This system seeks to help farmers and agricultural specialists make informed
choices about crop selection by drawing on data analysis and predictive modeling.
Choosing the right crops to grow is difficult because it depends on many factors,
such as soil composition and weather conditions. Traditional crop recommendation
methods usually depend on expert knowledge or experience, which is prone to
personal bias and is laborious.
Advances in machine learning and data analytics make it possible to design a smart
system that automates and improves this task. In this study, a Crop Recommendation
System that applies different machine learning algorithms to soil and weather
parameters will be developed. The system takes into account factors such as soil
quality, nutrient content, temperature, rainfall and humidity, and determines the
best crops to grow in a given region. To obtain these results, a database
containing information on soil composition, historical weather data and the
corresponding crop yields will be built and prepared for analysis.
The dataset will be used to train the machine learning model, either a regression
or a clustering model, depending on the input variables and their relationship to
crop suitability. The models will be evaluated with appropriate performance metrics
to ensure that the recommendations are reliable and accurate.
The system will be presented through an easy-to-use interface that accepts either
the user’s location or soil and weather characteristics. After the system has
analyzed the inputs, it will produce personalized recommendations for growing crops
in that region. The suggestions will appear in simple language, which will enable
farmers to choose suitable crops. The crop recommendation system can transform the
agriculture sector by using machine learning and data analysis. It will also
educate farmers, provide the information needed for crop choice and help create a
sustainable form of agriculture. Extended testing and verification through rigorous
experiments will be carried out to confirm that the system is both reliable and
economical. The aim of this project is to build a Crop Recommendation System that
uses current machine learning technologies to give precise and specific crop advice.
• Resource Efficiency:
The system goes beyond simple recommendation because it considers the distinct
natural conditions of each farm. It improves resource efficiency by recommending
crops that suit the particular features of the soil and climate. This strategic
approach conserves resources and avoids wastage.
• Increased Productivity:
Raising overall agricultural output is a major goal of the system. The system
directs farmers towards the crops that grow best in their local weather and soil,
which leads to higher yields. This alignment is expected to increase harvests.
• Risk Mitigation:
Many farmers face unpredictable weather and variation in soil. To address this, the
prediction system recommends crops that are resilient to the specific climatic
conditions. This risk management approach protects farmers from the negative
effects of volatile climatic conditions.
• Cost Savings:
The system also helps make farming cost-effective. Farmers can be advised on which
crops are most suitable for cultivation under given conditions, taking into account
inputs such as fertilizers, pesticides and water. This lowers operating expenses
and supports the proper use of natural resources.
• User-Friendly Interface:
Data Collection:
One of the vital steps in creating a credible crop advice system is gathering
appropriate data. This means collecting both historical and recent weather records
that capture climatic trends over time. In addition, soil analysis should include
the NPK levels, pH and other measurements that enable farmers to understand their
environment in detail.
Data Preprocessing:
Once the data has been collected, a thorough preprocessing stage begins. This
involves cleaning and preparing the data so that it is of good quality and
reliable: handling missing values, dealing with outliers and, finally, normalizing
the numerical features. In addition, further adjustments to the data are
introduced with the intention of improving the predictive ability of the model.
These steps strengthen the overall reliability of the recommendation system.
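The preprocessing steps described above (filling missing values, handling outliers and
normalizing numerical features) can be illustrated with a minimal sketch. It assumes
median imputation, IQR-based clipping and min-max scaling, none of which the report
specifies at this point. The function names and the sample rainfall values are
hypothetical, and only the Python 3 standard library is used so that the block stays
self-contained.

```python
from statistics import median, quantiles

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    mid = median(observed)
    return [mid if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Clip values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical rainfall column with one missing value and one extreme value.
rainfall = [202.9, None, 263.9, 242.8, 1200.0, 226.6]
cleaned = min_max_scale(clip_outliers(fill_missing(rainfall)))
```

The order matters in this sketch: imputing first gives the quartiles a complete column to
work with, and clipping before scaling keeps a single extreme reading from compressing
all other values towards zero.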
Model Training:
Model training is vital because the model is the core of the recommendation system.
Here, a combined dataset of observed climate data and soil information is used to
train the selected model. To evaluate the model's performance accurately, this
dataset is divided into two separate parts: one for training the model and the
other for testing its effectiveness. This split gives an overall view of the
model's ability to generalize and to predict accurately.
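As an illustration of the split described above, the following sketch shuffles a small
set of labelled rows and holds out a fraction for testing. The 80/20 ratio, the seed and
the helper name `split_train_test` are assumptions made for the example. A library
routine such as scikit-learn's `train_test_split` would typically be used in practice,
and the standard-library version here only shows the idea.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed gives a repeatable split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled rows: (sample id, crop label).
rows = [(i, "rice" if i % 2 == 0 else "maize") for i in range(10)]
train_rows, test_rows = split_train_test(rows, test_fraction=0.2)
```

Keeping the test rows out of training is what allows the measured accuracy to say
something about unseen soil and weather combinations.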
Prediction:
Once an effective model has been built, it can be used to suggest the best crops
for planting. New combinations of weather and soil information are fed into the
trained model, which returns advice on suitable crops to grow. The output is
presented to farmers as science-based, practical and easily implementable advice.
Prediction is the stage at which what the model has learned is applied, and it
provides useful information for agricultural decision making.
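To make the prediction step concrete, the sketch below uses a nearest-centroid
classifier as a stand-in model. This is an assumption chosen for brevity, not the model
used in this project. The feature rows are hypothetical, already-normalized values, the
names `fit_centroids` and `recommend_crop` are illustrative, and the code assumes
Python 3.8 or later for `math.dist`.

```python
from math import dist
from statistics import mean

def fit_centroids(features, labels):
    """Compute the mean feature vector (centroid) for each crop label."""
    grouped = {}
    for vector, label in zip(features, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }

def recommend_crop(centroids, sample):
    """Return the crop whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: dist(centroids[label], sample))

# Hypothetical normalized rows: (temperature, humidity, rainfall).
X = [(0.2, 0.8, 0.9), (0.3, 0.9, 0.8), (0.8, 0.2, 0.1), (0.9, 0.3, 0.2)]
y = ["rice", "rice", "chickpea", "chickpea"]

centroids = fit_centroids(X, y)
suggestion = recommend_crop(centroids, (0.25, 0.85, 0.85))
```

Whatever model is actually trained, the interface is the same: a new vector of soil and
weather readings goes in and a crop label comes out.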
Interdisciplinary Collaboration:
Bridging knowledge gaps connects people and fields that are far apart, which is why
it is needed to create a better future. It is vital to address the multi-faceted
problems facing agriculture with a multidisciplinary approach that brings together
agronomic knowledge and the latest technologies. Creating a crop recommendation
system demands the joint work of specialists from various disciplines, such as
agronomy, soil science, meteorology, computer science and data analytics.
Domain-specific knowledge and advanced computational methods can be combined to
harness data in support of evidence-based decision-making in agriculture. The main
cause of the enormous economic and social potential is the long-lasting nature of
the energy crisis.
more efficiently and thus become competitive in the market. In addition, farmers
who have access to cutting-edge technologies can develop rural areas and improve
their livelihoods, achieving shared growth through the development of agricultural
value chains.
The crop recommendation system is in line with the United Nations Sustainable
Development Goals (SDGs), especially those pertaining to zero hunger, climate
action and sustainable agriculture, and is therefore a concrete way to work towards
global development objectives. By encouraging environmentally sustainable farming
practices, increasing food security and building the capacity to withstand the
impacts of climate change, we can contribute to a more equitable and sustainable
future for all.
Scaling Impact through Global Adoption:
The scalability and replicability of the crop recommendation system offer large
prospects for applying the system well beyond the limits of our original study
area. Through knowledge sharing, partnerships and digital platforms, we can speed
up the spread of data-driven agricultural technologies across the agro-climatic
regions of the world. By working together and staying committed to the cause, we
can give farmers the help they need, strengthen food systems and create a more
resilient future for agriculture and society in general.
Technology is becoming a vital part of farming, since it helps farms survive in an
ever-changing world.
Farming is a hard job, but a good idea can make it much easier. One such idea is to
use computers to assist farmers in selecting the crops they want to grow. This can
save time, cut costs and, at the same time, help the environment.
The saying that "two heads are better than one" applies here: everyone in the group
can contribute solutions to the same problem. Farmers, scientists and others are
cooperating to find solutions. They are using computers and special programs to
assist farmers in selecting crops that will grow well. Thus, farmers can increase
their crop yield with fewer problems.
Technology should be something that everyone can use without difficulty, even with
no prior experience. It is crucial that farmers can use this technology easily.
Computers and sophisticated applications are frequently difficult to master.
Although the new tools are more complicated than the old ones, they are designed to
be easy and friendly so that everyone can use them effectively for a better crop
yield. Because this software is region-specific, its predictive accuracy can be
sensitive to the data it is given.
Learning from Each Other:
Conclusion:
Fig 1 Project Outline
The agricultural sector acts as a link between food security and economic
stability. Farmers face difficulties in deciding which crops to grow, which leads
to poor output, wasted resources and lost money. In response to this urgent need,
this project aims to create an efficient and accurate crop recommendation system
using machine learning techniques.
The proposed system will help farmers by making crop recommendations that take
several essential factors into consideration. It is a recommendation system that
applies modern machine learning technologies to inform farmers about the optimal
choice of crops in terms of their potential outcome. The factors considered include
land characteristics such as soil quality, average rainfall and general weather
patterns, among other region-specific details.
This project aims to fill a gap in modern agriculture by integrating traditional
techniques with new technologies. The recommendation system will use machine
learning algorithms to interpret complex datasets on soil types, precipitation
rates and other subtle weather factors in specific regions. This is where the
effectiveness of the system lies: it integrates this data and produces customized
recommendations for a specific farming zone.
This is consistent with other global initiatives that seek smart and sustainable
approaches to agriculture. The project is also part of a larger discussion about
the use of technology to cope with growing populations, volatile climatic
conditions and food safety.
scale agricultural landscapes. This could create an ecosystem in which farmers
share information across boundaries, which can improve this system in the future.
1.3 OBJECTIVES
Main Goal:
The major goal of this project is to improve agricultural productivity. Machine
learning is used in the recommendation system to suggest crops that are appropriate
for each agricultural zone given the prevailing weather conditions. This is
expected to improve crop yields, thereby making agricultural production more
sustainable and efficient.
1. Resource Efficiency:
An objective of this project is to increase the efficiency of input use in
agriculture. Specifically, the recommendation system should help conserve scarce
resources such as water, fertilizers and pesticides. Resource wastage is reduced
because the system guides farmers to crops that are appropriate for the local
conditions. Sustainable farming takes environmental protection and resource
efficiency into account and thus works towards environmental conservation.
2. Risk Mitigation:
The system acts as a preventative measure for farmers and helps them make informed
and safe decisions. Farmers can take factors such as climate conditions and
diseases into account, so that their decisions carry less risk. This builds
resilience in the agricultural sector, acting as a shield against factors that
impede crop production.
3. Sustainability:
4. Economic Benefits:
1.4 SIGNIFICANCE AND MOTIVATION OF THE PROJECT WORK
Machine learning algorithms allow the system to make recommendations to its users.
The system is not only a technological invention but also a practical answer to the
daily problems faced by farmers. What is at stake is clear: helping farmers make
informed and precise decisions based on scientific evidence. Crop improvement is
not simply about an increase in yield. It is about building the resilient and
sustainable cropping that the world depends on for food security and economic
stability.
The project is aimed at helping farmers and making sure that there is enough food
for everybody. We apply smart computer programs to advise farmers on how to grow
more food and make the right decisions, which in turn supports their growth.
Farming is a tough activity because of problems such as bad weather and pests that
eat the crops. Nonetheless, with this help, farmers can be more successful and the
food supply for the entire population can be improved.
The purpose of our work is to make farming more beneficial for everybody. We strive
to use technology in a way that protects the environment and lets farmers flourish.
Cooperation and the thoughtful use of ideas are the keys to the future of farming.
It is widely recognized that agriculture is a major source of environmental stress,
so farming should be managed in a sustainable and prosperous way through
cooperation and smart thinking. This project is just the beginning of a better
future for farming, one in which everyone has enough food.
1.5 ORGANIZATION
The project report is divided into six major parts, which contain important
information related to the topic.
CHAPTER 1: INTRODUCTION
This first chapter is the project’s starting point: the specific problem is put
forward, the intended objectives are identified, and the reasons for undertaking
the project are explained.
CHAPTER 2: LITERATURE SURVEY
In this chapter we review existing knowledge, using sources such as technical
papers and books. We want to understand the current state of the field and what our
project can address.
CHAPTER 3: SYSTEM DEVELOPMENT
This chapter covers the essentials of the project, from requirement analysis to
system design and implementation. It also discusses the challenges faced during
development and the strategies used to address them.
CHAPTER 4: TESTING
This chapter explains our thorough testing strategy and procedures. We provide test
scenarios together with results that demonstrate the consistency of the system.
The concluding chapter focuses on the results, examines the outcomes and compares
them with existing solutions. It provides an in-depth analysis of our work.
CHAPTER-2 LITERATURE SURVEY
In 2021, Mohammed Adnan and co-authors published research on “a machine learning
based crop recommendation system based on soil and weather data” [1]. The authors
proposed suggestions to farmers on which crops should be planted in response to the
challenges presented by climate change and the pressure of a growing population.
The article recognized its successes and also stated the challenges that must be
overcome so that future agricultural prediction technologies can be improved. It
highlighted two major factors that can have a negative impact on such systems,
namely data quality and the selection of algorithms. The article is important not
only for a modern view of precision agriculture but also because it emphasizes the
need for such systems to be updated continuously.
In addition, they identified limitations in the research related to crop diversity
and crop rotation. Crop rotation is an organized strategy for planning farming in
which various crops are planted on the same plot of land in alternate seasons or
years; that is, the crop type in a given area is changed at intervals. Crop
rotation is applied to improve soil health, raise fertility levels and reduce pest
and disease problems.
with various environmental dynamics. It is essential to ensure that the data
collected is diverse so that it represents various types of soil, weather
conditions and agricultural practices. Nevertheless, obtaining a dataset with
adequate variation is difficult, since some regions or particular plants can be
overlooked.
According to the CVSM approach, temperature, soil type, humidity and rainfall are
key in determining crop yield. The model is based on an ANN and estimates
anticipated yields, which inform farmers about the most appropriate crops to grow.
The study acknowledges the complexity of the agricultural ecosystem by factoring
economic considerations (market prices) into crop selection.
The Crop Vegetation Soil Model (CVSM) considers many factors when farmers are
deciding which crop to plant. These factors are temperature, soil type, humidity
and rainfall, all of which play a role in plant growth. Using these factors, the
model predicts the amount of food that will be harvested in a certain area. It uses
an Artificial Neural Network (ANN) to perform this task.
This lets farmers work out which crops will flourish in their area and which will
not. The model also takes the economic side into account, for example how much
money farmers can earn by selling their produce. By evaluating all of these
factors, the model helps farmers choose the right crops to plant so as to have a
good harvest.
Fig 2 Model Architecture 1
The core of CVSM lies in its three-part algorithm, which considers the type of farm
crop, market pricing and crop variety. The model further measures success by
comparing the projected yield for a given outlay and assessing whether it is
profitable at existing market prices.
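The profitability check described above can be sketched as a simple calculation. The
formula used here (predicted yield multiplied by market price, minus cultivation cost)
and every figure in the example are illustrative assumptions, not values or code taken
from the cited study. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CropEstimate:
    name: str
    predicted_yield_kg: float   # yield the model predicts for the plot
    market_price_per_kg: float  # current market price
    cultivation_cost: float     # total outlay for the plot

    @property
    def profit(self) -> float:
        """Expected revenue minus the cost of cultivation."""
        revenue = self.predicted_yield_kg * self.market_price_per_kg
        return revenue - self.cultivation_cost

def most_profitable(estimates):
    """Return the profitable estimate with the highest profit, or None."""
    profitable = [e for e in estimates if e.profit > 0]
    return max(profitable, key=lambda e: e.profit, default=None)

# Hypothetical candidates for one plot.
candidates = [
    CropEstimate("wheat", 3000, 20, 45000),
    CropEstimate("maize", 2500, 18, 50000),
]
best = most_profitable(candidates)
```

In this example the second crop is filtered out because its expected revenue does not
cover its cost, which mirrors the idea of rejecting crops that are unprofitable at
current prices.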
The last issue highlighted concerns big data, which could help turn agriculture
into a more intelligent and data-oriented sector. It gives advice to farmers on how
to choose their crops based on past information, for the purpose of food security
in the country. The authors end with the need to identify more appropriate features
for yield prediction and improved datasets. In general, this study highlights the
significance of precision farming and big data analytics.
Big data in agriculture is significant because it could make the farming sector
more knowledgeable and data-centered. By applying big data, farmers can make better
decisions about which crops to plant after studying past data, which helps ensure
that there is enough food for the country. Nevertheless, the authors also stress
that much remains to be achieved.
Fig 4 System Methodology
• Gandhi, R., et al. (2022) [17]:
Conclusion: The use of soil health monitoring technologies is crucial in
formulating site-specific and personalized crop advisory services.
Key Insights: Challenges of soil variability and the need for a precision approach
in the recommendation scheme.
Limited Integration of IoT and Advanced Technologies:
There is a notable lack of focus on the use of IoT and other sophisticated
technologies in enhancing crop recommender systems. Some studies hint at data
acquisition through IoT, but there is very little comprehensive work on using it
for real-time tracking, precision farming and sensor insights. By incorporating IoT
in the field, the data collected can be accurate and timely, providing a modern
basis for recommendation models.
Detailed Explanation:
With the merging of IoT, sensors and modern technologies, the agriculture sector is
undergoing a rapid transformation. Nevertheless, the literature on crop
recommendation systems does not fully address how IoT can affect crop advisory
systems. To fill this gap, comprehensive studies investigating the use of IoT for
continuous monitoring, feedback loops and precision agriculture approaches must be
conducted.
Detailed Explanation:
Apart from soil fertility, weather patterns and other natural endowments,
agriculture depends on various economic and social factors of production. Most of
the literature fails to investigate in detail the influence that socio-economic
aspects have on crop selection. Addressing this gap requires the development of
economically conscious agronomic decision-support tools with more realistic and
practical guidelines for farmers’ practices.
Detailed Explanation:
Agriculture takes varied forms, as farmers grow different crops depending on
regional environmental factors and consumer preferences. Unfortunately, studies on
adapting recommendation models to these differences are rare in the literature.
This gap can be addressed by research that accounts for regional specificity,
considering the particular difficulties and advantages of distinct crops in various
regions.
Detailed Explanation:
Climate change creates uncertainty about weather patterns, which can greatly affect
the ability of crops to grow properly. Detailed empirical studies of how crop
recommendation systems could cope with this are usually missing from the
literature. Filling this gap will require studying how integrated climate change
models, predictive analytics and adaptive strategies can increase the resilience of
crop recommendation systems.
Detailed Explanation:
User acceptance and adoption are critical for any technology, including crop
recommendation systems. Most current research focuses on algorithmic complexity and
data analytics rather than on user-centric design. This calls for research on
farmers’ opinions about adopting digital technologies, their preferences and the
problems they face. It is important to understand the socio-technical context and
to include user feedback during the design process when developing effective and
user-oriented crop recommendation systems.
CHAPTER-3 SYSTEM DEVELOPMENT
• SOFTWARE RESOURCES
● Python (Version: 2.7)
● Seaborn
● Matplotlib
● MinMaxScaler
● NumPy
● Pandas
● Scikit-learn
• HARDWARE RESOURCES
● 8 GB RAM
● GPU
• Preprocessing Module:
Objective: Addresses missing values and ensures consistency across measurements
(e.g., in relation to questionnaires).
Implementation: Uses data cleaning algorithms, approaches to handling missing data,
and normalization of numerical features.
(MongoDB, Express, React, Node.js) stack. These web-based interfaces, developed
using frameworks such as React for dynamic interactions, make the system accessible
to farmers. They allow easy adjustment of the parameters, and the crop
recommendations shown on the website are easy to understand.
• Integration Points:
• Security Considerations:
Data Security: Applies strict access control and encryption, especially for
farmer-specific information and other sensitive data.
• Performance Optimization:
Algorithm Efficiency: Uses efficient algorithms to minimize processing time while
maximizing prediction accuracy.
Version Control: A robust version control system that tracks all modifications.
Integrating the different parts of the system makes the data flow seamless, so that
the data is processed and used efficiently throughout. The data flow mechanism
coordinates the transfer of information from the data collection module, through
the preprocessing stages and feature engineering, and finally into the machine
learning model. This well-ordered flow of data enables the system to extract
meaningful information and make accurate predictions.
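The data flow described above, from collection through preprocessing and feature
engineering to the model input, can be sketched as a chain of stages. All function
names, the sample records and the derived feature are hypothetical and serve only to
show how the output of one module becomes the input of the next.

```python
from functools import reduce

def collect():
    """Stand-in for the data collection module: returns raw records."""
    return [{"N": 90, "rainfall": 202.9}, {"N": None, "rainfall": 226.6}]

def preprocess(records):
    """Drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def engineer_features(records):
    """Add a derived feature; this ratio is purely illustrative."""
    return [{**r, "rain_per_N": r["rainfall"] / r["N"]} for r in records]

def run_pipeline(source, stages):
    """Pass the output of each stage to the next one, in order."""
    return reduce(lambda data, stage: stage(data), stages, source())

model_input = run_pipeline(collect, [preprocess, engineer_features])
```

Keeping each stage as a separate function with a list-in, list-out contract is one way
to let modules be replaced or extended without touching the rest of the flow.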
The most important part of integration is the alignment between different modules,
especially the machine learning model and the feature engineering sub-module. This
synchronization lets the model cross-reference the processed variables, which it
can then use to make better predictions. By ensuring smooth communication and
interaction among these components, the system improves its predictive capacity,
thus improving performance and accuracy.
means the system can accommodate both growth in data volume and new functions in
the future without difficulty. Hence, the scalability of the system ensures that it
remains efficient.
Source of Data:
The dataset, obtained exclusively from Kaggle, covers soil composition and detailed
weather information, and also considers the effect that precipitation may have on
plant growth.
• Data Structure:
The dataset structure includes distinct CSV files:
Main Kaggle Dataset: Each item contains specific soil component descriptions,
weather information, as well as data about how much rain impacted them.
34
Additional Dataset Features: This includes information regarding how rain affects
crop growth.
• Dataset Reference:
This project’s dataset borrows its idea from the Kaggle platform, where it offers
unique information on NPK content of soil, multiple weather indices and specific
focus on effect of precipitation in relation to crops growth.
3.4 IMPLEMENTATION
• Data Collection:
Source:
All the data used in this project was obtained exclusively from Kaggle and
covers the essential inputs needed for a crop recommendation system. Nitrogen
(N), Phosphorus (P), Potassium (K), temperature, humidity, pH, and rainfall are
among the data points in the sample set.
Dataset Labels:
• Nitrogen (N):
The nitrogen content of the soil, one of the components affecting vegetation growth.
• Phosphorus (P):
The phosphorus level of the soil; phosphorus supports energy transfer within plants.
• Potassium (K):
The potassium content of the soil, which is essential for a plant's
overall health and toughness.
• Temperature:
Environmental factor: the recorded ambient temperature, which is critical to plant growth.
• Humidity:
The recorded humidity, which is significant because it influences the
plant's transpiration and growth.
• pH:
The pH value of the soil, indicating its acidity or alkalinity.
• Rainfall:
The amount of rainfall, which influences crop productivity.
Data Preprocessing:
CSV Format:
Obtaining the Kaggle data in CSV format made it manageable and straightforward
to load into machine learning workflows.
• Preprocessing Steps:
• Handling Missing Values:
Identified and filled any missing values in the dataset to preserve data integrity.
• Normalization:
Applied normalization to bring the numerical features into a uniform range for
training the machine learning model.
• Feature Engineering:
Added new features or transformed existing ones to increase the model's
ability to capture meaningful patterns.
• Label Encoding:
Converted categorical data into a numeric format that machine learning algorithms can use.
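The preprocessing steps listed above can be sketched in a few lines of plain
Python. This is a minimal illustration and not the project's actual code: the
report does not say how missing values were filled, so mean imputation is an
assumption here, and the function names and sample values are hypothetical.

```python
from statistics import mean


def fill_missing(column):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def label_encode(labels):
    """Map each distinct label to an integer, in sorted label order."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# Hypothetical sample values, not taken from the dataset.
nitrogen = [90, None, 60, 30]
crops = ["rice", "maize", "rice", "coffee"]

nitrogen_filled = fill_missing(nitrogen)          # None becomes the mean of 90, 60, 30
nitrogen_scaled = min_max_scale(nitrogen_filled)  # smallest value -> 0.0, largest -> 1.0
crop_codes, crop_mapping = label_encode(crops)
```

Scaling matters most for distance-based models such as KNN, where a feature
with a large numeric range would otherwise dominate the distance.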
Fig 6 Dataset
Fig 8 Dataset Visualization
Fig 9 Data Preprocessing
Fig 11 Preprocessing
MODEL
Fig 12 Model Selection
Fig 13 Model Selection
Support Vector Classifier (SVC):
An SVC constructs hyperplanes in the feature space that distinguish between
the various crops. It works by locating the separating hyperplane that
classifies two different classes most reliably, that is, with the widest margin
between them. With a suitable kernel, this algorithm is useful for complicated
datasets characterized by non-linearity among the individual attributes or
features.
Random Forest:
We employed the Random Forest algorithm as an essential part of our ensemble
learning in order to enhance our model's prediction accuracy and robustness. A
Random Forest builds many decision trees during training and, for
classification, outputs the class chosen by the majority of the individual
trees. This ensemble approach reduces overfitting and gives good generalization
performance, which suits a Crop Recommendation system.
The essence of employing the Random Forest algorithm lies in its ability to
harness the collective wisdom of numerous decision trees. By generating a
multitude of decision trees during training and aggregating their outputs through
a voting mechanism, Random Forests mitigate the risk of overfitting, thereby
enhancing the model's prediction accuracy and robustness. This ensemble
approach enables the model to capture diverse patterns and variations in the data,
leading to improved generalization performance. In the context of a Crop
Recommendation system, where the goal is to provide tailored recommendations
based on complex agricultural datasets, the Random Forest algorithm's ability to
handle high-dimensional data and nonlinear relationships makes it particularly
well-suited for the task.
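The voting step described above can be illustrated with a short sketch. It
shows only how the predictions of individual trees are aggregated, not how the
trees are trained; the function name and the sample votes are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees.

    Ties go to the class that was seen first, because
    Counter.most_common keeps equal counts in first-seen order.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical votes from five trees for one soil sample.
votes = ["rice", "maize", "rice", "jute", "rice"]
predicted_crop = majority_vote(votes)
```

Because each tree sees a different sample of the data, their individual errors
tend to differ, and the vote cancels many of them out.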
Fig 15 SVC
Fig 16 Random Forest
Decision Tree:
The Decision Tree algorithm classifies the crops hierarchically by recursively
splitting the data on selected features at successive levels. Decision Trees
capture complex, non-linear relationships between variables in the data and
also show why each decision is made. This interpretability makes them a useful
component of the Crop Recommendation System.
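A single step of the recursive splitting described above can be sketched as
follows. The report does not name the splitting criterion, so Gini impurity is
an assumption here, and the rainfall values and labels are hypothetical. A full
tree would apply the same search again to each side of the chosen split.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Find the threshold on one feature that minimises the weighted Gini impurity."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical rainfall readings (mm) and crop labels.
rainfall = [60, 80, 200, 220]
crops = ["maize", "maize", "rice", "rice"]
threshold, impurity = best_threshold(rainfall, crops)
```

The chosen threshold reads directly as a rule ("rainfall above this value"),
which is the source of the interpretability mentioned above.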
All the above algorithms were trained thoroughly on the dataset, learning the
patterns and relationships between input parameters and crop recommendations.
A subsequent evaluation on a testing set provided a comprehensive analysis of
their performance.
FINE-TUNING
Adaptive Model Architecture: We adapt the structure of the models based on the
challenges observed in the evaluation process. Because the approach is
adaptive, the algorithms can account for the peculiarities of agricultural data.
Through these steps, we incorporate fine-tuning into the algorithm
optimization procedure, iterating towards accurate and reliable crop
recommendations.
DEPLOYMENT
Our model will be deployed on a web app: the crop recommendation system will
be refined through rigorous testing and fine-tuning to reach high accuracy and
will then be deployed on a user-friendly web application.
The quality and variety of available datasets are among the main obstacles. It
is difficult to obtain detailed information on soil composition, such as N, P,
and K levels, as well as up-to-date weather reports. Data reliability is also
an inherent problem, because soil types and weather conditions differ from
region to region.
•Feature Engineering:
It is very important to choose suitable features from the complex dataset.
Determining which soil and weather parameters significantly influence crop
growth and yield requires domain expertise. Feature engineering is further
complicated by missing or inconsistent data.
•Model Generalization:
Developing a machine learning model that generalizes well across geographical
areas and different climate conditions is a big hurdle. The system needs to
adapt to variations in soil composition and weather in order to give accurate
recommendations everywhere.
CHAPTER-4 TESTING
The reliability and accuracy of the machine learning model are crucial in the
Crop Recommendation System. The testing strategy involves different stages to
verify the functionality, performance and generalization ability of the system.
Activities: Examine soil and weather datasets for missing values and outliers.
Ensure raw data converts into the correct model training format.
Fig 17 Min-Max Scaler
•Hyperparameter Tuning Testing:
In the first step of our testing strategy, we concentrate on the integrity of
the data. First, we check that there are no missing values in the soil and
weather datasets, so that complete data is supplied to the model. We normalize
the soil nutrient values (N, P, K) so that they lie in the range 0 to 1. We
also run quality checks for outliers in the temperature, humidity, pH and
rainfall data; the expected outcome is the removal of outliers.
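The outlier check described above could look like the sketch below. The report
does not state which detection rule was used, so the interquartile range (IQR)
rule with the conventional factor of 1.5 is an assumption, and the temperature
readings are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical temperature readings in degrees Celsius; 95.0 is a sensor glitch.
temperatures = [21.0, 22.5, 23.0, 24.0, 25.5, 26.0, 95.0]
cleaned = drop_outliers(temperatures)
```

Removing outliers before min-max normalization is worthwhile because a single
extreme value would otherwise compress all other readings into a narrow band.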
We then partition the dataset into training and validation subsets. Our
objective is a model that trains without errors and reaches a reasonable
validation accuracy. We additionally use cross-validation techniques to check
that our model performs consistently across different parts of the dataset.
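The cross-validation idea above can be sketched as a k-fold split: every sample
is used for validation exactly once and for training in all other folds. The
function name and the fold count are illustrative assumptions; the report does
not state how many folds were used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Samples are assigned to folds in order, so shuffle the rows first
    if the dataset is sorted by crop.
    """
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
```

Averaging the validation score over the folds gives a steadier estimate than a
single train/validation split.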
We also want to identify the best algorithm by reviewing all the important
indicators. To that end, we examine the accuracy, precision, recall, and F1
score of the K-Nearest Neighbors (KNN) algorithm. We review the other
well-known algorithms, the Support Vector Classifier (SVC) and Random Forest,
which were tested intensively in our model, in the same way.
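The four indicators named above can be computed as in the following sketch.
With more than two crops, precision, recall and F1 have to be averaged over the
classes; macro averaging (an unweighted mean over classes) is an assumption
here, since the report does not say which averaging was used. The labels are
hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1 score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    n = len(per_class)
    return {
        "accuracy": accuracy,
        "precision": sum(p for p, _, _ in per_class) / n,
        "recall": sum(r for _, r, _ in per_class) / n,
        "f1": sum(f for _, _, f in per_class) / n,
    }


# Hypothetical true and predicted crops for four samples.
y_true = ["rice", "rice", "maize", "maize"]
y_pred = ["rice", "maize", "maize", "maize"]
report = macro_metrics(y_true, y_pred)
```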
Our assessment also covers the Decision Tree algorithm, a main focus of our
study, whose classification power comes from its recursive splitting process.
Inspecting these details shows us the strengths and weaknesses of each
algorithm, so that the system can make the best recommendation. We also study
thoroughly how hyperparameter changes affect model performance.
When the performance metrics are checked again after tuning, we can observe the
improvements in accuracy, precision, and recall. When there is little or no
improvement, we investigate the factors that affect the model's behavior, which
lets us decide how to make further improvements. Overall, our testing is a
complete approach that covers data quality as well as the strength of the model.
CHAPTER-5 RESULTS AND EVALUATION
5.1. RESULTS
In this last stage, the work of refining the data, training the model, and
carefully testing different ML algorithms in our Crop Recommendation System
yields the outcomes that matter most to the system. Precision has been the
watchword throughout, and each step was designed to enhance the forecasting
ability of the system.
During model training, we covered several notable algorithms, namely
K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), Random Forest,
and Decision Tree, and each one left its signature on the results. Like
skilled artisans, these algorithms drew insights from our dataset, outlining
the fine lines that govern the relationship between agricultural parameters
and crop suggestions.
Decision Tree and Random Forest algorithms offer interpretability and ensemble
classification, respectively. Decision Trees reveal the significance of the features and the path of
the decision-making process, which helps in understanding the structure of the data. By combining
several trees, the Random Forest improves accuracy and resistance to overfitting, making it a
valuable aid to crop recommendation systems.
Ensemble methods such as Bagging, AdaBoost and Gradient Boosting combine the outputs of
different models to reduce the errors and biases of individual models. These methods handle
complex relationships in data well, which makes crop recommendations more reliable and more
effective. Their ability to improve the prediction of crop yields makes them valuable auxiliary
components in crop recommendation systems.
In conclusion, the spotlight remains on KNN. For agricultural decision-making, its value reaches
further than numerical accuracy alone: it represents a shift towards choices informed by
data-driven insights. The comparison of algorithms leads to a clear result: KNN is the most precise
of the models we tested and points towards a future in which technology is combined with
agriculture.
K-Nearest Neighbors (KNN) is considered effective for crop prediction due to several factors:
• Simplicity and Intuitiveness:
KNN is one of the simplest algorithms, relying only on proximity. This simplicity makes it easy to
grasp and interpret, which is vital for agricultural applications used by stakeholders, such as
farmers, who are not well versed in machine learning.
Crop recommendation systems depend on precision and accuracy in order to assist farmers in
making proper decisions. The K-Nearest Neighbors (KNN) algorithm is a strong contender,
especially when NPK (Nitrogen, Phosphorus, Potassium) values and humidity are taken into
account. KNN relies on proximity, so it suits tasks where closeness in feature space is the key. By
finding the nearby data points with similar attribute values, KNN can recommend the crops that
did best under environmental conditions similar to those of the query, generating precise and
tailored recommendations.
Furthermore, KNN can work with multi-dimensional data, which lets it deal with the intricacies
of crop recommendation systems. It can handle datasets that combine different factors such as
soil composition and climate variables, so it can find hidden relations and patterns in the data and
give more accurate recommendations. Its tolerance of noisy data makes it even more useful in
real-life agriculture, where data can easily be altered or distorted. Because it bases each decision
on a group of neighboring data points, KNN can give accurate recommendations even from
imperfect or incomplete datasets, helping farmers to make better decisions and ultimately
improving productivity.
The distributions of agricultural datasets often differ, which makes the process challenging to
automate. The k-nearest neighbor method is non-parametric: it makes no assumption about the
underlying distribution of the dataset, so it is flexible enough to cope with the various patterns
exhibited in agrarian data.
The choice of an appropriate crop depends on conditions in the surrounding fields. The localized
nature of agricultural conditions is a good fit for KNN's approach of predicting on the basis of
the majority class of nearby data points.
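The prediction rule described above, taking the majority class among the
nearest data points, can be written in a few lines. This is a minimal sketch
assuming Euclidean distance and features already normalized to 0 to 1; the
sample rows, the function name and the value of k are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical, already-normalised (N, P, K, humidity) rows.
train_X = [
    (0.90, 0.40, 0.40, 0.80),
    (0.80, 0.50, 0.40, 0.80),
    (0.85, 0.45, 0.40, 0.75),
    (0.20, 0.70, 0.20, 0.20),
    (0.30, 0.60, 0.20, 0.20),
]
train_y = ["rice", "rice", "rice", "chickpea", "chickpea"]

prediction = knn_predict(train_X, train_y, query=(0.88, 0.42, 0.40, 0.80), k=3)
```

There is no fitting step: adding a new labeled row to `train_X` and `train_y`
changes later predictions immediately.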
• No Training Phase:
Unlike other algorithms, KNN does not have an explicit training phase. It can therefore work
where the dataset is constantly updated or agricultural conditions are dynamic: the model adapts
as soon as new data is added, because it does not require a retraining process.
Datasets in agriculture can be small, especially at the level of an individual farm. KNN performs
well with small to moderate datasets, which makes it feasible for numerous farming situations.
• Interpretability:
The decision process of KNN is transparent. Farmers can quickly grasp a suggestion by
comparing it with their own experience of past climate events. This interpretability builds trust
and acceptance among end users.
This does not mean that KNN is without limitations; the choice of algorithm depends on the
nature of the dataset as well as the objectives of the prediction system. Datasets can be large or
small, with high or low dimensionality, depending on the nature of the interactions between the
input features and crop results.
The selection between Support Vector Machines (SVM) and other algorithms like K-Nearest
Neighbors (KNN) hinges on several factors, including the characteristics of the dataset and the
objectives of the prediction system. While KNN offers distinct advantages, such as simplicity
and ease of implementation, its suitability depends on the specific attributes of the dataset and
the nature of the relationships between input features and crop results. Datasets vary in size,
dimensionality, and complexity, reflecting the diverse interactions between environmental
factors and crop yields. Where datasets are large and feature dimensions are high, SVM may
outperform KNN due to its capacity to handle high-dimensional data efficiently.
Figure 18 Comparison of Results
Rigorous comparative analysis allowed us to choose the optimal algorithm given the
complexity of crop prediction.
Essentially, the strength of our Crop Recommendation System rests on the combination of
a rich and diverse dataset with a well-chosen algorithm. It goes beyond the mere acquisition
of data and highlights the relevance of the information and the efficacy of the algorithm.
Our system therefore aims to be reliable: farmers receive not only predictions but
actionable information that they can use when making decisions in the dynamic world of
agriculture.
Finally, it is worth mentioning the careful preprocessing steps applied to the dataset. We
did not just collect raw information but refined it by eliminating noise and unnecessary data.
Data quality and a commitment to cleanliness are crucial aspects that add robustness and
reliability to our model. Our approach ensures a higher quality of data, whereas some
solutions may struggle with inaccuracies resulting from crude datasets.
The comparison of all the algorithms shows that K-Nearest Neighbors (KNN) is the best in
terms of accuracy and reliability, which indicates that our Crop Recommendation System
is effective.
In our intensive testing, we evaluated multiple machine learning algorithms and determined
the best one for our Crop Recommendation System. The analysis covered algorithms such
as Linear Regression, Support Vector Classifier (SVC), K-Nearest Neighbors (KNN),
Decision Trees (DT), Random Forest (RF), Bagging, Gradient Boosting, and Extra Trees.
By studying a broad spectrum of algorithms, we made sure that our system uses the best
and most reliable model for providing precise crop recommendations to farmers.
Figure 19 Results
Fig 20 Predictive System
We created a predictive system using Python and used the Pickle library to
store a model that issues crop recommendations from the NPK (Nitrogen,
Phosphorus, Potassium) input values. Pickle serializes and deserializes
Python objects, so we could save the trained machine learning models
efficiently. The prediction system takes the NPK values as input and uses
the pre-trained model to predict the best crops for the given nutrient
composition. With Pickle, the saved model can be integrated and deployed
smoothly, providing farmers with the necessary information at the right time
to increase their agricultural productivity.
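The save-and-reload workflow described above can be sketched as follows. The
"model" here is a hypothetical stand-in (a small table of labeled NPK samples
with a nearest-neighbor lookup), not the project's trained model, and the file
name `crop_model.pkl` is likewise an assumption.

```python
import pickle
import tempfile
from collections import Counter
from math import dist
from pathlib import Path


def predict_crop(model, npk):
    """Return the majority label among the model's k samples nearest to npk."""
    nearest = sorted(
        zip(model["samples"], model["labels"]),
        key=lambda pair: dist(pair[0], npk),
    )[: model["k"]]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical stand-in for a trained model: labeled (N, P, K) samples.
model = {
    "samples": [(90, 42, 43), (85, 58, 41), (20, 67, 20), (25, 70, 18)],
    "labels": ["rice", "rice", "chickpea", "chickpea"],
    "k": 1,
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "crop_model.pkl"
    with path.open("wb") as fh:
        pickle.dump(model, fh)        # serialize the model to disk
    with path.open("rb") as fh:
        restored = pickle.load(fh)    # deserialize it again, e.g. in the web app

recommended = predict_crop(restored, (88, 45, 42))
```

Pickle files should only be loaded from trusted sources, because unpickling can
execute arbitrary code.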
CHAPTER-6 CONCLUSIONS AND FUTURE SCOPE
6.1. CONCLUSION
Our success lies in the careful selection and assembly of a complete
dataset. We used a dataset from Kaggle that specifies soil nutrient contents
(N, P, K) and pH, together with detailed weather readings (temperature,
humidity and rainfall). This dataset goes beyond those used by some other
models, which have focused on a narrower set of variables, because it also
considers the dynamic relationship between crops and the environment.
The preprocessing steps we took to ensure data quality reflect our
commitment. We reinforced the model's reliability by removing noise from the
dataset and carefully normalizing soil nutrient values. This commitment to
cleanliness ensures that the predictions from our system are not distorted by
the inaccuracies that arise from low-quality datasets.
At the core of our work is the use and testing of various machine learning
algorithms. Each of these algorithms, KNN, SVC, Random Forest, and Decision
Tree, was evaluated intensively. It was not only about prediction; it was
about discovering which algorithm was most sensitive to the nuances of our
dataset. KNN came out on top after a thorough assessment, with 98% accuracy,
proving its utility in crop classification using historical data.
Our model was further improved through the iterative process of hyperparameter
tuning. We carried out grid searches and made small adjustments to arrive at the
hyperparameter values that most improved our model's prediction accuracy. Through
this fine-tuning, we were able to address specific issues identified in the
evaluation and make sure that our model performed well and with high precision.
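A grid search of the kind mentioned above can be sketched for KNN's main
hyperparameter, the number of neighbors k. The candidate values, the data and
the function names are hypothetical; the report does not list the grid that
was actually searched. One training point is deliberately mislabeled to show
why k = 1 can do worse than a larger k.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def validation_accuracy(train_X, train_y, val_X, val_y, k):
    """Fraction of validation samples that are classified correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y))
    return hits / len(val_y)


def grid_search_k(train_X, train_y, val_X, val_y, candidates):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: validation_accuracy(train_X, train_y, val_X, val_y, k) for k in candidates}
    best_k = max(scores, key=lambda k: (scores[k], -k))  # prefer the smaller k on ties
    return best_k, scores


# Hypothetical normalised data; (0.25, 0.25) is a mislabeled (noisy) point.
train_X = [(0.10, 0.10), (0.20, 0.10), (0.15, 0.20), (0.25, 0.25),
           (0.90, 0.90), (0.80, 0.90), (0.85, 0.80)]
train_y = ["rice", "rice", "rice", "maize", "maize", "maize", "maize"]
val_X = [(0.26, 0.24), (0.88, 0.86)]
val_y = ["rice", "maize"]

best_k, scores = grid_search_k(train_X, train_y, val_X, val_y, candidates=[1, 3, 5])
```

In practice the score for each candidate would be averaged over cross-validation
folds rather than taken from a single validation split.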
We envision our model not as a standalone tool but as a fully fledged web-based
application with an intuitive user interface. The interface is optimized for
cross-platform use and is intended to simplify agricultural decision-making. Our
system offers simple recommendations to farmers with limited technological
experience, and the farmers can interact with the system with ease.
Our Crop Recommendation System has therefore shown that proper data management,
smart algorithmic techniques, and a focus on the human factor are key to success.
The use of advanced ML methods is a viable route to greater accuracy and precision
in predicting crop outcomes. One strategy is to adopt neural networks, one of the
ML techniques known for their ability to identify hidden patterns in complex
datasets. Deep-learning-based neural networks can capture subtle interactions in
crop data and uncover insights that conventional mathematical modeling techniques
currently tend to miss. This exploration is seen as a measure that will improve
the predictive abilities of the crop recommendation system.
•Model Explanation for Non-Techy Farmers:
Recognizing that agronomic practices and physical environments vary from one
region to another, the crop recommendation system seeks to make region-specific
adjustments. Customization may include factoring in regional climate, different
soils, and crop-specific requirements. This regional sensitivity sets the
project apart by keeping farmers of different geographies in mind, each with
specific needs and demands. It also shows the system's adaptability to
agricultural landscapes with different problems.