Crop Production Analysis
Production in India
I will analyze the Agriculture Crop Production dataset and try to answer some
interesting questions. I downloaded the dataset from Kaggle Datasets. The libraries for
data analysis and visualization used in this project are NumPy, pandas,
Matplotlib and Seaborn.
Required Libraries
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
Load Dataset
In [2]: df = pd.read_csv('C:\\Users\\KALI LINNUX\\Desktop\\Data Analysis\\crop
Top 5 Rows
In [3]: df.head()
   State_Name                   District_Name  Crop_Year  Season  Crop      Area    Production
0  Andaman and Nicobar Islands  NICOBARS       2000       Kharif  Arecanut  1254.0  2000.0
2  Andaman and Nicobar Islands  NICOBARS       2000       Kharif  Rice      102.0   321.0
Last 5 Rows
In [4]: df.tail()
        State_Name   District_Name  Crop_Year  Season      Crop       Area   Production
246088  West Bengal  PURULIA        2014       Whole Year  Sugarcane  324.0  16250.0
Random 5 Rows
In [5]: df.sample(5)
        State_Name        District_Name  Crop_Year  Season      Crop          Area    Production
165252  Rajasthan         ALWAR          2009       Whole Year  Sweet potato  20.0    NaN
111733  Madhya Pradesh    JABALPUR       2002       Whole Year  Sannhamp      39.0    89.0
72190   Himachal Pradesh  KANGRA         2001       Whole Year  Garlic        49.0    65.0
114961  Madhya Pradesh    MANDSAUR       2009       Whole Year  Garlic        6519.0  27568.0
Shape of dataset
In [6]: df.shape
Out[6]: (246091, 7)
Out[7]: 1722637
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 246091 entries, 0 to 246090
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 State_Name 246091 non-null object
1 District_Name 246091 non-null object
2 Crop_Year 246091 non-null int64
3 Season 246091 non-null object
4 Crop 246091 non-null object
5 Area 246091 non-null float64
6 Production 242361 non-null float64
dtypes: float64(2), int64(1), object(4)
memory usage: 13.1+ MB
Out[10]: State_Name 0
District_Name 0
Crop_Year 0
Season 0
Crop 0
Area 0
Production 3730
dtype: int64
In [11]: 3730 / 246091
Out[11]: 0.015156994770227274
In [13]: df.describe()
In [14]: df.shape
Out[14]: (242361, 7)
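The row count drops from 246,091 to 242,361, which is exactly 246,091 - 3,730, the number of rows with a missing Production value (about 1.5% of the data). The cell that removes them is not included in the export. A minimal sketch, assuming the rows were removed with `dropna` restricted to the `Production` column, on a small made-up frame standing in for the real dataset:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the crop dataset (values are illustrative only).
df = pd.DataFrame({
    "State_Name": ["A", "A", "B", "B"],
    "Crop": ["Rice", "Wheat", "Rice", "Maize"],
    "Area": [102.0, 50.0, 20.0, 39.0],
    "Production": [321.0, np.nan, 80.0, 89.0],
})

# Count and fraction of missing values per column.
missing_counts = df.isna().sum()
missing_share = df.isna().mean()   # mean of a boolean mask = fraction missing
print(missing_counts["Production"], missing_share["Production"])

# Drop only the rows whose Production is missing; other columns are complete.
df = df.dropna(subset=["Production"])
print(df.shape)
```

Restricting `subset` to `Production` makes it explicit which column drives the removal, so a missing value appearing later in another column would not silently drop rows.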
Univariate Analysis
STATE
In [15]:
# STATES
states = df.State_Name.str.strip().unique()
states
Out[16]: 33
In [17]: df.State_Name.value_counts()
Out[17]: State_Name
Uttar Pradesh 33189
Madhya Pradesh 22604
Karnataka 21079
Bihar 18874
Assam 14622
Odisha 13524
Tamil Nadu 13266
Maharashtra 12496
Rajasthan 12066
Chhattisgarh 10368
West Bengal 9597
Andhra Pradesh 9561
Gujarat 8365
Telangana 5591
Uttarakhand 4825
Haryana 4540
Kerala 4003
Nagaland 3904
Punjab 3143
Meghalaya 2867
Arunachal Pradesh 2545
Himachal Pradesh 2456
Jammu and Kashmir 1632
Tripura 1412
Manipur 1266
Jharkhand 1266
Mizoram 954
Puducherry 872
Sikkim 714
Dadra and Nagar Haveli 263
Goa 207
Andaman and Nicobar Islands 201
Chandigarh 89
Name: count, dtype: int64
In [19]:
(len(df.District_Name.unique()),
 df.District_Name.unique())
Out[19]: (646,
array(['NICOBARS', 'NORTH AND MIDDLE ANDAMAN', 'SOUTH ANDAMANS',
'ANANTAPUR', 'CHITTOOR', 'EAST GODAVARI', 'GUNTUR', 'KADAPA',
'KRISHNA', 'KURNOOL', 'PRAKASAM', 'SPSR NELLORE', 'SRIKAKULAM',
'VISAKHAPATANAM', 'VIZIANAGARAM', 'WEST GODAVARI', 'ANJAW',
'CHANGLANG', 'DIBANG VALLEY', 'EAST KAMENG', 'EAST SIANG',
'KURUNG KUMEY', 'LOHIT', 'LONGDING', 'LOWER DIBANG VALLEY',
'LOWER SUBANSIRI', 'NAMSAI', 'PAPUM PARE', 'TAWANG', 'TIRAP',
'UPPER SIANG', 'UPPER SUBANSIRI', 'WEST KAMENG', 'WEST SIANG',
'BAKSA', 'BARPETA', 'BONGAIGAON', 'CACHAR', 'CHIRANG', 'DARRANG',
'DHEMAJI', 'DHUBRI', 'DIBRUGARH', 'DIMA HASAO', 'GOALPARA',
'GOLAGHAT', 'HAILAKANDI', 'JORHAT', 'KAMRUP', 'KAMRUP METRO',
'KARBI ANGLONG', 'KARIMGANJ', 'KOKRAJHAR', 'LAKHIMPUR', 'MARIGAON',
'NAGAON', 'NALBARI', 'SIVASAGAR', 'SONITPUR', 'TINSUKIA',
'UDALGURI', 'ARARIA', 'ARWAL', 'AURANGABAD', 'BANKA', 'BEGUSARAI',
'BHAGALPUR', 'BHOJPUR', 'BUXAR', 'DARBHANGA', 'GAYA', 'GOPALGANJ',
'JAMUI', 'JEHANABAD', 'KAIMUR (BHABUA)', 'KATIHAR', 'KHAGARIA',
'KISHANGANJ', 'LAKHISARAI', 'MADHEPURA', 'MADHUBANI', 'MUNGER',
'MUZAFFARPUR', 'NALANDA', 'NAWADA', 'PASHCHIM CHAMPARAN', 'PATNA',
'PURBI CHAMPARAN', 'PURNIA', 'ROHTAS', 'SAHARSA', 'SAMASTIPUR',
'SARAN', 'SHEIKHPURA', 'SHEOHAR', 'SITAMARHI', 'SIWAN', 'SUPAUL',
'VAISHALI', 'CHANDIGARH', 'BALOD', 'BALODA BAZAR', 'BALRAMPUR',
'BASTAR', 'BEMETARA', 'BIJAPUR', 'BILASPUR', 'DANTEWADA',
'DHAMTARI', 'DURG', 'GARIYABAND', 'JANJGIR-CHAMPA', 'JASHPUR',
'KABIRDHAM', 'KANKER', 'KONDAGAON', 'KORBA', 'KOREA', 'MAHASAMUND',
'MUNGELI', 'NARAYANPUR', 'RAIGARH', 'RAIPUR', 'RAJNANDGAON',
'SUKMA', 'SURAJPUR', 'SURGUJA', 'DADRA AND NAGAR HAVELI',
'NORTH GOA', 'SOUTH GOA', 'AHMADABAD', 'AMRELI', 'ANAND',
'BANAS KANTHA', 'BHARUCH', 'BHAVNAGAR', 'DANG', 'DOHAD',
'GANDHINAGAR', 'JAMNAGAR', 'JUNAGADH', 'KACHCHH', 'KHEDA',
'MAHESANA', 'NARMADA', 'NAVSARI', 'PANCH MAHALS', 'PATAN',
'PORBANDAR', 'RAJKOT', 'SABAR KANTHA', 'SURAT', 'SURENDRANAGAR',
'TAPI', 'VADODARA', 'VALSAD', 'AMBALA', 'BHIWANI', 'FARIDABAD',
'FATEHABAD', 'GURGAON', 'HISAR', 'JHAJJAR', 'JIND', 'KAITHAL',
'KARNAL', 'KURUKSHETRA', 'MAHENDRAGARH', 'MEWAT', 'PALWAL',
'PANCHKULA', 'PANIPAT', 'REWARI', 'ROHTAK', 'SIRSA', 'SONIPAT',
'YAMUNANAGAR', 'CHAMBA', 'HAMIRPUR', 'KANGRA', 'KINNAUR', 'KULLU',
'LAHUL AND SPITI', 'MANDI', 'SHIMLA', 'SIRMAUR', 'SOLAN', 'UNA',
'ANANTNAG', 'BADGAM', 'BANDIPORA', 'BARAMULLA', 'DODA',
'GANDERBAL', 'JAMMU', 'KARGIL', 'KATHUA', 'KISHTWAR', 'KULGAM',
'KUPWARA', 'LEH LADAKH', 'POONCH', 'PULWAMA', 'RAJAURI', 'RAMBAN',
'REASI', 'SAMBA', 'SHOPIAN', 'SRINAGAR', 'UDHAMPUR', 'BOKARO',
'CHATRA', 'DEOGHAR', 'DHANBAD', 'DUMKA', 'EAST SINGHBUM', 'GARHWA',
'GIRIDIH', 'GODDA', 'GUMLA', 'HAZARIBAGH', 'JAMTARA', 'KHUNTI',
'KODERMA', 'LATEHAR', 'LOHARDAGA', 'PAKUR', 'PALAMU', 'RAMGARH',
'RANCHI', 'SAHEBGANJ', 'SARAIKELA KHARSAWAN', 'SIMDEGA',
'WEST SINGHBHUM', 'BAGALKOT', 'BANGALORE RURAL', 'BELGAUM',
'BELLARY', 'BENGALURU URBAN', 'BIDAR', 'CHAMARAJANAGAR',
'CHIKBALLAPUR', 'CHIKMAGALUR', 'CHITRADURGA', 'DAKSHIN KANNAD',
'DAVANGERE', 'DHARWAD', 'GADAG', 'GULBARGA', 'HASSAN', 'HAVERI',
'KODAGU', 'KOLAR', 'KOPPAL', 'MANDYA', 'MYSORE', 'RAICHUR',
'RAMANAGARA', 'SHIMOGA', 'TUMKUR', 'UDUPI', 'UTTAR KANNAD',
'YADGIR', 'ALAPPUZHA', 'ERNAKULAM', 'IDUKKI', 'KANNUR',
'KASARAGOD', 'KOLLAM', 'KOTTAYAM', 'KOZHIKODE', 'MALAPPURAM',
'PALAKKAD', 'PATHANAMTHITTA', 'THIRUVANANTHAPURAM', 'THRISSUR',
'WAYANAD', 'AGAR MALWA', 'ALIRAJPUR', 'ANUPPUR', 'ASHOKNAGAR',
'BALAGHAT', 'BARWANI', 'BETUL', 'BHIND', 'BHOPAL', 'BURHANPUR',
'CHHATARPUR', 'CHHINDWARA', 'DAMOH', 'DATIA', 'DEWAS', 'DHAR',
'DINDORI', 'GUNA', 'GWALIOR', 'HARDA', 'HOSHANGABAD', 'INDORE',
'JABALPUR', 'JHABUA', 'KATNI', 'KHANDWA', 'KHARGONE', 'MANDLA',
'MANDSAUR', 'MORENA', 'NARSINGHPUR', 'NEEMUCH', 'PANNA', 'RAISEN',
'RAJGARH', 'RATLAM', 'REWA', 'SAGAR', 'SATNA', 'SEHORE', 'SEONI',
'SHAHDOL', 'SHAJAPUR', 'SHEOPUR', 'SHIVPURI', 'SIDHI', 'SINGRAULI',
'TIKAMGARH', 'UJJAIN', 'UMARIA', 'VIDISHA', 'AHMEDNAGAR', 'AKOLA',
'AMRAVATI', 'BEED', 'BHANDARA', 'BULDHANA', 'CHANDRAPUR', 'DHULE',
'GADCHIROLI', 'GONDIA', 'HINGOLI', 'JALGAON', 'JALNA', 'KOLHAPUR',
'LATUR', 'MUMBAI', 'NAGPUR', 'NANDED', 'NANDURBAR', 'NASHIK',
'OSMANABAD', 'PALGHAR', 'PARBHANI', 'PUNE', 'RAIGAD', 'RATNAGIRI',
'SANGLI', 'SATARA', 'SINDHUDURG', 'SOLAPUR', 'THANE', 'WARDHA',
'WASHIM', 'YAVATMAL', 'BISHNUPUR', 'CHANDEL', 'CHURACHANDPUR',
'IMPHAL EAST', 'IMPHAL WEST', 'SENAPATI', 'TAMENGLONG', 'THOUBAL',
'UKHRUL', 'EAST GARO HILLS', 'EAST JAINTIA HILLS',
'EAST KHASI HILLS', 'NORTH GARO HILLS', 'RI BHOI',
'SOUTH GARO HILLS', 'SOUTH WEST GARO HILLS',
'SOUTH WEST KHASI HILLS', 'WEST GARO HILLS', 'WEST JAINTIA HILLS',
'WEST KHASI HILLS', 'AIZAWL', 'CHAMPHAI', 'KOLASIB', 'LAWNGTLAI',
'LUNGLEI', 'MAMIT', 'SAIHA', 'SERCHHIP', 'DIMAPUR', 'KIPHIRE',
'KOHIMA', 'LONGLENG', 'MOKOKCHUNG', 'MON', 'PEREN', 'PHEK',
'TUENSANG', 'WOKHA', 'ZUNHEBOTO', 'ANUGUL', 'BALANGIR',
'BALESHWAR', 'BARGARH', 'BHADRAK', 'BOUDH', 'CUTTACK', 'DEOGARH',
'DHENKANAL', 'GAJAPATI', 'GANJAM', 'JAGATSINGHAPUR', 'JAJAPUR',
'JHARSUGUDA', 'KALAHANDI', 'KANDHAMAL', 'KENDRAPARA', 'KENDUJHAR',
'KHORDHA', 'KORAPUT', 'MALKANGIRI', 'MAYURBHANJ', 'NABARANGPUR',
'NAYAGARH', 'NUAPADA', 'PURI', 'RAYAGADA', 'SAMBALPUR', 'SONEPUR',
'SUNDARGARH', 'KARAIKAL', 'MAHE', 'PONDICHERRY', 'YANAM',
'AMRITSAR', 'BARNALA', 'BATHINDA', 'FARIDKOT', 'FATEHGARH SAHIB',
'FAZILKA', 'FIROZEPUR', 'GURDASPUR', 'HOSHIARPUR', 'JALANDHAR',
'KAPURTHALA', 'LUDHIANA', 'MANSA', 'MOGA', 'MUKTSAR', 'NAWANSHAHR',
'PATHANKOT', 'PATIALA', 'RUPNAGAR', 'S.A.S NAGAR', 'SANGRUR',
'TARN TARAN', 'AJMER', 'ALWAR', 'BANSWARA', 'BARAN', 'BARMER',
'BHARATPUR', 'BHILWARA', 'BIKANER', 'BUNDI', 'CHITTORGARH',
'CHURU', 'DAUSA', 'DHOLPUR', 'DUNGARPUR', 'GANGANAGAR',
'HANUMANGARH', 'JAIPUR', 'JAISALMER', 'JALORE', 'JHALAWAR',
'JHUNJHUNU', 'JODHPUR', 'KARAULI', 'KOTA', 'NAGAUR', 'PALI',
'PRATAPGARH', 'RAJSAMAND', 'SAWAI MADHOPUR', 'SIKAR', 'SIROHI',
'TONK', 'UDAIPUR', 'EAST DISTRICT', 'NORTH DISTRICT',
'SOUTH DISTRICT', 'WEST DISTRICT', 'ARIYALUR', 'COIMBATORE',
'CUDDALORE', 'DHARMAPURI', 'DINDIGUL', 'ERODE', 'KANCHIPURAM',
'KANNIYAKUMARI', 'KARUR', 'KRISHNAGIRI', 'MADURAI', 'NAGAPATTINAM',
'NAMAKKAL', 'PERAMBALUR', 'PUDUKKOTTAI', 'RAMANATHAPURAM', 'SALEM',
'SIVAGANGA', 'THANJAVUR', 'THE NILGIRIS', 'THENI', 'THIRUVALLUR',
'THIRUVARUR', 'TIRUCHIRAPPALLI', 'TIRUNELVELI', 'TIRUPPUR',
'TIRUVANNAMALAI', 'TUTICORIN', 'VELLORE', 'VILLUPURAM',
'VIRUDHUNAGAR', 'ADILABAD', 'HYDERABAD', 'KARIMNAGAR', 'KHAMMAM',
'MAHBUBNAGAR', 'MEDAK', 'NALGONDA', 'NIZAMABAD', 'RANGAREDDI',
'WARANGAL', 'DHALAI', 'GOMATI', 'KHOWAI', 'NORTH TRIPURA',
'SEPAHIJALA', 'SOUTH TRIPURA', 'UNAKOTI', 'WEST TRIPURA', 'AGRA',
'ALIGARH', 'ALLAHABAD', 'AMBEDKAR NAGAR', 'AMETHI', 'AMROHA',
'AURAIYA', 'AZAMGARH', 'BAGHPAT', 'BAHRAICH', 'BALLIA', 'BANDA',
'BARABANKI', 'BAREILLY', 'BASTI', 'BIJNOR', 'BUDAUN',
'BULANDSHAHR', 'CHANDAULI', 'CHITRAKOOT', 'DEORIA', 'ETAH',
'ETAWAH', 'FAIZABAD', 'FARRUKHABAD', 'FATEHPUR', 'FIROZABAD',
'GAUTAM BUDDHA NAGAR', 'GHAZIABAD', 'GHAZIPUR', 'GONDA',
'GORAKHPUR', 'HAPUR', 'HARDOI', 'HATHRAS', 'JALAUN', 'JAUNPUR',
'JHANSI', 'KANNAUJ', 'KANPUR DEHAT', 'KANPUR NAGAR', 'KASGANJ',
'KAUSHAMBI', 'KHERI', 'KUSHI NAGAR', 'LALITPUR', 'LUCKNOW',
'MAHARAJGANJ', 'MAHOBA', 'MAINPURI', 'MATHURA', 'MAU', 'MEERUT',
'MIRZAPUR', 'MORADABAD', 'MUZAFFARNAGAR', 'PILIBHIT', 'RAE BARELI',
'RAMPUR', 'SAHARANPUR', 'SAMBHAL', 'SANT KABEER NAGAR',
'SANT RAVIDAS NAGAR', 'SHAHJAHANPUR', 'SHAMLI', 'SHRAVASTI',
'SIDDHARTH NAGAR', 'SITAPUR', 'SONBHADRA', 'SULTANPUR', 'UNNAO',
'VARANASI', 'ALMORA', 'BAGESHWAR', 'CHAMOLI', 'CHAMPAWAT',
'DEHRADUN', 'HARIDWAR', 'NAINITAL', 'PAURI GARHWAL', 'PITHORAGARH',
'RUDRA PRAYAG', 'TEHRI GARHWAL', 'UDAM SINGH NAGAR', 'UTTAR KASHI',
'24 PARAGANAS NORTH', '24 PARAGANAS SOUTH', 'BANKURA', 'BARDHAMAN',
'BIRBHUM', 'COOCHBEHAR', 'DARJEELING', 'DINAJPUR DAKSHIN',
'DINAJPUR UTTAR', 'HOOGHLY', 'HOWRAH', 'JALPAIGURI', 'MALDAH',
'MEDINIPUR EAST', 'MEDINIPUR WEST', 'MURSHIDABAD', 'NADIA',
'PURULIA'], dtype=object))
In [20]: df.District_Name.value_counts()
Out[20]: District_Name
TUMKUR 931
BELGAUM 924
BIJAPUR 905
HASSAN 895
BELLARY 887
...
HYDERABAD 8
KHUNTI 6
RAMGARH 6
NAMSAI 1
MUMBAI 1
Name: count, Length: 646, dtype: int64
In [21]: sns.boxplot(df.District_Name.value_counts().values)
Crop Year
19 unique crop years, ranging from 1997 to 2015.
In [23]: df.Crop_Year.unique()
Out[23]: array([2000, 2001, 2002, 2003, 2004, 2005, 2006, 2010, 1997, 1998, 1999,
2007, 2008, 2009, 2011, 2012, 2013, 2014, 2015], dtype=int64)
In [24]: df.Crop_Year.value_counts()
Out[24]: Crop_Year
2003 17139
2002 16536
2007 14269
2008 14230
2006 13976
2004 13858
2010 13793
2011 13791
2009 13767
2000 13553
2005 13519
2013 13475
2001 13293
2012 13184
1999 12441
1998 11262
2014 10815
1997 8899
2015 561
Name: count, dtype: int64
Season
In [25]:
# Season
print(df.Season.nunique())
df.Season = df.Season.str.strip()
In [27]: df.Season.max()
Out[27]: 'Winter'
In [28]: df.Season.value_counts()
Out[28]: Season
Kharif 94283
Rabi 66160
Whole Year 56127
Summer 14811
Winter 6050
Autumn 4930
Name: count, dtype: int64
- The dataset covers six seasons: Kharif, Whole Year (annual), Autumn, Rabi, Summer and Winter.
- By frequency, most records come from Kharif, Rabi and Whole Year (annual) crops.
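The Season column is stripped of surrounding whitespace before it is counted, which matters because values that differ only by padding are treated as distinct categories. A small sketch with made-up values (the padding shown is illustrative, not copied from the dataset):

```python
import pandas as pd

# Illustrative values: the same season written with and without trailing spaces.
season = pd.Series(["Kharif     ", "Kharif", "Rabi       ", "Whole Year "])

print(season.nunique())          # padded variants are counted as separate values
cleaned = season.str.strip()     # remove leading and trailing whitespace
print(cleaned.nunique())
print(cleaned.value_counts().to_dict())
```

Printing `nunique()` before stripping, as cell 25 does, is a quick way to see whether padding was inflating the number of categories.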
In [29]: sns.boxplot(df.Season.value_counts().values)
Crop
In [30]: print(df.Crop.nunique())
124
In [31]:
Crop = df.Crop.unique()
Crop
Out[32]: Crop
Rice 15082
Maize 13787
Moong(Green Gram) 10106
Urad 9710
Sesamum 8821
Groundnut 8770
Wheat 7878
Sugarcane 7827
Rapeseed &Mustard 7533
Arhar/Tur 7476
Gram 7227
Jowar 6990
Onion 6984
Potato 6914
Dry chillies 6421
Sunflower 5483
Bajra 5379
Small millets 4593
Peas & beans (Pulses) 4447
Cotton(lint) 4382
Linseed 4351
Turmeric 4168
Masoor 4152
Sweet potato 4122
Barley 4116
Name: count, dtype: int64
In [33]: sns.boxplot(df.Crop.value_counts().values)
In [34]:
# Area
print(df.Area.max())
print(df.Area.min())
8580100.0
0.1
In [35]: sns.boxplot(df.Area)
Production
In [36]: df.Production.describe()
Total Production
Out[76]: 0 2.508000e+06
1 2.000000e+00
2 3.274200e+04
3 1.128160e+05
4 1.188000e+05
...
246086 2.451060e+05
246087 2.903010e+05
246088 5.265000e+06
246089 1.669041e+11
246090 1.540000e+04
Name: Total Production, Length: 242361, dtype: float64
Remove Outliers
Productivity
In [100]: df['Productivity']
Out[100]: 1 0.500000
4 0.229167
11 0.500000
13 0.267038
16 1.000000
...
246081 0.800000
246082 0.685185
246083 0.513636
246087 0.738437
246090 0.502857
Name: Productivity, Length: 103594, dtype: float64
In [101]:
# Note: the 0.40 and 0.60 quantiles give a narrower band than the
# conventional interquartile range (0.25 and 0.75 quantiles).
Q1 = df["Productivity"].quantile(0.40)
Q3 = df["Productivity"].quantile(0.60)

IQR = Q3 - Q1

df = df[(df["Productivity"] >= Q1 - 1.5*IQR) & (df["Productivity"] <= Q3
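For comparison with the 0.40/0.60 quantiles used in cell 101, here is a minimal sketch of the conventional 1.5 x IQR rule, written as a hypothetical helper `iqr_filter` whose quantiles and multiplier are parameters. The toy values below are illustrative, with one obvious outlier:

```python
import pandas as pd

def iqr_filter(data: pd.DataFrame, col: str, lower_q: float = 0.25,
               upper_q: float = 0.75, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `col` lies within [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper; lower_q/upper_q default to the conventional quartiles.
    """
    q1 = data[col].quantile(lower_q)
    q3 = data[col].quantile(upper_q)
    iqr = q3 - q1
    mask = data[col].between(q1 - k * iqr, q3 + k * iqr)  # bounds are inclusive
    return data[mask]

# Toy productivity values with one obvious outlier (50.0).
toy = pd.DataFrame({"Productivity": [0.5, 0.23, 0.5, 0.27, 1.0, 0.8, 0.69, 50.0]})
kept = iqr_filter(toy, "Productivity")
print(len(toy), "->", len(kept))
```

Calling `iqr_filter(df, "Productivity", lower_q=0.40, upper_q=0.60)` would presumably reproduce cell 101, assuming its truncated last line ends with the matching upper bound `Q3 + 1.5*IQR`.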
In [102]: df['Productivity']
Out[102]: 1 0.500000
4 0.229167
11 0.500000
13 0.267038
16 1.000000
...
246081 0.800000
246082 0.685185
246083 0.513636
246087 0.738437
246090 0.502857
Name: Productivity, Length: 86161, dtype: float64
In [103]:
plt.hist(df['Productivity'])
plt.show()
In [104]: sns.histplot(df['Productivity'], kde=True)
Area
In [105]: df.Area
Out[105]: 1 2.0
4 720.0
11 2.0
13 719.0
16 1.0
...
246081 1885.0
246082 54.0
246083 220.0
246087 627.0
246090 175.0
Name: Area, Length: 86161, dtype: float64
In [106]: plt.hist(df['Area'])
Production
In [119]: df.Production
Out[119]: 1 1.0
4 165.0
11 1.0
13 192.0
16 1.0
...
246081 1508.0
246082 37.0
246083 113.0
246087 463.0
246090 88.0
Name: Production, Length: 86161, dtype: float64
In [120]: plt.hist(df['Production'])
In [40]: df['crop_category'].value_counts()
Out[40]: crop_category
Cereal 63283
Pulses 40898
Oilseeds 33801
Vegetables 23154
spices 21638
Nuts 11472
Commercial 10561
fibres 9785
Beans 9115
Fruits 6153
Name: count, dtype: int64
- A new variable, 'crop_category', is created to group individual crops into broader categories.
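The cell that builds `crop_category` is not included in the export. The counts above sum to 229,860, fewer than the 242,361 rows in the DataFrame at that point, which suggests that some crops were left without a category (`value_counts` skips NaN by default). One common way to derive such a column is a dictionary lookup with `Series.map`; the mapping below is a hypothetical, partial example (the category names follow the counts above, but the crop-to-category assignments are assumptions):

```python
import pandas as pd

# Hypothetical, partial mapping from crop name to category.
CROP_CATEGORY = {
    "Rice": "Cereal", "Maize": "Cereal", "Wheat": "Cereal",
    "Moong(Green Gram)": "Pulses", "Urad": "Pulses",
    "Groundnut": "Oilseeds", "Sesamum": "Oilseeds",
    "Onion": "Vegetables", "Potato": "Vegetables",
    "Dry chillies": "spices", "Cotton(lint)": "fibres",
}

df = pd.DataFrame({"Crop": ["Rice", "Urad", "Groundnut", "Arecanut", "Onion"]})
# Crops absent from the dictionary (here "Arecanut") map to NaN.
df["crop_category"] = df["Crop"].map(CROP_CATEGORY)
print(df)
```

Checking `df["crop_category"].isna().sum()` after the mapping shows how many rows fell outside the dictionary.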
In [42]: df.Production_per_unit_area
Out[42]: 0 1.594896
1 0.500000
2 3.147059
3 3.642045
4 0.229167
...
246086 2.617647
246087 0.738437
246088 50.154321
246089 2.141848
246090 0.502857
Name: Production_per_unit_area, Length: 242361, dtype: float64
- A new variable, 'Production_per_unit_area' (Production divided by Area), is created.
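The listed values match Production divided by Area for the rows shown in `df.head()` (row 0: 2000.0 / 1254.0 ≈ 1.594896; row 2: 321.0 / 102.0 ≈ 3.147059). A minimal sketch of that column-wise division on those two rows:

```python
import pandas as pd

# Rows 0 and 2 from df.head() above.
df = pd.DataFrame(
    {"Crop": ["Arecanut", "Rice"], "Area": [1254.0, 102.0], "Production": [2000.0, 321.0]},
    index=[0, 2],
)

# Production per unit area: element-wise division of two aligned columns.
df["Production_per_unit_area"] = df["Production"] / df["Area"]
print(df["Production_per_unit_area"].round(6))
```

Since the minimum Area in the data is 0.1, the division never hits a zero denominator here.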
Visualising Data
1. State wise Production
In [43]:
prod = df.groupby(by=df.State_Name)['Production'].sum().reset_index()
prod
15 Kerala 9.788005e+10
3 Assam 2.111752e+09
17 Maharashtra 1.263641e+09
14 Karnataka 8.634298e+08
24 Punjab 5.863850e+08
9 Gujarat 5.242913e+08
8 Goa 5.057558e+08
23 Puducherry 3.847245e+08
10 Haryana 3.812739e+08
4 Bihar 3.664836e+08
28 Telangana 3.351479e+08
25 Rajasthan 2.813203e+08
22 Odisha 1.609041e+08
31 Uttarakhand 1.321774e+08
6 Chhattisgarh 1.009519e+08
21 Nagaland 1.276595e+07
29 Tripura 1.252292e+07
19 Meghalaya 1.211250e+07
13 Jharkhand 1.077774e+07
18 Manipur 5.230917e+06
26 Sikkim 2.435735e+06
20 Mizoram 1.661540e+06
5 Chandigarh 6.395650e+04
In [44]:
plt.figure(figsize=(15,15))
x = prod['Production'].head(10)
y = prod['State_Name'].head(10)
plt.barh(y, x)
plt.title('State wise production', fontsize=30, color='r')
plt.xlabel('Production', fontsize=30, color='g')
plt.ylabel('State_Name', fontsize=30, color='g')
plt.show()
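The plot takes `head(10)` of `prod`, so it shows the ten largest states only if `prod` is sorted by Production in descending order. The output of cell 43 is sorted, but the sorting step is not visible in the cell as exported. A minimal sketch of a sorted top-N aggregation, using a hypothetical helper `top_n_total` and toy numbers:

```python
import pandas as pd

def top_n_total(data: pd.DataFrame, by: str, value: str = "Production",
                n: int = 10) -> pd.DataFrame:
    """Sum `value` per group of `by` and return the n largest groups (hypothetical helper)."""
    totals = data.groupby(by, as_index=False)[value].sum()
    return totals.sort_values(value, ascending=False).head(n).reset_index(drop=True)

# Toy data: state labels and values are made up.
toy = pd.DataFrame({
    "State_Name": ["X", "Y", "X", "Z", "Y"],
    "Production": [500.0, 120.0, 300.0, 40.0, 90.0],
})
result = top_n_total(toy, "State_Name", n=2)
print(result)
```

The same helper works for `Crop`, `Crop_Year` or `Season`; calling `nlargest(n, "Production")` on the grouped totals is an equivalent shortcut.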
In [45]: df.Production.describe()
28 Coconut 1.299816e+11
95 Rice 1.605470e+09
87 Potato 4.248263e+08
33 Cotton(lint) 2.970000e+08
59 Maize 2.733418e+08
49 Jute 1.815582e+08
7 Banana 1.461327e+08
In [47]:
plt.figure(figsize=(15,10))
sns.barplot(x='Crop', y='Production', data=crop1, palette='viridis')
plt.yscale('log')
plt.title('Overall Crops v/s Production')
plt.show()
14 2011 1.430890e+10
16 2013 1.290359e+10
9 2006 8.681913e+09
17 2014 8.664541e+09
7 2004 8.189462e+09
15 2012 8.171055e+09
8 2005 8.043757e+09
6 2003 7.917974e+09
11 2008 7.717018e+09
5 2002 7.696955e+09
12 2009 7.660494e+09
4 2001 7.465541e+09
3 2000 7.449709e+09
10 2007 6.879442e+09
2 1999 6.434666e+09
13 2010 6.307609e+09
1 1998 5.825321e+09
0 1997 8.512329e+08
18 2015 6.935065e+06
In [49]:
plt.figure(figsize=(15,15))
plt.bar(year_wise_prod.Crop_Year, year_wise_prod.Production, width=0.8)

plt.xticks(year_wise_prod.Crop_Year, rotation=60)
plt.xlabel("Crop Year", fontsize=30, color='g')
plt.ylabel("Production", fontsize=30, color='g')
plt.title('Crops Year vs Production', fontsize=30, color='r')
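The peak year can be read off the year-wise totals directly with `idxmax`, rather than by eye from the bar chart. A minimal sketch using three of the totals listed in the year-wise output (2004, 2011 and 2013); on the full 19-row output above, 2011 is also the largest value:

```python
import pandas as pd

# Three year-wise totals copied from the output above.
year_wise_prod = pd.DataFrame({
    "Crop_Year": [2004, 2011, 2013],
    "Production": [8.189462e09, 1.430890e10, 1.290359e10],
})

# idxmax returns the index label of the largest Production value.
peak_row = year_wise_prod.loc[year_wise_prod["Production"].idxmax()]
print(int(peak_row["Crop_Year"]), peak_row["Production"])
```

The `int(...)` cast is needed because selecting a row from mixed int/float columns returns a float Series.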
1 Kharif 4.029970e+09
2 Rabi 2.051688e+09
5 Winter 4.345498e+08
3 Summer 1.706579e+08
0 Autumn 6.441377e+07
In [51]:
plt.figure(figsize=(15,10))
sns.barplot(x='Season', y='Production',data = season_wise_prod, palette
plt.yscale('log')
plt.title('Season v/s Production', fontsize=30, color='r')
plt.show()
- The seasons with the highest production values are Whole Year (annual crops), Kharif and Rabi.
Share of records by crop category (three largest categories):
- Cereals - 27.5%
- Pulses - 17.8%
- Oilseeds - 14.7%
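These percentages agree with the `crop_category` value counts listed in Out[40]: each is the category's share of the 229,860 categorized records. A sketch reproducing them from those counts; on the DataFrame itself, `df['crop_category'].value_counts(normalize=True)` gives the same fractions, because it also ignores NaN by default:

```python
import pandas as pd

# crop_category value counts copied from Out[40] above.
counts = pd.Series({
    "Cereal": 63283, "Pulses": 40898, "Oilseeds": 33801, "Vegetables": 23154,
    "spices": 21638, "Nuts": 11472, "Commercial": 10561, "fibres": 9785,
    "Beans": 9115, "Fruits": 6153,
})

# Percentage share of each category among the categorized records.
shares = (counts / counts.sum() * 100).round(1)
print(counts.sum())
print(shares.head(3))
```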
State-wise crop production across different crop categories
In [54]:
state_wise = pd.crosstab(df['State_Name'], df['crop_category'])
state_wise
Out[54]:
crop_category                Beans  Cereal  Commercial  Fruits  Nuts  Oilseeds  Pulses  Vegetables  fibres
State_Name
Andaman and Nicobar Islands      0      20          15      16    37        11       9          20       0
Andhra Pradesh                 386    2264         474     502   674      1101    1336        1046     333
Arunachal Pradesh               26    1021         168       0    26       343      67         257       0
Bihar                          280    6108         756     226   130      2504    3731        1775     924
Chandigarh                       0      39           0       0     0         7      14          26       0
Chhattisgarh                   646    1805         316     264   261      1496    2087        1143     535
Dadra and Nagar Haveli           0     116          12       9     9        30      64           0      13
Goa                              0      62          22      16    47         0      32           0       0
Gujarat                        403    2466         372     157   683      1029    1521         473     327
Himachal Pradesh               179     726          67       0    54       236     530         214      37
Jammu and Kashmir               12     562          42      24     7       233     307         196      44
Karnataka                     1096    5295         615     598  1470      3135    2776        1763     605
Madhya Pradesh                 962    5115         826     659   768      3281    3993        2738     922
Meghalaya                      113     606         182     162   143       329     314         399     177
Rajasthan                      871    2634         518     257   444      1713    2174        1048     672
Tamil Nadu                     479    2680         623     992  1076      1235    1466        1827     556
Uttar Pradesh                 1112    9719        1741     269   958      4028    6549        3734     724
West Bengal                    254    2217         356       0   730      1542    1633         619     710
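`pd.crosstab` called with only two columns counts records, so the table above shows how often each crop category appears per state, not how much is produced. To tabulate production itself, `crosstab` accepts `values` and `aggfunc`. A minimal sketch on toy data (state labels and values are made up):

```python
import pandas as pd

toy = pd.DataFrame({
    "State_Name": ["A", "A", "B", "B"],
    "crop_category": ["Cereal", "Pulses", "Cereal", "Cereal"],
    "Production": [100.0, 30.0, 5.0, 7.0],
})

# Record counts per state and category (what the table above shows).
counts = pd.crosstab(toy["State_Name"], toy["crop_category"])

# Total production per state and category; combinations with no rows
# come back as NaN, so fill them with 0.
production = pd.crosstab(
    toy["State_Name"], toy["crop_category"],
    values=toy["Production"], aggfunc="sum",
).fillna(0)
print(counts)
print(production)
```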
Out[55]: Crop
Rice 15082
Maize 13787
Moong(Green Gram) 10106
Urad 9710
Sesamum 8821
Groundnut 8770
Wheat 7878
Sugarcane 7827
Rapeseed &Mustard 7533
Arhar/Tur 7476
Name: count, dtype: int64
In [56]:
rice_df = df[df['Crop'] == 'Rice']
rice_df.head(10)
    State_Name                   District_Name  Crop_Year  Season  Crop  Area    Production  crop_category
2   Andaman and Nicobar Islands  NICOBARS       2000       Kharif  Rice  102.00  321.00      Cereal
12  Andaman and Nicobar Islands  NICOBARS       2001       Kharif  Rice  83.00   300.00      Cereal
18  Andaman and Nicobar Islands  NICOBARS       2002       Kharif  Rice  189.20  510.84      Cereal
27  Andaman and Nicobar Islands  NICOBARS       2003       Kharif  Rice  52.00   90.17       Cereal
36  Andaman and Nicobar Islands  NICOBARS       2004       Kharif  Rice  52.94   72.57       Cereal
45  Andaman and Nicobar Islands  NICOBARS       2005       Kharif  Rice  2.09    12.06       Cereal
64  Andaman and Nicobar Islands  NICOBARS       2010       Autumn  Rice  3.50    10.00       Cereal
Answers
- Rice is the most frequent crop in the dataset, with 15,082 records.
- Year-wise, total production peaked in 2011 (about 1.43e+10), followed by 2013.
2. Which states rank highest in area-wise crop production in India? Substantiate with facts and figures.
In [62]:
df_area = df.groupby('State_Name')['Area'].sum().reset_index().sort_val
df_area.head(10)
17 Maharashtra 3.221860e+08
25 Rajasthan 2.687882e+08
14 Karnataka 2.029086e+08
9 Gujarat 1.549261e+08
4 Bihar 1.282695e+08
24 Punjab 1.267152e+08
- The top cultivating states by cultivated area are Uttar Pradesh, Madhya Pradesh
and Maharashtra.
In [64]:
df_area_5 = df_area.head(5)

fig, ax = plt.subplots(figsize=(25,30), sharey='col')
count = 1

for state in df_area_5.State_Name.unique():
    plt.subplot(len(df_area_5.State_Name.unique()), 1, count)
    sns.lineplot(x=df[df.State_Name==state]['Crop_Year'],y=df[df.State_
    plt.subplots_adjust(hspace=0.6)
    plt.title(state)
    count += 1
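The `lineplot` call in cell 64 is cut off in the export, so its y-variable is not visible. If it plots Production from the raw rows, note that `sns.lineplot` aggregates repeated x values by their mean (with a confidence band) by default, so each point would be the average record for that year rather than the state's total. A minimal sketch, assuming total yearly production is the quantity of interest, that builds one column per state with `groupby` and `unstack` before plotting (toy data, made-up values):

```python
import pandas as pd

toy = pd.DataFrame({
    "State_Name": ["A", "A", "A", "B", "B"],
    "Crop_Year": [2001, 2001, 2002, 2001, 2002],
    "Production": [10.0, 30.0, 25.0, 50.0, 70.0],
})

# Total production per state and year: years as the index, states as columns.
yearly = (
    toy.groupby(["Crop_Year", "State_Name"])["Production"]
    .sum()
    .unstack("State_Name")
)
print(yearly)
# yearly.plot(subplots=True) would then draw one line panel per state.
```

For state A in 2001 the total is 40.0, whereas a mean-based line would show 20.0 for the same year.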
- Uttar Pradesh - High production was seen in 2005, and it has been reducing
gradually since then.
- Madhya Pradesh - 1998 showed high production, followed by a gradual reduction,
but production picked up again and 2012 also showed a peak.
- Maharashtra - Production went down drastically in 2006, then rose again and
hit a high peak after 2007.
- Rajasthan - Production hit an all-time low in 2002 and then picked up by 2010.
- West Bengal - Production hit a peak around 2006 but fell to a low after 2007 and never
recovered.
23 Puducherry 763.596415
15 Kerala 381.272231
24 Punjab 345.754577
8 Goa 199.160564
3 Assam 148.630468
28 Telangana 101.211017
- The most efficient states in terms of production per unit area are Puducherry, Kerala and
Punjab.
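The cell that produced the state efficiency figures is not shown, and "production per unit area" for a state can be computed in two ways that may rank states differently: total production divided by total area, or the average of the per-record `Production_per_unit_area` values. The export does not show which one was used. A sketch of both on toy data (made-up values), chosen so the two methods disagree:

```python
import pandas as pd

toy = pd.DataFrame({
    "State_Name": ["A", "A", "B", "B"],
    "Area": [1.0, 99.0, 50.0, 50.0],
    "Production": [10.0, 99.0, 150.0, 150.0],
})
toy["Production_per_unit_area"] = toy["Production"] / toy["Area"]

# Option 1: ratio of totals (an area-weighted yield).
totals = toy.groupby("State_Name")[["Production", "Area"]].sum()
ratio_of_sums = totals["Production"] / totals["Area"]

# Option 2: mean of per-record ratios (every record weighs the same).
mean_of_ratios = toy.groupby("State_Name")["Production_per_unit_area"].mean()

print(ratio_of_sums.idxmax(), mean_of_ratios.idxmax())
```

Here a single tiny plot with a high yield lifts state A's mean of ratios, while the area-weighted ratio favors state B.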
In [67]: df.crop_category.unique()
The most efficient states in terms of production per unit area for various categories of crops
are:
- Cereals - Chandigarh
- Pulses - Kerala
- Fruits - Gujarat
- Vegetables - Gujarat
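Finding the most efficient state within each crop category is a group-then-argmax operation. A minimal sketch on toy data (state labels and values are made up), assuming efficiency is measured as total production over total area for each state and category:

```python
import pandas as pd

toy = pd.DataFrame({
    "crop_category": ["Cereal", "Cereal", "Pulses", "Pulses", "Pulses"],
    "State_Name": ["A", "B", "C", "B", "B"],
    "Area": [10.0, 10.0, 5.0, 10.0, 10.0],
    "Production": [40.0, 25.0, 10.0, 12.0, 8.0],
})

# Yield per (category, state) pair.
totals = toy.groupby(["crop_category", "State_Name"])[["Production", "Area"]].sum()
yield_per_area = totals["Production"] / totals["Area"]

# idxmax within each category returns the full (category, state) label;
# keep only the state part.
best_state = (
    yield_per_area.groupby(level="crop_category").idxmax().map(lambda key: key[1])
)
print(best_state)
```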
Univariate Analysis
Visualisation of Data
1. State wise Production
2. Crop wise Production
3. Year wise Production
4. Season wise Production
5. Crop Category wise Production
6. Proportions of Different Crop Categories
Top cultivating states based on cultivated area are: Uttar Pradesh, Madhya
Pradesh, Maharashtra, Rajasthan and West Bengal.
Year wise trend of these states:
Uttar Pradesh - High Production was seen in 2005 and after that it’s been
reducing gradually.
Madhya Pradesh - 1998 showed a high production and then there was
gradual reduction but it picked up and 2012 also showed a peak in
Production.
Maharashtra - Production went down drastically in 2006 and again the levels
went up and hit a high peak after 2007.
Rajasthan - Production hit an all-time low in 2002 and then picked up
by 2010.
West Bengal - Production hit a peak around 2006 but fell to a low after
2007 and never recovered.
3. Find the most efficient state (in terms of most production per unit area). Also find the
most efficient state for some of the crop categories.
The most efficient states in terms of production per unit area are Puducherry, Kerala
and Punjab.
The most efficient states in terms of production per unit area for various categories of
crops are:
Cereals - Chandigarh
Pulses - Kerala
Fruits - Gujarat
Vegetables - Gujarat