Vision-Based Perception for an Automated Harvester
Mark Ollis & Anthony Stentz
Robotics Institute
Carnegie Mellon University
Pittsburgh PA 15213
Abstract

This paper describes a vision-based perception system which has been used to guide an automated harvester cutting fields of alfalfa hay. The system tracks the boundary between cut and uncut crop; indicates when the end of a crop row has been reached; and identifies obstacles in the harvester's path. The system adapts to local variations in lighting and crop conditions, and explicitly models and removes noise due to shadow.

In field tests, the machine has successfully operated in four different locations, at sites in Pennsylvania, Kansas, and California. Using the vision system as the sole means of guidance, over 60 acres have been cut at speeds of up to 4.5 mph (typical human operating speeds range from 3-6 mph). Future work largely centers around combining vision and GPS based navigation techniques to produce a commercially viable product for use either as a navigation aid or for a completely autonomous system.
1. Introduction Searcy [lo] and Brandon & Searcy [ 2 ] have published
Agricultural applications have several appealing traits as candidates for automation. Current agricultural machinery is often expensive, so that sensing and computing can be added for a small marginal cost factor. The potential market size is large. Many agricultural tasks are dull, repetitive, and occasionally dangerous. They often take place in environments for which a priori knowledge is plentiful; for example, most agricultural machines only need to process one type of crop at a time, and accomplish their task within a known bounded geographic area.

This paper describes a vision-based perception system which has been used to guide an automated harvester through fields of alfalfa hay. Several vision based behaviors have been implemented as part of the Demeter automation project. A crop line tracker detects and follows the boundary between cut and uncut crop; an end-of-row detector estimates the distance to the end of the crop row; and an obstacle detector visually locates obstacles in the vehicle's path.

The seeming similarity of the road following problem to crop line tracking led us initially to attack this problem using RALPH [9], a highly successful road-following system. Results from an early RALPH experiment soon demonstrated several difficulties: for instance, ragged edges, highly variable curvatures, and uneven coloring were much more prevalent in the agricultural domain than in road following. We therefore investigated techniques developed explicitly for agricultural use.

Klassen and Wilson [6] describe an algorithm for distinguishing cut and uncut crop using a monochrome CCD camera. While their work was valuable in establishing that machine vision techniques could be applied to this task, there were some significant limitations: their system computes only straight line boundaries, requires a specialized digital signal processor, and has not been used to guide an actual vehicle. Gerrish et al. [4] used a variety of edge-detection and template-matching techniques to pick out "work edges"; these were tested on actual field images (including alfalfa). Straight line boundaries were still assumed, however, and the processing times required were on the order of 20 seconds using a 68000 processor. Reid & Searcy [10] and Brandon & Searcy [2] have published work on a related problem, vision-based segmentation of crop canopy from soil, which further supported the applicability of vision-based techniques to this area.

Some results on guiding an actual agricultural vehicle are presented in Billingsley and Schoenfisch [1], but their emphasis was on the discrimination of crop vs. soil for row crops, and they had limited opportunities for field trials. Ollis and Stentz [8] present a precursor to our crop line follower; subsequent work presented in this paper includes both a better algorithm for crop line following (including an adaptive capability and shadow compensation) and the development of additional behaviors (end-of-row detection and obstacle detection).

The harvester is shown in Figure 1; it is a New Holland 2550 Speedrower retrofitted with wheel encoders and servos to control a number of machine functions, such as the throttle, steering, and cutter bar. A Sun Sparc 20 board running a real-time operating system (VxWorks) is used to control machine functions; a separate Sparc 20 is dedicated to the perception system. An on-board GPS receiver coupled with a fixed base station allows the use of differential GPS-based positioning. Forward-facing RGB cameras equipped with auto-iris lenses are mounted to either side of the cab roof, near the ends of the harvester's cutter bar. The cameras are calibrated using a method developed by Tsai [11], which allows conversion of image pixel coordinates into real world locations.

[Figure 1: The Demeter automated harvester.]
One might reasonably question whether vision based guidance is necessary for this application. Automation of field vehicles using GPS (the Global Positioning System) appears promising; differential GPS systems with accuracies of 20 cm or better are commercially available at the time of this writing. While still expensive, these systems are rapidly falling in price as demand grows in applications such as surveying and automotive guidance.

As with all sensors, however, differential GPS is subject to various failures, such as satellite dropouts or broken communication links between the mobile and fixed GPS units. A vision system can provide an independent source of steering guidance while cutting; provide estimates of the distance to the end of the crop row; and detect potential obstacles. Further, such a system is extremely inexpensive compared to the cost of differential GPS. Rather than viewing vision based perception and GPS as competing sensor modalities, it makes more sense to consider them complementary; a combination of the two is likely to outperform either one alone.

In order to test the effectiveness of vision based sensing, we have performed a number of experiments using vision and dead reckoning as guidance for the harvester, without making use of the GPS system. A typical experiment begins by using the crop line tracker to follow the boundary between cut and uncut crop. The end of row detector is used as a trigger to decide when to transition into a turn behavior controlled by dead reckoning. When the turn is complete, the system transitions back to the crop line tracking behavior (Figure 2).

[Figure 2: Transitioning between behaviors. Panels: tracking the crop line; detected end of row; turn (dead reckoning); tracking the crop line again.]
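As an illustration of this behavior cycle, a minimal state machine sketch follows in Python. The camera, tracker, detector, and turn controller objects are hypothetical stand-ins for the Demeter software modules, not its actual interfaces.

```python
from enum import Enum, auto

class Behavior(Enum):
    TRACK_CROP_LINE = auto()
    TURN = auto()

def control_step(state, camera, tracker, end_detector, turner):
    """One cycle of the track -> turn -> track loop described above.

    camera, tracker, end_detector, and turner are hypothetical objects
    standing in for the harvester's perception and control modules.
    """
    if state is Behavior.TRACK_CROP_LINE:
        image = camera.grab()
        if end_detector.triggered(image):
            turner.start()                       # dead-reckoned turn
            return Behavior.TURN, turner.steering()
        return state, tracker.steering(image)    # vision-based steering
    if turner.done():                             # state is Behavior.TURN
        return Behavior.TRACK_CROP_LINE, 0.0
    return state, turner.steering()
```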
2. Crop Line Tracking

The crop line tracking method used is an adaptive version of the algorithm presented by Ollis and Stentz [8]. Each scan line in the image is processed separately, in an attempt to find a boundary which divides the two roughly homogeneous regions corresponding to cut and uncut crop. This is accomplished by computing the best fit step function to a plot of a pixel discriminant function f(i, j) for the scan line; the location of the step is then used as the boundary estimate for that scan line, as shown in Figure 3. Previously published versions of this algorithm used a fixed discriminant function, such as f = G/(R+G+B). However, even within the same field, changes in lighting conditions and soil type prevent any single discriminant function from consistently returning a correct segmentation. To address this variability in the environment, we have implemented a method for adaptively updating the discriminant function.

After each image is processed, the algorithm computes the Fisher linear discriminant [3] in RGB space between the cut and uncut pixel classes; this becomes the discriminant used for the next image.
The Fisher discriminant computes the line in RGB space such that, when the pixel values are projected onto that line, the ratio of average interclass distance to average intraclass scatter is maximized. Intuitively, this results in the linear function which most cleanly separates the cut and uncut pixel classes. The discriminant function used for the first image is chosen arbitrarily; in practice, a poor choice may result in inaccurate crop line estimates for the first few images until the algorithm converges to more effective discriminant functions. Since current cycle times for our implementation of this algorithm are roughly 5 Hz, we have found the crop line estimates to be quite reliable 0.5 seconds after the crop line tracker begins to cycle.
[Figure 3: A model plot of f(i, j) as a function of j for a single scan line i.]

A summary of our adaptive algorithm is given below:
1. Initialize the color discriminant function: f = 1.0 R + 1.0 G + 1.0 B.
2. Digitize an image.
3. For each scan line i in the image:
   a. Plot f as a function of image column j;
   b. Compute the best fit step function to the above plot;
   c. Return the location of the step as the crop line boundary estimate.
4. Compute an updated discriminant function using the Fisher linear discriminant.
5. Go to step 2.
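To make steps 3 and 4 concrete, the following is a minimal sketch in Python/numpy. The function names, the exhaustive search over split points, and the left/right class assignment are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def best_fit_step(values):
    """Least-squares step fit to one scan line of discriminant values.

    Models the scan line as two constant segments split at column k and
    returns the k minimizing the summed squared error, using the identity
    SSE = sum(x^2) - sum(x)^2 / n for a constant fit.
    """
    w = len(values)
    csum, csum2 = np.cumsum(values), np.cumsum(values * values)
    best_k, best_err = 1, np.inf
    for k in range(1, w):
        ls, l2 = csum[k - 1], csum2[k - 1]
        rs, r2 = csum[-1] - ls, csum2[-1] - l2
        err = (l2 - ls * ls / k) + (r2 - rs * rs / (w - k))
        if err < best_err:
            best_k, best_err = k, err
    return best_k

def fisher_discriminant(cut, uncut):
    """Fisher linear discriminant in RGB space: w = S_w^-1 (m1 - m2)."""
    m1, m2 = cut.mean(axis=0), uncut.mean(axis=0)
    d1, d2 = cut - m1, uncut - m2
    s_w = d1.T @ d1 + d2.T @ d2 + 1e-6 * np.eye(3)  # within-class scatter
    return np.linalg.solve(s_w, m1 - m2)

def process_image(image, w):
    """Apply discriminant w to one RGB image; return boundaries and new w."""
    f = image.astype(float) @ w                  # f(i, j) for every pixel
    bounds = [best_fit_step(row) for row in f]
    pix = image.astype(float)
    # Assumes cut crop lies left of the boundary; swap classes if not.
    cut = np.vstack([pix[i, :k] for i, k in enumerate(bounds)])
    uncut = np.vstack([pix[i, k:] for i, k in enumerate(bounds)])
    return bounds, fisher_discriminant(cut, uncut)
```

Per step 4, the discriminant returned for one image is applied to the next, so a poor initial choice is corrected after a few cycles.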
This algorithm allows for a very general crop line boundary; any single-valued function of image row can be represented. Figure 4 shows a typical result from this algorithm; in this image, a white dot has been placed on each scan line at the location of the estimated crop boundary.

[Figure 4: Sample output from the crop line tracker.]

Using the camera calibration parameters, each image row's crop line boundary pixel is converted into a vote for a discretized pure pursuit steering angle. After the votes are tallied, the steering command with the most votes is relayed from the crop line tracker to the vehicle controller. In order to reduce processing time, low-resolution (160 x 120) images are used, and only the portion of the image likely to contain the crop line is processed. Variations in lighting and crop appearance across a field can be problematic; however, such variations within any single image are typically slight and relatively unstructured. Shadow noise presents a more serious challenge, and is discussed in detail in Section 3.

Using the adaptive algorithm, we have successfully tracked and cut entire curved crop rows of over 1 mile in length from circular fields in Kansas; this compares to maximum distances of 150 yards for the old non-adaptive algorithm using a fixed R/G discriminant. From one fairly challenging sequence of 29 images (from which Figure 4 is taken), the adaptive algorithm was able to correctly locate the crop boundary in 28 of the 29 images; by comparison, the R/G discriminant described in [8] was able to correctly locate the boundary in only 21 of 29 images.
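For illustration, the vote tally might be sketched as follows. The image_to_ground calibration mapping (available in principle from the Tsai model), the curvature bins, and the use of curvature rather than steering angle as the vote variable are assumptions, not the Demeter parameters.

```python
import numpy as np

def steering_vote(boundaries, image_to_ground, curvature_bins):
    """Tally pure pursuit votes from per-row crop line boundary pixels.

    boundaries: iterable of (row, col) boundary estimates.
    image_to_ground: assumed calibration function mapping a pixel to
    ground-plane coordinates (x lateral, y forward) in the vehicle frame.
    """
    votes = np.zeros(len(curvature_bins))
    for row, col in boundaries:
        x, y = image_to_ground(row, col)
        kappa = 2.0 * x / (x * x + y * y)    # pure pursuit arc through (x, y)
        votes[np.abs(curvature_bins - kappa).argmin()] += 1
    return curvature_bins[votes.argmax()]    # most-voted steering command
```

Converting the winning curvature into an actual steering command depends on the vehicle geometry, which is omitted here.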
3. Shadow Compensation

Shadow noise can heavily distort both image intensity (luminance) and color (chrominance). An example of a severe case is shown in Figure 5; here, a shadow cast by the harvester body lies directly over the region containing the crop line. This shadow is directly responsible for the resultant error in the crop line boundary estimate produced by the crop line tracker.

[Figure 5: Shadow noise.]

Shadow noise causes difficulties for a number of reasons. It is often quite structured, and thus is not well modeled by stochastic techniques. Its effects and severity are difficult to predict; if the sun is momentarily obscured by a passing cloud or the orientation of the harvester changes rapidly, the prevalence and effect of shadow noise can vary dramatically on time scales of less than a second.
Normalizing for intensity, though an intuitively appealing method of dealing with shadow noise, fails to be useful in our application for two reasons. The primary problem is that it does not take into account the significant color changes present in shadowed areas. For example, normalizing the image in Figure 5 before processing still results in an incorrect crop line boundary estimate. A number of factors contribute to this color shift, but perhaps the most significant is the difference in illumination sources between the shadowed and unshadowed regions [5]; the dominant illumination source for the unshadowed areas is sunlight, while the dominant illumination source for the shadowed areas is skylight. A secondary problem with intensity normalization is that it prevents the crop line tracking algorithm from using natural intensity differences to discriminate between cut and uncut crop; depending on local conditions, such natural intensity differences can be a useful feature.

We present a technique for modeling and removing shadow noise which is based on compensating for the difference in the spectral power distribution (SPD) between the light illuminating the shadowed and unshadowed regions. In an ideal camera, the RGB pixel values at a given image point are a function of S(λ), the SPD emitted by a point in the environment [7]; for example, R is determined by Equation (1), where r_0 is a scaling factor and r̄(λ) is the function describing the response of the CCD chip and red filter; typically, this function falls to 0 outside of a narrow wavelength band.

    R = r_0 \int S(\lambda) \bar{r}(\lambda) \, d\lambda    (1)

r_0 and r̄(λ) are purely functions of the CCD camera; our goal, therefore, is to construct a model of how shadows alter the function S(λ).

To a first approximation, S(λ) is simply the product of the SPD of the illuminating light, I(λ), with the reflectance function of the illuminated surface point, ρ(λ):

    S(\lambda) = I(\lambda) \rho(\lambda)    (2)

Suppose we assume that every point in the environment is illuminated by one of two SPDs: either I_sun(λ), comprising both sunlight and skylight, or I_shadow(λ), comprising skylight only. Then the red pixel values for unshadowed regions will be computed by

    R_{sun} = r_0 \int I_{sun}(\lambda) \rho(\lambda) \bar{r}(\lambda) \, d\lambda    (3)

and the red pixel values for shadowed regions by

    R_{shadow} = r_0 \int I_{shadow}(\lambda) \rho(\lambda) \bar{r}(\lambda) \, d\lambda    (4)

From Equations (3) and (4), we see that it is in general not possible to compute R_sun from R_shadow without knowledge of the reflectance function of the environment patch being imaged. This is problematic, because for our application, this reflectance function is always unknown. However, if we approximate r̄(λ) as a delta function with a non-zero value only at λ_red, then (3) and (4) simplify to

    R_{sun} = r_0 I_{sun}(\lambda_{red}) \rho(\lambda_{red})    (5)

and

    R_{shadow} = r_0 I_{shadow}(\lambda_{red}) \rho(\lambda_{red})    (6)

so that R_sun and R_shadow can be related by a constant factor C_red:

    R_{sun} = C_{red} R_{shadow}, \quad C_{red} = I_{sun}(\lambda_{red}) / I_{shadow}(\lambda_{red})    (7)

The same analysis can be repeated for the G and B pixel values. Under the assumptions given above, the parameters C_red, C_green, and C_blue remain constant across all reflectance functions ρ(λ) for a given camera in a given lighting environment.

Implementing this shadow compensation therefore requires:
1) the selection of appropriate constants C_red, C_green, and C_blue;
2) a method for determining whether points are shadowed or unshadowed; and
3) "correcting" the shadowed pixels using Equation (7).
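Requirements 2) and 3) can be sketched as follows, assuming an (H, W, 3) 8-bit RGB array. The intensity threshold value is an illustrative stand-in (the paper reports thresholding but not the value); the constants are those reported below.

```python
import numpy as np

# Hand-tuned sun/shadow SPD ratios reported below: C_red, C_green, C_blue.
C_SHADOW = np.array([5.6, 4.0, 2.8])

def compensate_shadows(image, intensity_threshold=60.0):
    """Classify pixels as shadowed by intensity, then rescale them.

    Applies R_sun = C_red * R_shadow of Equation (7), and likewise for
    G and B, to every pixel whose mean intensity falls below the
    (assumed) threshold.
    """
    out = image.astype(float)
    shadowed = out.mean(axis=2) < intensity_threshold
    out[shadowed] *= C_SHADOW
    return np.clip(out, 0.0, 255.0)
```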
Determining whether points were shadowed or unshadowed was accomplished by intensity thresholding. Approximate values for C_red, C_green, and C_blue were hand-selected by experimentation on several images containing significant shadow; for our application, values of C_red = 5.6, C_green = 4.0, and C_blue = 2.8 were found to work well.

An attempt was made to calculate C_red, C_green, and C_blue values a priori from blackbody spectral distribution models of sunlight and skylight. This calculation produced qualitatively the correct result, i.e. C_red > C_green > C_blue; however, the a priori calculated values were found to be less useful than the experimentally determined values. This discrepancy may be due to a number of sources, such as the inadequacy of the blackbody spectral distribution function as a model for skylight and the variable sensitivity of the camera CCD to red, green, and blue light.
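For concreteness, such an a priori estimate can be sketched with Planck's law. The color temperatures, wavelengths, and skylight fraction below are illustrative assumptions, not the paper's values, and as noted above this kind of model reproduces only the qualitative ordering of the constants.

```python
import numpy as np

H_PLANCK, C_LIGHT, K_B = 6.626e-34, 2.998e8, 1.381e-23

def planck(wavelength_m, temp_k):
    """Blackbody spectral radiance (Planck's law), up to a constant factor."""
    x = H_PLANCK * C_LIGHT / (wavelength_m * K_B * temp_k)
    return 1.0 / (wavelength_m**5 * (np.exp(x) - 1.0))

# Illustrative color temperatures: direct sun ~5800 K; blue skylight
# modeled (inadequately, as noted above) as a hotter blackbody.
T_SUN, T_SKY = 5800.0, 12000.0
SKY_FRACTION = 0.2   # assumed skylight share of the sunlit illumination

for name, lam in [("C_red", 650e-9), ("C_green", 550e-9), ("C_blue", 450e-9)]:
    sunlit = planck(lam, T_SUN) + SKY_FRACTION * planck(lam, T_SKY)
    shadow = SKY_FRACTION * planck(lam, T_SKY)
    print(name, sunlit / shadow)   # yields C_red > C_green > C_blue
```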
The method described above necessarily makes a number of simplifications. The red, green, and blue filters actually each pass a range of frequencies; SPD variations in sunlight and skylight within a single band are not taken into account. Shadowed areas can receive significant illumination from light reflected off neighboring sunlit areas; such interreflections are not modeled. The differing effects of lighting angle for sunlight and skylight are ignored, as are non-linearities in the CCD chip response.

The values of C_red, C_green, and C_blue depend on the color of both the sunlight and the skylight. These colors can vary across different times of day and different atmospheric conditions. In our application, shadows typically cause the most trouble on cloudless days in the late afternoon; we therefore chose coefficients optimized for this case.

[Figure 6: A successful example of shadow compensation.]

Despite these limitations, applying this method allows crop lines to be successfully extracted from a number of images which would otherwise return incorrect results. For example, applying the shadow compensation method described above to the image shown in Figure 5 produces a much improved estimate of the crop boundary, as shown in Figure 6. In at least one field test, the shadow compensation allowed the harvester to successfully follow a crop line in an area in which it failed without the compensation. Further, the same set of constants C_red, C_green, and C_blue were found to work in the two locations for which shadowed images were collected (Kansas and Pennsylvania).

From initial testing, it appears that the method works better away from shadow edges; as can be seen in Figure 6, the compensation becomes increasingly inaccurate near shadow/sunlit boundaries, possibly because the simple two-source spectral distribution model breaks down.

4. Detecting the End of a Crop Row

The goal of the end of row detector is to estimate the distance of the harvester from the end of the crop row. When the end of row boundary is approximately perpendicular to the crop line, and the camera is mounted with zero roll (as in our system), the distance to the end of row is purely a function of the image row where the crop line boundary stops. Figure 7 shows an image which has been correctly processed; the white line marks the computed end-of-row location.

[Figure 7: Locating the end of a crop row.]

Our end of row detection algorithm attempts to find the image row i which most cleanly separates those scan lines containing a crop line boundary from those which do not. The algorithm, described below, first uses a binary function F(i) to classify each image row according to whether it contains a crop row boundary; next, it searches for the row which best divides the "beforeEnd" rows from the "afterEnd" rows:
1) Digitize an image.
2) Remove shadow noise as described in Section 3.
3) For each scan line i in the image, apply a binary evaluation function F(i) to determine whether row i contains a genuine crop line boundary. Let F(i) = "beforeEnd" if i contains a genuine boundary point, and F(i) = "afterEnd" if not.
4) For each scan line i in the image, compute a score S(i) for the scan line as follows:
   a) Set S(i) = 0;
   b) Increment S(i) for every scan line x from 0 (the top of the image) to i-1 for which F(x) = "afterEnd";
   c) Increment S(i) for every scan line y from i+1 to i_MAX (the bottom of the image) for which F(y) = "beforeEnd".
5) The end of row is the row i with the highest score S(i).
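A direct transcription of the scoring step, assuming a boolean "beforeEnd" label per scan line with row 0 at the top of the image:

```python
import numpy as np

def end_of_row_row(before_end):
    """Return the row index that best splits "afterEnd" from "beforeEnd".

    before_end: boolean array indexed by scan line (row 0 = top of image);
    True where F(i) = "beforeEnd". S(i) counts the "afterEnd" rows above i
    plus the "beforeEnd" rows below i, per steps 4-5 above.
    """
    before_end = np.asarray(before_end, dtype=bool)
    n = len(before_end)
    scores = np.zeros(n)
    for i in range(n):
        scores[i] = np.count_nonzero(~before_end[:i]) \
                  + np.count_nonzero(before_end[i + 1:])
    return int(scores.argmax())
```

The winning row index is then converted to a metric distance using the camera calibration. (Running counts would make the scoring O(n); the direct form above is kept for clarity.)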
We experimented with a broad range of binary evaluation functions F(i). Our current version first applies the crop line fitting algorithm described in Section 2 to each scan line. The location and height of the resulting best fit step functions are then compared to precomputed ranges gathered from training data; if they fall within the allowed ranges, the boundary is accepted as a genuine crop line and the row receives a "beforeEnd" label; otherwise, the row is labeled "afterEnd".
Our most common use for the end-of-row detection module is to trigger a transition into a turn behavior when the end of row is reached, as described in Section 1. In order to prevent a single spurious image from falsely causing an end-of-row trigger, this message is sent only after both 1) the distance to the end of row falls below some threshold and 2) a series of processed images indicate the end of row has been coming progressively nearer.
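The trigger logic might be sketched as follows; the threshold value and the number of confirming images are illustrative assumptions, as the paper does not report them.

```python
class EndOfRowTrigger:
    """Fire only when the end of row is near and getting steadily nearer."""

    def __init__(self, distance_threshold_m=10.0, confirmations=3):
        self.threshold = distance_threshold_m
        self.needed = confirmations      # consecutive approaching frames
        self.history = []

    def update(self, distance_m):
        """Feed one per-image distance estimate; return True to trigger."""
        self.history.append(distance_m)
        if len(self.history) <= self.needed:
            return False
        recent = self.history[-(self.needed + 1):]
        approaching = all(a > b for a, b in zip(recent, recent[1:]))
        return distance_m < self.threshold and approaching
```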
Although we have not yet gathered enough data to obtain accurate reliability figures, the system has successfully located several end-of-row points in field experiments. One problem with trying to ascertain the reliability of the end-of-row detector is the wide variety of situations which can be encountered; while the crop line tracker need only distinguish between cut and uncut crop, the end-of-row detector must be capable of dealing with images containing almost anything, such as the road and cow pasture which appear near the top of Figure 7.

5. Obstacle Detection

The obstacle detection algorithm is used to locate potential obstacles in the camera's field of view. The method uses a training image to build a probability density function (PDF) for combined cut and uncut crop as a function of RGB pixel value. For each new image, shadows are compensated for as described in Section 3. Next, image pixels are marked whose probability of belonging to the crop PDF falls below some threshold. Finally, regions of the image containing a large number of such marked pixels are identified as obstacles. Figure 8 shows an example of such an image before and after processing; potential obstacles are marked as a solid region.

Traditionally, a wide range of representations have been used for PDFs: multi-dimensional Gaussian models, K-nearest-neighbor approximations, histograms, and neural nets, to name a few. These representations vary in computational efficiency and in the kinds of PDFs that can be represented, and each has advantages and disadvantages in training time, representational power, lookup time, and storage space. For this application, we used a discretized 2D histogram in normalized color space, with each cell containing an independent probability density estimate; the reasons for this choice are discussed below.

The discretization used is 12 bits: 6 for R/(R+G+B) and 6 for G/(R+G+B). Producing the histogram requires an independent estimation of probability for each of 64 x 64 = 4096 discretized bins; it thus requires over 4000 parameters to describe. Compared to a multi-Gaussian representation, this may seem excessive; since there are so many free parameters, a large number of training samples are required in order to form a reasonable PDF. In our application, however, training data is plentiful, since every image pixel represents a training point. Further, updating the PDF with a new training point is quite rapid, since it simply requires incrementing a single counter; this is not the case with, for example, a neural net representation.
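A sketch of the histogram training and lookup follows. The function names and the obstacle probability threshold are illustrative assumptions; training uses every pixel of the training image, as described above.

```python
import numpy as np

BINS = 64   # 6 bits per normalized-color axis -> 64 x 64 = 4096 cells

def normalized_bins(image):
    """Map RGB pixels to (r_bin, g_bin) indices in normalized color space."""
    rgb = image.astype(float).reshape(-1, 3)
    total = rgb.sum(axis=1) + 1e-9
    r = np.minimum((rgb[:, 0] / total * BINS).astype(int), BINS - 1)
    g = np.minimum((rgb[:, 1] / total * BINS).astype(int), BINS - 1)
    return r, g

def train_crop_pdf(training_image):
    """Build the crop-color histogram; each pixel increments one counter."""
    hist = np.zeros((BINS, BINS))
    r, g = normalized_bins(training_image)
    np.add.at(hist, (r, g), 1)
    return hist / hist.sum()      # normalize to a probability estimate

def mark_obstacle_pixels(image, pdf, threshold=1e-5):
    """Flag pixels whose crop-PDF probability falls below the threshold."""
    r, g = normalized_bins(image)
    return (pdf[r, g] < threshold).reshape(image.shape[:2])
```

Grouping the marked pixels into obstacle regions (for example, by connected components) is the final step and is omitted here.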
The large number of free parameters allows the histogram to represent a wider range of density shapes than is possible with a multi-Gaussian PDF. Computing probability densities from the model is computationally fast when compared to any of the alternatives (particularly to K-nearest-neighbor); a single table lookup followed by a single division operation produces the required result.

Note that the issues concerning the use of full RGB color versus normalized color are quite different for the obstacle detector than for shadow compensation and crop line detection. Normalized color is not used as an attempt to remove shadow noise; that is accomplished using the method described in Section 3. Here, normalized color is used to compensate for different iris openings between the training image and the images to be processed, and also as a means of reducing the dimensionality of the PDF space. While this does prevent the use of intensity as a metric for distinguishing obstacles, we have found that obstacles typically differ from crop significantly enough in color alone that intensity information is not as necessary; such is often not the case for the crop line follower, which must often distinguish between the quite similar appearance of cut and uncut crop.

In order to perform robustly, it is likely that further development of the obstacle detector will be necessary; for example, the histogram PDF may need to be allowed to evolve over time to compensate for changing crop appearance. Field testing of the obstacle detector is still in a preliminary stage, though results such as those shown in Figure 8 are promising.

6. Conclusions

Several different vision-based behaviors have been implemented for the Demeter automated harvester, and have been demonstrated successfully in real world conditions. A crop line tracking behavior, which adapts to local changes in the environment, has been successfully used to cut over 60 acres of alfalfa hay. Explicitly modeling and removing shadows in the outdoor environment, though a difficult problem in general, has proven partially amenable to approximation methods. Other behaviors, such as end-of-row detection and obstacle detection, show promising initial results.

These combined results demonstrate the feasibility of vision-based guidance in an agricultural environment. Such a system, when combined with the positioning capability allowed by GPS, appears viable for near term commercial development as either a driver aid or as a completely autonomous system.

Acknowledgments

The authors would like to acknowledge Mike Blackwell, Kerien Fitzpatrick, Mike Happold, Regis Hoffman, Alex Lozupone, Ed Mutschler, Henning Pangels, Simon Peffers, and Red Whittaker for their work on the Demeter harvester. This work was jointly supported by New Holland and NASA under contract number NAGW-3903.

References

[1] Billingsley, J. and Schoenfisch, M. Vision-Guidance of Agricultural Vehicles. Autonomous Robots, Vol. 2, No. 1, 1995, pp. 65-76.

[2] Brandon, J. Robert and Searcy, Stephen W. Vision Assisted Tractor Guidance for Agricultural Vehicles. Transactions of the Society of Automotive Engineers, 1993, paper 921650.

[3] Duda, Richard and Hart, Peter. Pattern Classification and Scene Analysis. J. Wiley & Sons, 1973, pp. 114-118.

[4] Gerrish, John et al. Path-finding by Image Processing in Agricultural Field Operations. Transactions of the Society of Automotive Engineers, 1987, paper 861455.

[5] Healey, Glenn. Segmenting Images Using Normalized Color. IEEE Transactions on Systems, Man and Cybernetics, 1992.

[6] Klassen, N.D. et al. Guidance Systems for Agricultural Vehicles. Dept. of Mechanical Engineering, University of Saskatchewan, Saskatchewan, Canada.

[7] Novak, C.L. and Shafer, S.A. Color Vision. Encyclopedia of Artificial Intelligence, J. Wiley and Sons, 1992, pp. 192-202.

[8] Ollis, Mark and Stentz, Anthony. First Results in Crop Line Tracking. Proceedings of the IEEE Conference on Robotics and Automation (ICRA '96), Minneapolis, MN, April 1996, pp. 951-956.

[9] Pomerleau, Dean. RALPH: Rapidly Adapting Lateral Position Handler. Proceedings of the 1995 IEEE Symposium on Intelligent Vehicles, Detroit, Michigan.

[10] Reid, J.F. and Searcy, S.W. An Algorithm for Separating Guidance Information from Row Crop Images. Transactions of the ASAE, Vol. 31, No. 6, Nov/Dec 1988, pp. 1624-1632.

[11] Tsai, Roger. A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses. IEEE Journal of Robotics and Automation, Vol. RA-3, No. 4, August 1987, pp. 323-344.