A project focused on designing and optimizing object detection pipelines to enhance the capabilities of autonomous vehicles. The project evaluates and implements state-of-the-art machine learning techniques to achieve reliable object detection in real-world scenarios.
Here's a presentation that I made to explore my thought process.
- Method: The image feed is divided into smaller boxes (sliding windows), and each box is classified independently.
- Drawbacks: Computationally expensive, because the classifier runs once per window, which makes the approach poorly suited to real-time systems.
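
The sliding-window idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's code: the window size, the stride, and the `mean_brightness_classifier` stand-in are all assumptions chosen to keep the example self-contained.

```python
import numpy as np


def sliding_windows(image, window=32, stride=16):
    """Yield (x, y, patch) for every window-sized crop of an H x W x C image."""
    height, width = image.shape[:2]
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield x, y, image[y:y + window, x:x + window]


def detect(image, classify, window=32, stride=16):
    """Classify each window independently and collect (x, y, label) triples."""
    return [
        (x, y, classify(patch))
        for x, y, patch in sliding_windows(image, window, stride)
    ]


def mean_brightness_classifier(patch):
    # Hypothetical stand-in for a trained model: 1 if the patch is bright, else 0.
    return int(patch.mean() > 127)


# Synthetic 64 x 96 frame: dark on the left, bright on the right.
frame = np.zeros((64, 96, 3), dtype=np.uint8)
frame[:, 48:] = 255

detections = detect(frame, mean_brightness_classifier)
print(len(detections))  # number of classifier calls needed for one frame
```

Even this small frame needs one classifier call per window position, and the count grows quickly as the stride shrinks or the frame gets larger, which is where the cost mentioned above comes from.
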
- Architecture: A neural network classifier that consists of the following layers:
| Layer | Type | Input Shape | Output Shape | Purpose |
|---|---|---|---|---|
| 1. Flatten | Flatten | (32, 32, 3) | (3072) | Converts 3D image data into a 1D array. |
| 2. Dense (Hidden) | Fully Connected | (3072) | (128) | Learns features using non-linearity. |
| 3. Dense (Output) | Fully Connected | (128) | (3) | Outputs probabilities for 3 classes. |
- Observation: 50% accuracy demonstrates the limitations of basic models in handling complex object detection tasks.
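
As a rough Keras sketch of the table above: the layer sizes come from the table, while the `relu` and `softmax` activations, the optimizer, and the loss are assumptions, since the text only mentions "non-linearity" and "probabilities".

```python
from tensorflow.keras import layers, models


def build_perceptron(input_shape=(32, 32, 3), num_classes=3):
    """Flatten -> Dense(128) -> Dense(3), following the layer table."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Flatten(),                                 # (32, 32, 3) -> (3072,)
        layers.Dense(128, activation="relu"),             # activation is an assumption
        layers.Dense(num_classes, activation="softmax"),  # class probabilities
    ])


model = build_perceptron()
# Optimizer and loss are illustrative choices, not taken from the project.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```
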
- Architecture: A custom CNN model with layers optimized for feature extraction and classification:
| Layer | Type | Input Shape | Output Shape | Purpose |
|---|---|---|---|---|
| 1. Conv2D | Convolutional | (32, 32, 3) | (30, 30, 64) | Extracts spatial features using 64 filters of size 3x3. |
| 2. Activation | ReLU | (30, 30, 64) | (30, 30, 64) | Applies the ReLU non-linearity to the feature maps. |
| 3. MaxPooling2D | Pooling | (30, 30, 64) | (15, 15, 64) | Downsamples feature maps using a 2x2 pooling window. |
| 4. Flatten | Flatten | (15, 15, 64) | (14400) | Flattens the 3D feature maps into a 1D array. |
| 5. Dense (Hidden) | Fully Connected | (14400) | (128) | Learns high-level features using non-linearity. |
| 6. Dense (Output) | Fully Connected | (128) | (3) | Outputs probabilities for 3 classes. |
- Observation: Significant improvement (71% accuracy) over the perceptron model, showcasing the power of convolutional layers in spatial data analysis.
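
The CNN table can be sketched in Keras as follows. The shapes match the table (a 3x3 convolution without padding turns 32 into 30, and 2x2 pooling halves 30 to 15); the hidden-layer `relu`, the output `softmax`, and the compile settings are assumptions rather than the project's confirmed configuration.

```python
from tensorflow.keras import layers, models


def build_cnn(input_shape=(32, 32, 3), num_classes=3):
    """Conv2D -> ReLU -> MaxPooling2D -> Flatten -> Dense -> Dense."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(64, (3, 3)),                        # (32, 32, 3) -> (30, 30, 64)
        layers.Activation("relu"),                        # shape unchanged
        layers.MaxPooling2D(pool_size=(2, 2)),            # -> (15, 15, 64)
        layers.Flatten(),                                 # -> (14400,)
        layers.Dense(128, activation="relu"),             # activation is an assumption
        layers.Dense(num_classes, activation="softmax"),  # class probabilities
    ])


cnn = build_cnn()
# Optimizer and loss are illustrative choices, not taken from the project.
cnn.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
cnn.summary()
```
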
- Architecture: Transfer learning using the VGG16 model pre-trained on ImageNet:
| Layer | Type | Input Shape | Output Shape | Purpose |
|---|---|---|---|---|
| 1. VGG Expert | Pre-trained Feature Extractor | (224, 224, 3) | Feature Maps | Extracts rich features using pre-trained VGG model. |
| 2. GlobalAveragePooling2D | Pooling | Feature Maps | (Channels) | Reduces spatial dimensions to a single value per channel. |
| 3. Dense (Hidden) | Fully Connected | (Channels) | (1024) | Learns high-level features using non-linearity. |
| 4. Dropout | Regularization | (1024) | (1024) | Reduces overfitting by randomly dropping neurons (30%). |
| 5. Dense (Hidden) | Fully Connected | (1024) | (512) | Learns further refined features using non-linearity. |
| 6. Dropout | Regularization | (512) | (512) | Reduces overfitting by randomly dropping neurons (30%). |
| 7. Dense (Output) | Fully Connected | (512) | (3) | Outputs probabilities for 3 classes. |
- Observation: Leveraging pre-trained models leads to higher accuracy (85%) and reduced training time for complex datasets.
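
A possible Keras sketch of this transfer-learning head is shown below. The layer widths and the 30% dropout rate follow the table. Freezing the VGG16 base, dropping its original classifier with `include_top=False`, and the `relu`/`softmax` activations are assumptions, since the text does not say how the base was configured or whether it was fine-tuned.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16


def build_transfer_model(num_classes=3, weights="imagenet"):
    """VGG16 feature extractor followed by the dense head from the table."""
    base = VGG16(weights=weights, include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # assumption: the pre-trained base stays frozen
    return models.Sequential([
        base,                                             # -> feature maps
        layers.GlobalAveragePooling2D(),                  # one value per channel
        layers.Dense(1024, activation="relu"),            # activation is an assumption
        layers.Dropout(0.3),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),  # class probabilities
    ])


# weights=None gives a randomly initialised base for an offline shape check;
# use the default weights="imagenet" to load the pre-trained features.
transfer_model = build_transfer_model(weights=None)
print(transfer_model.output_shape)
```

With a frozen base, only the dense head is trained, which is one common reason transfer learning shortens training time.
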
- Architecture: YOLO model processes the entire image using the DarkNet architecture.
- Observation: Combines speed and accuracy, ideal for real-time video feed processing by analyzing frames without relying on sliding windows.
- Programming Language: Python
- Framework: TensorFlow, Keras
- Models: VGG16, YOLO, custom CNNs
- Libraries: NumPy, Pandas, Matplotlib, Pillow for data processing and visualization
- Clone the repository: `git clone https://github.com/Profilist/Autonomous-Driving.git`
- Launch Jupyter Notebook: `jupyter notebook`
This project is licensed under the MIT License. See the LICENSE file for details.
