Linux Device Drivers Notes
INTRODUCTION
What is Linux?
Linux is a free and open-source, Unix-like operating system (OS) that follows the Unix
philosophy. It powers everything from personal computers and servers to
embedded systems and IoT devices.
Unix philosophy:
● Do one thing and do it well: Each program and tool should focus on doing one task
efficiently and effectively.
● Work together: Programs should be designed to interoperate, so that the output of one
program can be used as the input of another.
● Data should be plain text: Data should be stored and manipulated as text, because plain
text is easy to read, debug and process.
● Small and modular tools: Instead of building one large program to perform a task, build
small tools that can be combined to perform complex tasks. Example: ls | grep "error" | wc -l
● Avoid clutter: Keep things simple and avoid unnecessary features or complexity. Tools
should work without excessive setup or a steep learning curve. The rm command for deleting
files, for example, has no flashy features; it just removes files.
● Fail Early and Clearly: If something goes wrong, the program should fail quickly and
provide clear error messages.
Key Characteristics:
1. The Linux kernel is the core of the OS, managing hardware resources and enabling software to
interact with hardware.
2. Linux is open source and customizable. We can modify and recompile the Linux kernel to
suit our needs.
3. Linux is portable: it supports cross compilation and runs on various hardware architectures such as x86,
ARM, and RISC-V. For example, the BeagleBone Black is a development
board that runs Linux.
Because the source is open, users can modify the kernel and combine it with other software to create
variations known as distributions, which are used in computers and other devices.
Linux Architecture
Linux is primarily divided into user space and kernel space. These two parts interact
through system calls, a predefined gateway interface that user-space applications use to
request services from the Linux kernel.
Kernel Space:
This is the privileged space where the Linux kernel operates. It directly interacts with hardware
and provides core OS services (process management, file management, memory
management, I/O management, etc.).
In short, kernel space is where the kernel (the core of the operating system) runs and provides
its services.
User Space:
This is where user applications and services run. Code here runs in an unprivileged mode and
cannot directly access hardware.
Communication from user space to kernel space happens through the system call API.
Example based Scenario:
When a user wants to print a document, the process starts in the user space, which includes the
desktop environment and the applications the user interacts with. For example, the user opens a
document in LibreOffice, an office suite, and sends a print command. However,
LibreOffice itself doesn’t know how to communicate with the printer directly. It relies on the
kernel for help. At this point, the kernel space comes into play. The kernel, which is the core of
the operating system, bridges the gap between user applications and hardware. The kernel uses
a specialized software called the device driver to manage communication with the hardware.
This driver translates the user's print request into precise instructions the printer hardware can
understand.
To initiate this process, LibreOffice makes a system call (e.g., write) to send the formatted data
to the printer. The system call acts as an interface between user space and kernel space,
transitioning the process securely into kernel space. This ensures the communication follows
predefined rules, preventing errors or harm to the system. Inside the kernel space, the kernel
invokes the printer driver, which is a device driver specific to the printer hardware. The printer
driver translates the generic system call into hardware-specific commands the printer
understands, such as moving the print head, feeding paper, and applying ink.
Finally, the printer hardware processes these instructions and begins printing the document.
During this time, the kernel ensures that other processes in user space do not interfere with the
ongoing print job. It also manages system resources like memory and CPU to ensure smooth
operation.
Once the job is complete, the printer driver sends a status (e.g., "Print complete") back to the
kernel. The kernel then relays this information to the user space application, LibreOffice, which
notifies the user with a message like "Print successful!"
Linux Kernel modules:
The Linux kernel is the core of the Linux operating system. It interacts with hardware and
provides services to user-space applications. A Linux Kernel Module (LKM) is a piece of
code that can be dynamically loaded into the kernel to extend its functionality without
rebooting the system.
Example: Adding device drivers for new hardware, implementing a new filesystem, or
extending the system calls.
Loading: When we insert an LKM into the kernel at runtime, the kernel integrates it and
makes it functional.
Unloading: When we remove the LKM from the kernel, its resources are freed and the kernel
returns to its original state.
This is important because we can add and remove functionality without rebooting, and we can
save system resources by loading only the parts that are required. When we are testing a new
feature or driver, we can quickly load and unload it to debug without rebuilding the entire kernel.
There are two ways to add code to the kernel:
Method 1: The basic way is to add the code to the kernel source tree (drivers live in the
drivers directory) and recompile the kernel.
Method 2: The more efficient way is to add the code to the kernel while it is running, by
loading and unloading modules, where a module is the code we want
to add to the kernel.
Since this code is loaded at runtime and is not part of the base kernel image,
these pieces are called loadable kernel modules. The base kernel is
located in the /boot directory and is loaded when we boot our machine;
LKMs are loaded after the base kernel is loaded.
Once loaded, LKMs are very much part of our kernel, and they communicate with the base kernel to
complete their functions.
3. Security Enhancements:
Security Modules: LKMs can be used to implement security modules that provide additional
security features, such as intrusion detection or encryption.
4. Debugging and Testing:
Debugging Tools: LKMs can be used to implement debugging tools that help identify and fix
kernel-level issues.
Testing New Features: LKMs can be used to test new features without affecting the stability of
the running kernel.
Key points:
Device drivers are tailored to specific hardware devices: a driver for a USB keyboard won't work
for a USB mouse.
Many device drivers are loadable kernel modules, meaning they can be added or removed without
rebooting the system, which provides flexibility.
FILE SYSTEM:
A file system is a way of organizing and storing data on a storage device. It defines how data is
structured, accessed, and stored. Different file systems have different formats.
DIFFERENT FILE SYSTEM FORMAT:
For Windows:
● NTFS (New Technology File System): This is the primary file system used in modern
Windows operating systems. It offers features like file compression, encryption, and access
control lists.
● FAT32 (File Allocation Table 32): An older file system, still used on older Windows
systems and many USB drives. It is simple and compatible with a wide range of devices,
but has limitations on file size and partition size.
● exFAT (Extended File Allocation Table): A more modern file system that overcomes the
limitations of FAT32. It is compatible with Windows, macOS, and Linux.
For Linux:
● EXT4 (Fourth Extended Filesystem): A widely used file system in Linux systems. It offers
good performance, reliability, and features like journaling and online file system checking.
● XFS (X Filesystem): A high-performance journaling file system, often used on servers and
high-performance workstations.
● Btrfs (B-tree File System): A modern file system that supports features like snapshots, data
deduplication, and self-healing.
Other File Systems:
● HFS+ (Hierarchical File System Plus): Used in older macOS systems.
● APFS (Apple File System): The modern file system used in macOS.
● ReFS (Resilient File System): A Microsoft file system designed for large-scale storage
systems.
Data Transfer:
Manages the transfer of data between the storage device and the system's memory.
Optimizes data transfer performance using techniques like buffering and caching.
Error Handling:
Detects and handles errors, such as disk failures, read/write errors, and file system corruption.
Implements error recovery mechanisms to minimize data loss.
3. SYSTEM CALLS
System calls are the interface between user-space applications and the kernel. They
allow a user program to request services from the kernel, such as reading and writing files
(read()/write()), allocating memory, or creating a process (fork()).
A user program usually invokes a system call through a wrapper function (e.g., printf() internally uses
write()). The request is passed to the kernel through a special instruction (such as syscall or int 0x80 on
x86).
The kernel identifies the system call, executes the corresponding kernel function, and
returns the result (success or failure) to the user program.
● We don't need to keep rebuilding the kernel every time we add a new device or if we
upgrade an old device. This saves time and also helps in keeping our base kernel error free.
● LKMs are flexible and they can load and unload with a single line command; this helps in
saving memory as we load LKM only when we need it.
Why are user space and kernel space separated?
Security: Separating the address spaces prevents user programs from accessing or modifying
sensitive kernel data, protecting the system from bugs or malicious actions.
Stability: A faulty user program can only crash itself, not the entire system. Kernel space isolation
ensures the core of the OS remains intact.
Efficiency: The kernel operates with full privileges and direct hardware access while user
programs interact with the kernel only via controlled system calls, maintaining efficient
resource usage.
KERNEL MODULE HAVE EXECUTION PRIVILEGES:
A kernel module operates with full access to hardware and kernel resources; it interacts
directly with hardware, without the protections that apply to user programs.
Ex: reading and writing to an I/O port.
User programs, on the other hand, operate with restricted privileges. They must use
system calls to access hardware resources, which ensures safety.
Ex: reading a file.
KERNEL MODULES DO NOT EXECUTE SEQUENTIALLY:
The kernel modules are event driven because they execute as needed (e.g. interrupt handling).
And user programs are executed sequentially from the start to finish in one flow.
HEADER FILES:
Kernel modules use kernel headers (e.g., <linux/module.h>).
User programs use user-space headers (e.g., <stdio.h>).
—----------------------------------------------------------------------------------------------------------------
DIFFERENCE BETWEEN KERNEL MODULES AND KERNEL DRIVERS:
Kernel Modules:
A kernel module is a piece of code that can be dynamically loaded into or unloaded from the Linux
kernel at runtime. Its main purpose is to extend the kernel without needing
to reboot or recompile it.
Kernel modules are tools you can add to the kernel as needed.
Example:
● A filesystem module to support a new filesystem (e.g., ext4).
● A module implementing a custom system call.
Kernel Drivers:
A kernel driver is a type of kernel module specifically designed to interact with and control
hardware devices.
Kernel drivers are a specific kind of tool that is only useful for a particular job: interacting with
specific hardware.
Purpose: Enable communication between the kernel and hardware components like network
cards, USB devices, GPUs, etc.
Examples:
● A network interface card (NIC) driver for Ethernet hardware.
● A sound card driver to enable audio output.
Key Characteristics:
● "Drives" a particular hardware device.
● Provides the hardware-specific implementation for abstract kernel interfaces.
What is a Device Driver?
A device driver is like a translator that helps the operating system (OS) and user applications
communicate with hardware devices.
In a Linux system, everything is a file: Linux treats everything, even
hardware, as a file.
Key Characteristics:
● OS-specific: A driver written for Linux won't work in Windows.
● Hardware-dependent: A driver for an HP printer won't work for a Canon printer.
Character devices are represented as files in /dev, such as /dev/ttyS0 (serial port).
Key Features:
● Data is transferred in real time.
● Can't be used to store large files.
2. Block Device
A block device transfers data block by block (fixed-size chunks), making it ideal for storage
devices like hard drives or USB drives.
Examples:
● Hard Disks (HDD/SSD): Reads and writes large amounts of data.
● CD-ROMs: Transfers data in blocks during file access.
● USB Drives: Operates on block sizes for efficient data storage.
Linux Representation:
Block devices are also represented as files in /dev, such as /dev/sda (the first hard disk).
3. Network Device
A network device handles data transmission as packets over a network. It is used for
communication between computers or devices.
Examples:
Linux Representation:
Network devices are not visible in /dev. Instead, they can be listed using commands like:
ip link
bash
ls -l /dev
Output:
bash
brw-rw---- 1 root disk 8, 0 Dec 12 10:00 /dev/sda # Block device
crw-rw---- 1 root tty 4, 0 Dec 12 10:00 /dev/tty0 # Character device
The first character in the permission string indicates the type:
● b: Block device
● c: Character device
License:
A license in a Linux kernel module tells the kernel and users how the module can be used,
shared, or modified. It's like a legal label that describes the "rules" of the module's code.
A software license defines the terms under which a developer's code can be used, shared or
modified.
The license specifies:
The license we choose determines how the module interacts with linux and how it can be
distributed.
Type of licenses:
GPL (General Public License)
● Core Rule: If you modify and redistribute the code, you must share your
changes under the same license (GPL).
● Think of it like a group project in school.
● If someone uses your work, they must share their changes with everyone.
● You say: “You can use my work for free, but if you improve it, you must share
the improved version too!”
● Purpose: Ensures freedom for users to modify and share the code while
maintaining transparency.
Use Case:
Most Linux kernel modules use GPL because the Linux kernel itself is under GPL.
Example: A network card driver under GPL. If a company modifies it, they must
share their improvements with the community.
Declaration in Code:
c
MODULE_LICENSE("GPL");
3. MIT License
● Core Rule: Very permissive; users can do almost anything with the code,
including using it in proprietary software, as long as the original copyright and license notice is kept.
● Think of it as giving your recipe to everyone with almost no conditions.
● They can use it, change it, or sell products made from it. They only have
to keep your copyright notice with it.
● You say: “Do whatever you want with my work, just keep my name on it.”
● Purpose: Encourages widespread use and adoption with minimal restrictions.
Use Case:
Commonly used for libraries or utilities that are meant to be reused in various
projects.
Example: A library for handling file I/O under MIT license can be freely integrated
into a Linux driver.
4. Proprietary License
● Core Rule: Users cannot see, modify, or redistribute the code. Access is
controlled by the creator.
● Think of it as a secret recipe.
● Only you know it, and others can use it only if they pay or follow strict rules.
● You say: “You can use my work, but you can’t see or change the details.”
● Purpose: Protects intellectual property by preventing reverse engineering or
unauthorized use.
Use Case:
Closed-source modules like NVIDIA’s graphics driver. Users can use it but can’t
modify or understand the internal implementation.
Declaration in Code:
c
MODULE_LICENSE("Proprietary");
Summary
● The license you choose determines how your code integrates with the kernel and how
it can be used by others.
● GPL is preferred for open collaboration and kernel compatibility.
● Proprietary licenses restrict sharing but protect intellectual property.
MODULE_AUTHOR
The MODULE_AUTHOR macro is a way to associate the author's information with the
Linux kernel module.
MODULE_INFO Macro:
It stores the key-value pair (author, "Author Name") in the .modinfo section of the
compiled module.
The kernel's module loader and modinfo tool use this metadata to fetch and display the
author's information.
Purpose of MODULE_AUTHOR
● This macro allows you to specify the author of the module, helping others understand
who developed it.
● The modinfo command displays this information, aiding users or developers in
identifying the contributor(s) of a kernel module.
c
#include <linux/module.h>
c
MODULE_AUTHOR("Your Name <your_email@[Link]>");
c
MODULE_AUTHOR("Author1 <author1@[Link]>");
MODULE_AUTHOR("Author2 <author2@[Link]>");
The MODULE_DESCRIPTION and MODULE_VERSION macros, like MODULE_AUTHOR, come from the
<linux/module.h> header and provide
metadata about the module. This metadata is primarily used for identifying and managing
kernel modules.
Module Description
The MODULE_DESCRIPTION macro provides a brief explanation of what the module
does.
Purpose:
When you run the modinfo command on a module, it displays the description. This helps
users and developers understand the purpose of the module without diving into its code.
How to Use:
c
MODULE_DESCRIPTION("A sample driver for learning purposes.");
Use Case:
Helps system administrators or developers quickly understand what a module does
without inspecting the source code.
Module Version
The MODULE_VERSION macro specifies the version of your module. This can be used
to track updates, fixes, or changes.
Purpose:
Versions are crucial when maintaining or debugging modules. It helps identify if the
correct version of a module is loaded in the kernel.
How to Use:
c
MODULE_VERSION("1.0");
Example:
For a mouse driver:
c
MODULE_VERSION("2.0.1");
Version Format Explained:
bash
modinfo my_module.ko
Example Output:
makefile
description: A sample driver for learning purposes.
author: Your Name <[Link]@[Link]>
version: 1.0
license: GPL
—-------------------------------------xox—---------------------------------------------
Simple kernel Module Programming:
Introduction
In Linux kernel module programming, instead of the standard main function used in user-space
programs, Init and Exit functions act as the entry and exit points of a module. These functions
manage the loading and unloading of the module into/from the kernel. Let's break it down step
by step.
1. Init Function
What It Does:
● This function is called when the module is inserted into the kernel (e.g., using
insmod).
● It acts as the "constructor" of the kernel module, where you can initialize resources,
register device drivers, or set up data structures.
Syntax:
static int __init hello_world_init(void)
{
    /* Set up resources here; return 0 on success or a negative errno on failure. */
    return 0;
}
module_init(hello_world_init);
key Points:
● __init: A compiler attribute that marks this function as initialization code. After
initialization, this code is freed to save memory.
● module_init: A macro that registers the init function with the kernel.
2. Exit Function
What It Does:
This function is called when the module is removed from the kernel (e.g., using rmmod).
It acts as the "destructor" of the kernel module, cleaning up any resources allocated in the init
function.
Syntax:
static void __exit hello_world_exit(void)
{
    /* Release everything that the init function set up. */
}
module_exit(hello_world_exit);
Key Points:
● __exit: A compiler attribute that marks this function as cleanup code.
● module_exit: A macro that registers the exit function with the kernel.
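Putting the init and exit functions together, a complete minimal module might look like the sketch below. It is kernel code, so it only builds against the kernel headers (typically with an out-of-tree Makefile containing a line such as obj-m += hello_world.o; that file name is an assumption). The log messages and metadata strings are placeholders.

```c
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

/* Runs once when the module is inserted (insmod or modprobe). */
static int __init hello_world_init(void)
{
    pr_info("hello_world: module loaded\n");
    return 0;   /* 0 means success; a negative errno aborts the load */
}

/* Runs once when the module is removed (rmmod). */
static void __exit hello_world_exit(void)
{
    pr_info("hello_world: module unloaded\n");
}

/* Register the entry and exit points with the kernel. */
module_init(hello_world_init);
module_exit(hello_world_exit);

/* Metadata shown by modinfo; the values are placeholders. */
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Your Name");
MODULE_DESCRIPTION("Minimal hello world module");
MODULE_VERSION("1.0");
```

After sudo insmod hello_world.ko and sudo rmmod hello_world, the two messages should show up in the output of dmesg.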
What is a Log?
A log is simply a record of events, messages, or data generated by a program, operating system,
or device to give information about its operation. Logs help developers, administrators, or users
to:
● Monitor what’s happening.
● Diagnose problems.
● Debug code.
In the Linux kernel, the logs record what is happening inside the kernel. These logs are created
using the printk() function.
In the Linux kernel, log levels are represented by macros such as KERN_INFO, KERN_ERR,
etc.
printk() Function
The printk() function is used for logging messages in the kernel. It behaves like printf() in
user-space programs, which prints to the terminal, but it works in kernel space and writes to the kernel log.
How it Works:
Messages logged via printk() are stored in the kernel log buffer.
Use the dmesg command to view these messages.
Syntax of printk()
c
printk(KERN_INFO "Your message here\n");
Where:
Log level: A macro such as KERN_INFO or KERN_ERR that specifies the priority of the message. It is a string prefix placed directly before the format string, with no comma in between.
Message: The actual text (a printf-style format string) that you want to log.
In Linux Kernel Modules (LKMs), just like passing arguments to a user-space program (via
argc and argv in main), we can also pass parameters to a kernel module. These parameters are
typically passed during module loading using insmod or modprobe.
To accomplish this, Linux provides module parameter macros that allow you to define
parameters your kernel module can accept. These parameters are then accessible from within
the module and can be used to alter its behavior.
Permissions:
Before discussing the module parameter macros, let us look at permissions.
What Are Permissions?
Permissions control who (user, group, or others) can read, write, or execute a parameter (or file)
in the Linux system. In the context of module parameters, permissions decide who can access
and modify the parameter values.
Permissions are applied when a module parameter is exposed in the
/sys/module/<module_name>/parameters/ directory. This is where Linux creates an
interface for accessing module parameters.
Structure of a Macro
S_I<Action><Scope>
Action:
● R: Read — Permission to read the value.
● W: Write — Permission to modify the value.
● X: Execute — Permission to execute (not typically used for module parameters).
Scope:
● USR: User — The owner of the module (usually root or the one who loaded it).
● GRP: Group — Users belonging to the same group as the module's owner.
● OTH: Others — All other users on the system.
How Permissions Are Combined
You can combine multiple permissions using the bitwise OR (|) operator.
Examples:
Read + Write for User Only
c
S_IRUSR | S_IWUSR
● The module parameter can be read and written by the user (owner).
1.module_param()
The module_param() macro is used in Linux kernel modules to allow the user to pass
arguments (parameters) to the module at runtime. These parameters can be configured when
the module is loaded using the insmod command.
Syntax of module_param()
c
module_param(name, type, perm);
● name: The variable to store the value of the parameter.
● type: The type of the variable (e.g., int, bool, charp, etc.).
● perm: Permissions for accessing this parameter from user space via
/sys/module/<module_name>/parameters/<parameter_name>.
How it works
1. When a kernel module uses module_param(), a parameter is created in the
/sys/module/<module_name>/parameters/ directory.
● Example: If the parameter name is my_value, and the module is named
hello_module, the parameter file will be located at
/sys/module/hello_module/parameters/my_value.
2. Users can view or modify these parameters dynamically by accessing this sysfs entry.
Example:
c
module_param(my_value, int, S_IWUSR | S_IRUSR);
2. invbool
● Inverted boolean. If the user provides 1 (true), the parameter is treated as false internally,
and vice versa.
Example:
c
module_param(my_inv_flag, invbool, S_IRUSR);
● If the user sets my_inv_flag=1, the module treats it as "false".
3. charp
● Stores a string value.
● Variable type: char*.
Example:
c
module_param(my_string, charp, S_IRUSR);
● If the user passes my_string="Hello", this value is stored in the module.
4. int
● Stores a signed integer.
Example:
c
module_param(my_number, int, S_IRUSR | S_IWUSR);
● The user can set my_number=42.
5. uint
● Stores an unsigned integer.
Example:
c
module_param(my_unsigned, uint, S_IRUSR);
● The user can set my_unsigned=100.
6. long
● Stores a signed long integer.
Example:
C
module_param(my_long, long, S_IRUSR);
7. ulong
● Stores an unsigned long integer.
Example:
C
module_param(my_ulong, ulong, S_IRUSR);
8. short
● Stores a signed short integer.
Example:
c
module_param(my_short, short, S_IRUSR);
9. ushort
● Stores an unsigned short integer.
Example:
c
module_param(my_ushort, ushort, S_IRUSR);
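The parameter types above can be combined in one module. The following sketch is kernel code that builds only against the kernel headers. It declares an int, a charp and a bool parameter with default values and prints them at load time. The module name param_demo, the variable my_flag and the default values are assumptions made for this example.

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/stat.h>

/* Defaults are used when no value is given on the insmod command line. */
static int my_number = 1;
static char *my_string = "default";
static bool my_flag;

module_param(my_number, int, S_IRUSR | S_IWUSR);
MODULE_PARM_DESC(my_number, "An integer parameter");
module_param(my_string, charp, S_IRUSR);
MODULE_PARM_DESC(my_string, "A string parameter");
module_param(my_flag, bool, S_IRUSR);
MODULE_PARM_DESC(my_flag, "A boolean parameter");

static int __init param_demo_init(void)
{
    pr_info("param_demo: my_number=%d my_string=%s my_flag=%d\n",
            my_number, my_string, my_flag);
    return 0;
}

static void __exit param_demo_exit(void)
{
    pr_info("param_demo: unloaded\n");
}

module_init(param_demo_init);
module_exit(param_demo_exit);
MODULE_LICENSE("GPL");
```

A load command such as sudo insmod param_demo.ko my_number=42 my_string="Hello" my_flag=1 would override all three defaults; the values can then be read back from /sys/module/param_demo/parameters/.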
—-------------------------------------------------------------------------------------------------------------
IMPORTANT : PRACTICAL EXAMPLE OF THESE IN FOLDER.
2. module_param_array()
What is module_param_array()?
The module_param_array() macro allows you to pass an array of values to a Linux kernel
module as a parameter during module loading. These values are provided as a
comma-separated list from the command line.
Syntax of module_param_array()
c
module_param_array(name, type, num, perm);
Parameters:
● name: The name of the array (and the parameter name passed to the module).
● type: The type of the array elements (int, charp, etc.).
● num: A pointer to an integer variable where the count of array elements will be stored.
Pass NULL if you don’t need this.
● perm: The file permissions for the parameter (e.g., 0644 or 0444).
bash
dmesg | tail
Unload the Module:
Use:
bash
sudo rmmod my_module
c
#include <linux/module.h>
#include <linux/init.h>

#define MAX_ARRAY_SIZE 5

static int my_array[MAX_ARRAY_SIZE]; // Array to hold values
static int array_size;               // To store the number of elements passed

module_param_array(my_array, int, &array_size, 0444);
MODULE_PARM_DESC(my_array, "An array of integers");

static int __init my_module_init(void)
{
    int i;

    pr_info("Module loaded with parameters:\n");
    for (i = 0; i < array_size; i++) {
        pr_info("my_array[%d] = %d\n", i, my_array[i]);
    }
    return 0;
}

static void __exit my_module_exit(void)
{
    pr_info("Module unloaded.\n");
}

module_init(my_module_init);
module_exit(my_module_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Vicky Deokar");
MODULE_DESCRIPTION("Example module with array parameter.");
Expected Output:
If you pass my_array=10,20,30, the output in dmesg will look like this:
less
Module loaded with parameters:
my_array[0] = 10
my_array[1] = 20
my_array[2] = 30
Module unloaded.
3.module_param_cb():
The module_param_cb() macro allows you to register a callback function to handle
changes to module parameters. This is useful when you need to perform actions or
handle events dynamically when a module parameter is updated.
Syntax of module_param_cb()
c
module_param_cb(name, ops, arg, perm);
Parameters:
c
struct kernel_param_ops
{
int (*set)(const char *val, const struct kernel_param *kp);
int (*get)(char *buffer, const struct kernel_param *kp);
};
—-------------------------------------------------------------------------------------------------
When Would You Need This Notification?
A notification is crucial when a change in a parameter value requires an immediate
action or reaction in the system. Let’s break this down with a practical scenario:
● Using the callback mechanism, the set function gets triggered whenever the
parameter value changes.
● In this function, you can check the value and write to the hardware register (or
perform any other necessary action).
● Dynamic Control: You can dynamically control the behavior of your driver
without recompiling the module.
● Real-Time Responses: Immediate responses to parameter changes allow the driver
to interact efficiently with hardware or the kernel.
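The callback mechanism described above can be sketched as follows. This is kernel code that builds only against the kernel headers. The set callback delegates parsing to the standard helper param_set_int() and then logs the new value, which is where a real driver would write to a hardware register instead. The names cb_value, cb_value_set, cb_value_ops and the module name cb_demo are assumptions for this example.

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>

static int cb_value;   /* variable that backs the parameter */

/* Called whenever cb_value is written, at load time or through sysfs. */
static int cb_value_set(const char *val, const struct kernel_param *kp)
{
    int ret = param_set_int(val, kp);   /* parse the string and store it */

    if (ret == 0)
        pr_info("cb_demo: cb_value changed to %d\n", cb_value);
    return ret;
}

static const struct kernel_param_ops cb_value_ops = {
    .set = cb_value_set,
    .get = param_get_int,   /* standard helper that formats the int */
};

/* 0644: root may write, everyone may read the sysfs entry. */
module_param_cb(cb_value, &cb_value_ops, &cb_value, 0644);
MODULE_PARM_DESC(cb_value, "Integer parameter with a change notification");

static int __init cb_demo_init(void)
{
    pr_info("cb_demo: loaded, cb_value=%d\n", cb_value);
    return 0;
}

static void __exit cb_demo_exit(void)
{
    pr_info("cb_demo: unloaded\n");
}

module_init(cb_demo_init);
module_exit(cb_demo_exit);
MODULE_LICENSE("GPL");
```

With the module loaded, writing a new number as root to /sys/module/cb_demo/parameters/cb_value should trigger the set callback and produce a new line in dmesg.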
Introduction to Character Drivers
Character drivers are a specific type of device driver that manage devices operating
with byte-oriented input/output (I/O). These are essential for interacting with
devices where data flows sequentially, such as serial ports or audio devices.
[Link] Processing:
[Link] Communication:
● Ideal for devices like:
● Serial ports (e.g., UART, RS-232).
● Character-based terminals.
● Sensors sending single readings.
Why Use Byte-Oriented I/O?
● Simplicity: Devices that don’t require high throughput or buffering work
well with byte-oriented I/O.
● Real-Time Communication: Enables immediate processing of data, which is
critical for interactive devices like keyboards or sensors.
● Low Resource Requirement: Requires minimal memory and computational
resources compared to block-oriented systems.
What Makes Character Drivers Special?
● They handle sequential data (byte-by-byte operations).
● Commonly used for a wide range of devices:
○ Serial ports
○ Audio devices
○ Video and camera devices
○ Basic I/O devices
● Any driver that doesn’t involve block storage (like hard disks) or networking
usually falls under the category of character device drivers.
Application
↓
Device File (/dev/my_device)
↓
Major Number → Identifies Driver
↓
Minor Number → Identifies Specific Device
↓
Driver
↓
Hardware Device
Application Layer:
● Each hardware device has a corresponding device file in the /dev directory.
Major Number:
● The major number identifies the device type (e.g., IDE disk, SCSI disk, serial port,
etc.).
● It acts as the driver identifier: Each device driver in the Linux system is assigned a
unique major number.
● When the kernel receives a request for a device, it uses the major number to
determine which driver is responsible for handling that device.
Minor Number
● The minor number identifies the specific device instance handled by the driver
(e.g., first disk, second serial port, etc.).
● It acts as a device specifier: It distinguishes between multiple devices managed by
the same driver.
Example:
Most of the time the major identifies the driver while the minor number identifies
each physical device served by the driver.
Major Number: Used by the kernel to identify the driver that handles the device.
Minor Number: Passed to the driver to identify a specific device.
To see the Major and Minor numbers:
bash
ls -l /dev/ttyS0
crw-rw---- 1 root dialout 4, 64 Dec 15 10:00 /dev/ttyS0
Here:
● c: Character device.
● 4: Major number.
● 64: Minor number.
1. At /proc/devices:
Lists all major numbers and their associated drivers.
bash
cat /proc/devices
Example Output:
Character devices:
1 mem
4 tty
240 my_device
Block devices:
1 ramdisk
/dev/ Directory:
Contains device files with their major and minor numbers.
bash
ls -l /dev/my_device
Output:
bash
crw------- 1 root root 240, 0 Dec 15 10:00 /dev/my_device
Here, 240 is the major number and 0 is the minor number.
—-----------------------------------------------------------------------------------------------------
Allocating Major and Minor Number
There are two ways to allocate a major and minor number.
1. Statically allocating
2. Dynamically Allocating.
● Static allocation is used when we want to set a particular major number for a driver
manually.
● If that major number is already allocated to another driver, the registration
fails.
Function:
int register_chrdev_region(dev_t first, unsigned int count, const char *name);
● dev_t first: Start of the device number range (both major and minor).
● unsigned int count: How many device numbers you need.
● char *name: The name of your device (visible in /proc/devices).
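Steps for Static Allocation:
1. Use MKDEV to build a dev_t value from a specific major and minor number.
c
dev_t dev = MKDEV(235, 0); // 235 is the major number, 0 is the minor
2. Register the range that starts at that number. A negative return value means the
registration failed (for example, because the major number is already in use).
c
register_chrdev_region(dev, 1, "my_device");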
Dynamic allocation lets the kernel pick a free major number.
Function:
c
int alloc_chrdev_region(dev_t *dev, unsigned int firstminor, unsigned int count, const char *name);
1. Pass a pointer to a dev_t variable; the kernel fills it in.
c
dev_t dev;
alloc_chrdev_region(&dev, 0, 1, "my_device");
2. The kernel allocates a major number, and you can retrieve it using:
c
printk(KERN_INFO "Allocated Major = %d\n", MAJOR(dev));
Releasing Device Numbers:
When the driver no longer needs its device numbers (typically in the module exit
function), it calls:
c
void unregister_chrdev_region(dev_t first, unsigned int count);
● first: The starting device number (major + minor).
● count: How many contiguous numbers you want to release.
The kernel removes these numbers from its list of allocated device numbers, making
them available for reuse.
Static                                   | Dynamic
You manually set the major number.       | Kernel assigns the major number.
Prone to conflicts with other drivers.   | No conflicts with numbers already registered.
Useful if you need a fixed number.       | Preferred method; avoids conflicts.
Device nodes must match major/minor.     | Device nodes are created at runtime.
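The difference between the two methods can be seen in a toy model. The sketch below is plain user-space C, not kernel code: the table, the range of 256 numbers and the toy_ function names are all assumptions made for illustration, and the real kernel keeps its own bookkeeping and search order.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_MAJOR 256                 /* assumed size of the toy number space */

static bool toy_taken[TOY_MAX_MAJOR];     /* true = major number already claimed */

/* Static style: the caller insists on one number and may collide. */
int toy_register_static(unsigned int major)
{
    if (major == 0 || major >= TOY_MAX_MAJOR)
        return -EINVAL;
    if (toy_taken[major])
        return -EBUSY;                    /* someone else already owns it */
    toy_taken[major] = true;
    return 0;
}

/* Dynamic style: the registry hands out any number that is still free. */
int toy_alloc_dynamic(void)
{
    for (int m = TOY_MAX_MAJOR - 1; m > 0; m--) {
        if (!toy_taken[m]) {
            toy_taken[m] = true;
            return m;                     /* caller learns the number only now */
        }
    }
    return -EBUSY;                        /* every number is in use */
}

/* Release a number so that it can be handed out again. */
void toy_unregister(unsigned int major)
{
    if (major < TOY_MAX_MAJOR)
        toy_taken[major] = false;
}
```

Registering 235 twice fails the second time with -EBUSY, while toy_alloc_dynamic() keeps succeeding until the table is full, which is the behaviour the table above summarises.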
Device Node in Linux
● Major Number: Identifies which device driver will handle the request.
● Minor Number: Specifies which device (if multiple) the driver should manage.
bash
sudo mknod /dev/cdac_edd c 202 0
● c: Indicates that the device is a character device.
● 202: Major Number (the link to the driver).
● 0: Minor Number (identifies the specific device if there are multiple).
After creating the device node, the kernel associates it with the specified major and
minor numbers.
Interacting with the Device
When a user-space program interacts with /dev/cdac_edd, the kernel will check the
major number (202) and send the request to the driver that registered this major
number.
For example:
bash
echo "hello" > /dev/cdac_edd
● The kernel sees that /dev/cdac_edd has major number 202, so it directs the request
to the driver that has registered major 202.
● This makes the driver process the request (e.g., reading or writing data).
4. Analogy
● Major Number = Phone Number for the driver (the link to the driver).
● Device Node = Phone that you pick up to make a call.
● User Program (like echo, cat) = Caller that wants to talk to the driver.
Without the device node, user-space programs have no way to talk to the driver.
5. Key Takeaways
● The device node is essential to allow user-space programs to communicate with
kernel-space drivers.
● Major Number links the device node to the driver.
● Minor Number helps the driver identify specific devices.
● Device nodes are created using mknod and provide a way for programs to read,
write, and interact with the kernel driver.
In Short:
1. Load the Module
bash
sudo insmod static_allocation.ko
2. Check the Kernel Log
bash
dmesg | tail
You should see messages like:
mod31: Hello world from mod31!
mod31: major:minor 202:0 allotted!
3. Verify the Registration
bash
cat /proc/devices | grep cdac_edd
You should see:
202 cdac_edd
4. Create the Device Node
The driver has registered major number 202 and minor number 0, so the device node
must be created with the same numbers.
bash
sudo mknod /dev/cdac_edd c 202 0
Checking the node with ls -l /dev/cdac_edd should show:
bash
crw-r--r-- 1 root root 202, 0 <date> /dev/cdac_edd
6. Testing the Driver
You can interact with the device node by reading or writing to it. For example:
bash
echo "test message" > /dev/cdac_edd
cat /dev/cdac_edd
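Since this simple driver registers only its device numbers and implements no read/write
callbacks, these commands are expected to fail with an error (typically "No such device
or address"). The point is only to show that the request reaches the kernel through the
device node.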
7. Remove the Module
bash
sudo rmmod static_allocation
bash
dmesg | tail
Kernel Involvement:
● Device drivers that handle the operations are part of the Linux kernel.
To list the device files:
bash
ls -l /dev/
Explanation:
The first letter of the permission field indicates the type of device: c for a character
device and b for a block device.
● When an application accesses a device file, the kernel looks at the major number
to identify the appropriate driver.
● The minor number helps the driver determine which specific device is being
accessed.
● The driver then performs the required operation (e.g., reading from or writing to
the hardware).
Creating Device Files
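Device files can be created manually, using mknod, or automatically, using udev. The
manual method is described first.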
Command Syntax:
bash
mknod -m <permissions> <name> <device type> <major> <minor>
Here:
● <name>: The name of the device file, including the full path (e.g.,
/dev/my_device).
● <major>: The major number assigned to your driver (identifies the driver).
● <minor>: The minor number assigned to the device (identifies a specific device
instance).
● -m <permissions>: (Optional) Set permissions during file creation. You can also
set permissions later using chmod.
Example Command
To create a character device file named /dev/etx_device with major number 246 and
minor number 0, use:
bash
sudo mknod -m 666 /dev/etx_device c 246 0
If permissions are not specified during creation, you can use the chmod command
to modify them:
bash
sudo chmod 666 /dev/etx_device
Advantages of creating the device file manually:
1. You can create the device file even before loading the driver.
2. It provides flexibility: anyone with the required permissions can create the device
file.
Rules for Manually Creating Device Files
Major and Minor Numbers
● Ensure the major number matches the driver registered in the kernel.
● The minor number should correspond to a specific device instance handled by the driver.
● Use c for character devices and b for block devices when specifying the device type in the
mknod command.
● Always create the device file inside the /dev/ directory (e.g., /dev/my_device). This is the
standard location for all device files.
● Permissions must be set carefully to control access to the device. For example:
● 666: Allows all users to read and write.
● 660: Limits read and write to the owner and group.
● Ensure that the corresponding driver is loaded in the kernel before using the device file.
Otherwise, user-space applications cannot communicate with the hardware.
Conflicts
● Check that the major and minor numbers do not conflict with existing device files. Use ls
-l /dev/ to confirm.
● After creating the file, test it by performing read or write operations using simple
user-space programs.
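A simple user-space test program of the kind mentioned in the last rule might look like the sketch below. It assumes nothing about the driver beyond the device file path; write_to_device is a hypothetical helper name chosen for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a device file, write one message, close it.
 * Returns the number of bytes the driver accepted, or a negative errno value.
 */
long write_to_device(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;              /* e.g. -ENOENT if the node was never created */

    ssize_t n = write(fd, msg, strlen(msg));
    long result = (n < 0) ? -errno : (long)n;   /* save errno before close() can change it */

    close(fd);
    return result;
}
```

For a node such as /dev/my_device, a negative result points to a missing node, missing permissions, or a driver that is not loaded, while a positive result shows that the driver's write path was reached.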
Programming Example
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kdev_t.h>
#include <linux/fs.h>

dev_t dev = 0;

/* Module initialization function */
static int __init hello_world_init(void)
{
    int ret;

    /* Allocating a major number dynamically */
    ret = alloc_chrdev_region(&dev, 0, 1, "Embetronicx_Dev");
    if (ret < 0) {
        pr_err("Cannot allocate major number for device\n");
        return ret;
    }
    /* Log the numbers so that the matching mknod command can be issued */
    pr_info("Major = %d Minor = %d\n", MAJOR(dev), MINOR(dev));
    pr_info("Kernel Module Inserted Successfully...\n");
    return 0;
}

/* Module cleanup function */
static void __exit hello_world_exit(void)
{
    unregister_chrdev_region(dev, 1);
    pr_info("Kernel Module Removed Successfully...\n");
}

module_init(hello_world_init);
module_exit(hello_world_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("EmbeTronicX <embetronicx@[Link]>");
MODULE_DESCRIPTION("Simple Linux driver (Manually Creating a Device file)");
MODULE_VERSION("1.1");
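1. Load the Driver
bash
sudo insmod driver.ko
2. Check the Device Files
bash
ls -l /dev/
At this point, the device file is not yet created.
3. Create the Device File
Use the major number the kernel actually assigned (check dmesg or /proc/devices); 246
is only the value from this run.
bash
sudo mknod -m 666 /dev/etx_device c 246 0
4. Verify Creation
List the /dev directory again to confirm the file was created:
bash
ls -l /dev/ | grep "etx_device"
crw-rw-rw- 1 root root 246, 0 Aug 15 13:53 etx_device
5. Unload the Driver
bash
sudo rmmod driver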
Automatically Creating Device File
In Linux, you can automate the creation of device files using udev, a device manager that dynamically handles
device nodes in the /dev directory. This method is simpler and reduces manual work. Below are the detailed
steps and concepts related to automatically creating device files.
Header files:
#include <linux/device.h>
#include <linux/kdev_t.h>
Creating the class and the device:
● Use class_create() to create a device class, which organizes device entries in /sys/class/.
● Use device_create() to register the device with the class. This automatically creates a device file in /dev/.
if (IS_ERR(device_create(dev_class, NULL, dev, NULL, "etx_device"))) {
    pr_err("Cannot create the Device\n");
    goto r_device;
}
Clean Up on Exit:
device_destroy(dev_class, dev);
class_destroy(dev_class);
unregister_chrdev_region(dev, 1);
Program: Automatically Creating a Device File
Below is a simple Linux kernel module for automatic device file creation.
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kdev_t.h>
#include <linux/fs.h>
#include <linux/device.h>
dev_t dev = 0;
static struct class *dev_class;
/* Module init function */
static int __init hello_world_init(void)
{
/* Allocate Major Number */
if ((alloc_chrdev_region(&dev, 0, 1, "etx_Dev")) < 0) {
pr_err("Cannot allocate major number for device\n");
return -1;
}
pr_info("Major = %d Minor = %d\n", MAJOR(dev), MINOR(dev));
/* Create Struct Class */
dev_class = class_create(THIS_MODULE, "etx_class");
if (IS_ERR(dev_class)) {
pr_err("Cannot create the struct class for device\n");
goto r_class;
}
/* Create Device */
if (IS_ERR(device_create(dev_class, NULL, dev, NULL, "etx_device"))) {
pr_err("Cannot create the Device\n");
goto r_device;
}
pr_info("Kernel Module Inserted Successfully...\n");
return 0;
r_device:
class_destroy(dev_class);
r_class:
unregister_chrdev_region(dev, 1);
return -1;
}
/* Module exit function */
static void __exit hello_world_exit(void)
{
device_destroy(dev_class, dev);
class_destroy(dev_class);
unregister_chrdev_region(dev, 1);
pr_info("Kernel Module Removed Successfully...\n");
}
module_init(hello_world_init);
module_exit(hello_world_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("EmbeTronicX <embetronicx@[Link]>");
MODULE_DESCRIPTION("Simple linux driver (Automatically Creating a Device file)");
MODULE_VERSION("1.2");
Key Functions Explained
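1.class_create()
● Creates a device class that appears in /sys/class/.
c
struct class *class_create(struct module *owner, const char *name);
● Parameters:
● owner: The module that owns the class (usually THIS_MODULE).
● name: The name of the class, as shown under /sys/class/.
● Note: from Linux 6.4 onward the owner parameter was removed, so on those kernels the
call is class_create("etx_class").
c
dev_class = class_create(THIS_MODULE, "etx_class");
2.device_create()
● Registers the device with the class and creates an entry in /dev/.
c
struct device *device_create(struct class *class, struct device *parent, dev_t dev, void *drvdata, const char *fmt, ...);
● Parameters:
● class: The class returned by class_create().
● parent: The parent device, or NULL if there is none.
● dev: The device number (major and minor).
● drvdata: Driver data to attach to the device, or NULL.
● fmt: The device name (a printf-style format string); it becomes the file name in /dev/.
c
device_create(dev_class, NULL, dev, NULL, "etx_device");
3.Cleanup Functions
● device_destroy(dev_class, dev): Removes the device created by device_create(), along
with its /dev/ entry.
● class_destroy(dev_class): Removes the class created by class_create().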
—----------------------------------------------------------------------------------------------------
Cdev structure and File Operations
Purpose: Character device driver provides a way for user-space applications (like programs
we write) to interact with hardware devices. These devices are typically associated with a
single character stream, like serial port or keyboard.
1. Cdev structure: struct cdev represents a character device in the kernel. It holds
information about the device, such as its major and minor numbers and its file operations.
● It acts as a bridge between the kernel and the character device driver we write. The
struct cdev is like a "registration book" in which the kernel notes down all the information
it needs to handle our character device.
● This structure is linked with the inode of the device file. (An inode is the data structure
the kernel uses to manage files.)
c
struct cdev {
    struct kobject kobj;                // Kernel object, used for reference counting.
    struct module *owner;               // Module that owns this cdev (usually THIS_MODULE).
    const struct file_operations *ops;  // Pointer to the file_operations structure.
    struct list_head list;              // List of inodes that refer to this cdev (not used directly by drivers).
    dev_t dev;                          // First device number (major + minor).
    unsigned int count;                 // Number of consecutive device numbers covered by this cdev.
};
1.Dynamic allocation
We use cdev_alloc() to allocate memory for the struct cdev dynamically at runtime. A driver
typically calls my_cdev = cdev_alloc() and then sets my_cdev->ops = &my_fops.
cdev_alloc():
● Allocates a new struct cdev and returns a pointer to it, or NULL if the allocation fails.
my_cdev->ops = &my_fops;:
● The ops field in struct cdev is assigned the address of a file_operations structure.
● This tells the kernel which functions to call when user-space interacts with the device (e.g.,
open, read, write).
When to use it:
● Useful when your driver needs to create multiple devices at runtime or when the number of
devices isn't fixed.
● Reduces kernel memory usage if the cdev is needed only under specific conditions.
● A cdev obtained from cdev_alloc() should not be freed with kfree() directly. cdev_del()
drops the last reference, and the kernel then frees the memory.
2.Static Allocation:
In this method, we declare the struct cdev as a static/global variable (for example,
static struct cdev my_cdev;), so its memory is reserved at compile time and remains fixed.
It is then set up with cdev_init(&my_cdev, &my_fops).
● The struct cdev is part of the driver's global/static data.
● It is automatically allocated by the compiler and linked to the driver's lifetime.
● It may consume unnecessary memory if the cdev is not always used (since it exists for
the driver's entire lifetime).
Registering and Removing a Character Device (cdev)
1. cdev_add()
This function registers a character device with the kernel, making it accessible to user
space through the device file (e.g., /dev/mydevice). It's a critical step in the lifecycle of
a struct cdev.
Parameters:
1.struct cdev *cdev
Pointer to the struct cdev object you want to add. This must already be initialized using
cdev_init().
2.dev_t dev
The first device number (major and minor) for the character device. You typically allocate
this using alloc_chrdev_region() or set it manually with MKDEV().
3.unsigned count
The number of consecutive minor numbers, starting at dev, that this cdev handles.
Return Value:
● 0 on success.
● Negative error code (e.g., -ENOMEM, -EINVAL) on failure.
What It Does:
● Registers the device with the kernel so it knows about the device and associates it
with the provided dev_t number.
● Links the device number to the file operations defined in the struct cdev object.
● Once added, the device can be accessed via user-space tools (e.g., open(), read(),
write()).
Example:
// Assume 'my_fops' is a file_operations structure defined elsewhere
struct cdev my_cdev;
dev_t dev;
// Allocate device numbers
alloc_chrdev_region(&dev, 0, 1, "mydevice");
// Initialize cdev
cdev_init(&my_cdev, &my_fops);
// Add the cdev to the kernel
if (cdev_add(&my_cdev, dev, 1) < 0) {
    pr_err("Failed to add cdev\n");
    unregister_chrdev_region(dev, 1);
}
2. cdev_del()
This function removes a previously registered character device from the kernel. It is the counterpart to
cdev_add().
What It Does:
● Unregisters the device from the kernel, making it inaccessible to user space.
● Frees any internal resources allocated during cdev_add().
● After calling cdev_del(), the associated dev_t is no longer linked to the device, and operations like open()
will fail.
Parameters:
struct cdev *cdev
● Pointer to the struct cdev object to remove. This must be a device that was successfully
registered using cdev_add().
Return Value:
● None. This is a void function.
Without cdev_add(), the kernel doesn't know about your device. Without cdev_del(), the kernel might
still reference your device, leading to potential errors when unloading the driver.
Linux device driver example that demonstrates how to create, register, and
manage a character device.
/***************************************************************************//**
* \file driver.c
*
* \details Simple Linux device driver (File Operations)
*
* \author EmbeTronicX
*
* \Tested with Linux raspberrypi 5.10.27-v7l-embetronicx-custom+
*******************************************************************************/
#include <linux/kernel.h> // Kernel log functions
#include <linux/init.h> // __init and __exit macros
#include <linux/module.h> // Essential module macros
#include <linux/kdev_t.h> // Major and minor number macros
#include <linux/fs.h> // File operations structure
#include <linux/err.h> // Error handling functions
#include <linux/cdev.h> // Character device functions
#include <linux/device.h> // Device creation functions
/* Global Variables */
dev_t dev = 0; // Device major and minor numbers
static struct class *dev_class; // Device class
static struct cdev etx_cdev; // Character device structure
/*
** Function Prototypes
*/
static int __init etx_driver_init(void);
static void __exit etx_driver_exit(void);
static int etx_open(struct inode *inode, struct file *file);
static int etx_release(struct inode *inode, struct file *file);
static ssize_t etx_read(struct file *filp, char __user *buf, size_t len, loff_t *off);
static ssize_t etx_write(struct file *filp, const char __user *buf, size_t len, loff_t *off);
/* File Operations Structure */
static struct file_operations fops = {
.owner = THIS_MODULE, // Owner of the module
.read = etx_read, // Read operation
.write = etx_write, // Write operation
.open = etx_open, // Open operation
.release = etx_release, // Close operation
};
/* File Operation Functions */
/*
** Open function: Called when the device is opened
*/
static int etx_open(struct inode *inode, struct file *file) {
pr_info("Driver Open Function Called...!!!\n");
return 0; // Always succeeds
}
/*
** Release function: Called when the device is closed
*/
static int etx_release(struct inode *inode, struct file *file) {
pr_info("Driver Release Function Called...!!!\n");
return 0; // Always succeeds
}
/*
** Read function: Called when data is read from the device
*/
static ssize_t etx_read(struct file *filp, char __user *buf, size_t len, loff_t *off) {
pr_info("Driver Read Function Called...!!!\n");
return 0; // Indicates end-of-file
}
/*
** Write function: Called when data is written to the device
*/
static ssize_t etx_write(struct file *filp, const char __user *buf, size_t len, loff_t *off) {
pr_info("Driver Write Function Called...!!!\n");
return len; // Acknowledges the data length written
}
/* Module Initialization Function */
/*
** Module init: Sets up the device and registers it
*/
static int __init etx_driver_init(void) {
    pr_info("Initializing the Device Driver...\n");
    /* Allocate Major and Minor Numbers */
    if ((alloc_chrdev_region(&dev, 0, 1, "etx_Dev")) < 0) {
        pr_err("Cannot allocate major number\n");
        return -1;
    }
    pr_info("Major = %d, Minor = %d\n", MAJOR(dev), MINOR(dev));
    /* Initialize the cdev Structure */
    cdev_init(&etx_cdev, &fops);
    /* Add the cdev to the Kernel */
    if (cdev_add(&etx_cdev, dev, 1) < 0) {
        pr_err("Cannot add the device to the system\n");
        goto r_class;
    }
    /* Create a Class */
    if (IS_ERR(dev_class = class_create(THIS_MODULE, "etx_class"))) {
        pr_err("Cannot create the struct class\n");
        goto r_cdev;
    }
    /* Create a Device Node in /dev */
    if (IS_ERR(device_create(dev_class, NULL, dev, NULL, "etx_device"))) {
        pr_err("Cannot create the Device\n");
        goto r_device;
    }
    pr_info("Device Driver Inserted Successfully...!!!\n");
    return 0;
r_device:
    class_destroy(dev_class);
r_cdev:
    cdev_del(&etx_cdev);               /* undo cdev_add() */
r_class:
    unregister_chrdev_region(dev, 1);
    return -1;
}
/* Module Exit Function */
/*
** Module exit: Cleans up the device and unregisters it
*/
static void __exit etx_driver_exit(void) {
/* Destroy the Device Node */
device_destroy(dev_class, dev);
/* Destroy the Class */
class_destroy(dev_class);
/* Remove the cdev from the Kernel */
cdev_del(&etx_cdev);
/* Unregister Major and Minor Numbers */
unregister_chrdev_region(dev, 1);
pr_info("Device Driver Removed Successfully...!!!\n");
}
/* Module Metadata */
module_init(etx_driver_init);
module_exit(etx_driver_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("EmbeTronicX <embetronicx@[Link]>");
MODULE_DESCRIPTION("Simple Linux device driver (File Operations)");
MODULE_VERSION("1.3");
1. Include Necessary Headers
These headers provide access to essential kernel APIs, such as device registration,
logging, and module initialization.
2. Global Variables
● dev_t dev: Stores the major and minor numbers assigned to the device.
● struct class *dev_class: Represents a device class for grouping related devices under
/sys/class.
● struct cdev etx_cdev: Represents the character device.
3. File Operations
● fops: Links the file operations (e.g., open, read, write, release) to this device driver.
● Each function defined here will be invoked when the corresponding operation is
performed on the device file.
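The idea behind fops, a table of function pointers that is looked up and called on behalf of the user, can be tried out in user space. The sketch below is only a model: struct toy_fops and every toy_ name are invented for this illustration and are much simpler than the kernel's struct file_operations.

```c
#include <stddef.h>

/* Toy stand-in for struct file_operations: one function pointer per operation. */
struct toy_fops {
    int  (*open)(void);
    long (*read)(char *buf, size_t len);
    long (*write)(const char *buf, size_t len);
    int  (*release)(void);
};

int toy_open_calls;   /* counts how often the open callback has run */

static int  toy_open(void)                         { toy_open_calls++; return 0; }
static long toy_read(char *buf, size_t len)        { (void)buf; (void)len; return 0; }  /* 0 = end-of-file */
static long toy_write(const char *buf, size_t len) { (void)buf; return (long)len; }     /* accept everything */
static int  toy_release(void)                      { return 0; }

/* The "driver" fills in the table once, like the fops structure above. */
const struct toy_fops toy_ops = {
    .open = toy_open, .read = toy_read, .write = toy_write, .release = toy_release,
};

/* Roughly what happens on write(): find the table, call through the pointer. */
long toy_dispatch_write(const struct toy_fops *ops, const char *buf, size_t len)
{
    if (ops->write == NULL)
        return -1;          /* this "driver" does not provide the operation */
    return ops->write(buf, len);
}
```

Leaving a pointer unset models a driver that does not implement that operation, which is why the earlier driver without callbacks could not be read or written.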
File Operation Functions:
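1.etx_open(): Called when the device file is opened.
c
pr_info("Driver Open Function Called...!!!\n");
Logs that the device was opened.
2.etx_release(): Called when the device file is closed.
c
pr_info("Driver Release Function Called...!!!\n");
Logs that the device was closed.
3.etx_read(): Called when data is read from the device file.
c
pr_info("Driver Read Function Called...!!!\n");
return 0;
Logs the read request and returns 0, which user space sees as end-of-file (no data is
returned in this example).
4.etx_write(): Called when data is written to the device file.
c
pr_info("Driver Write Function Called...!!!\n");
return len;
Logs the write request and returns the length of the data written, so the caller sees the
whole write as accepted (the data itself is discarded in this example).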
4. Module Initialization (etx_driver_init)
This function, etx_driver_init(), runs when the module is loaded using insmod.
Step-by-step Flow:
1.Allocate Major and Minor Numbers:
c
if((alloc_chrdev_region(&dev, 0, 1, "etx_Dev")) < 0){
    pr_err("Cannot allocate major number\n");
    return -1;
}
pr_info("Major = %d Minor = %d \n", MAJOR(dev), MINOR(dev));
● Asks the kernel for one free device number and logs the result.
2.Initialize and Add the cdev:
c
cdev_init(&etx_cdev, &fops);
if((cdev_add(&etx_cdev, dev, 1)) < 0){
    pr_err("Cannot add the device to the system\n");
    goto r_class;
}
● Registers the device with the kernel.
● Links the device number to the etx_cdev.
3.Create a Device Class:
c
if(IS_ERR(dev_class = class_create(THIS_MODULE, "etx_class"))){
    pr_err("Cannot create the struct class\n");
    goto r_class;
}
4.Create the Device File:
c
if(IS_ERR(device_create(dev_class, NULL, dev, NULL, "etx_device"))){
    pr_err("Cannot create the Device 1\n");
    goto r_device;
}
Error Handling:
● If any step fails, the code uses goto to jump to a label that releases the resources
allocated so far, and then returns an error.
5. Module Exit (etx_driver_exit)
This function runs when the module is removed using rmmod.
Step-by-step Flow:
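1.Remove the Device File:
c
device_destroy(dev_class, dev);
Removes the /dev/etx_device file.
2.Destroy the Class:
c
class_destroy(dev_class);
Removes etx_class from /sys/class.
3.Remove the cdev:
c
cdev_del(&etx_cdev);
Unregisters the character device from the kernel.
4.Unregister the Device Numbers:
c
unregister_chrdev_region(dev, 1);
Frees the device numbers.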
6. Module Macros
c
module_init(etx_driver_init);
module_exit(etx_driver_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("EmbeTronicX <embetronicx@[Link]>");
MODULE_DESCRIPTION("Simple Linux device driver (File Operations)");
MODULE_VERSION("1.3");
● module_init: Specifies the initialization function to run when the module is loaded.
● module_exit: Specifies the cleanup function to run when the module is removed.
● Metadata: Includes the license, author, description, and version.
Program Flow Overview
Building the driver:
bash
make
Checking the kernel logs (after the module has been loaded with insmod):
bash
dmesg
Creating A Real Device
So far we have learned about major and minor numbers, device files and file operations
of device drivers using dummy drivers. Now we are going to write a real driver, still
without hardware.
In Linux everything is a file, so we are going to develop two programs:
1. User space application (user program).
2. Kernel space program (driver).
The user program will communicate with the kernel space program using the device
file.
Problem statement:
With this driver we can send a string or other data to the kernel device driver using the
write function. The driver stores the data in kernel space. When we then read the device
file, the driver sends the data stored by the write function back to user space.
1.kmalloc():
kmalloc() dynamically allocates memory in kernel space, much as malloc() does in user
space. It returns a pointer to the allocated block, or NULL if the allocation fails.
#include <linux/slab.h>
void *kmalloc(size_t size, gfp_t flags);
The header <linux/slab.h> provides the declarations for memory allocation and
deallocation in kernel space.
Arguments:
1.size
The number of bytes to allocate.
2.flags
Determines the behaviour of the memory allocation. Common flags include:
1.GFP_KERNEL:
● The usual choice for allocations made in process context.
● May sleep.
2.GFP_ATOMIC:
● Used in critical contexts where sleeping is not allowed (e.g., inside interrupt handlers).
● Allocates memory from emergency pools if necessary.
3.GFP_USER:
● Used when memory is allocated on behalf of a user process.
● May sleep.
4.GFP_NOWAIT:
● Allocation does not sleep and returns immediately if memory is unavailable.
5.GFP_DMA:
● Allocates memory suitable for DMA (Direct Memory Access) operations.
Required for devices needing specific physical memory regions.
6.GFP_NOFS:
● Prevents filesystem calls during memory allocation.
When kmalloc() allocates memory, it does not clear or reset the memory it provides.
● The memory might still contain data left over from its previous use.
● We need to clear it if required using memset(), or allocate with kzalloc(), which returns
memory that is already set to zero.
Example:
char *buffer = kmalloc(100, GFP_KERNEL);
if (buffer) {
    memset(buffer, 0, 100); // Clear the memory to set it to zero.
}
memset() function: sets a block of memory to a specific value (e.g., filling it with zeros
or any other byte). It only changes the contents of the memory; the memory remains
allocated, so we can still use it after calling memset().
2.kfree() function:
The kfree() function is used to release memory that was previously allocated using
kmalloc().
#include <linux/slab.h>
void kfree(const void *objp);
objp: A pointer to the memory block that was returned by kmalloc()
What it does:
● Releases the allocated memory back to the system so that it can be reused.
● After calling kfree(), you can no longer use that memory (the pointer becomes invalid).
Example:
c
char *buffer = kmalloc(100, GFP_KERNEL);
kfree(buffer); // Frees the memory
If you try to access buffer after kfree(), it can cause a crash or undefined behavior.
3.copy_from_user():
● This function copies data from user space (application-level memory) to kernel space
(kernel-level memory).
● It is used when a user program passes data, such as commands or configuration, to a
kernel module or driver.
Function syntax:
unsigned long copy_from_user(void *to, const void __user *from, unsigned long n);
1.to:
The destination buffer in the kernel space where the data will be copied to.
2.from:
The source buffer in user space that contains the data you want to copy.
3.n:
The number of bytes to copy.
Return Value
● 0: All bytes were successfully copied.
● Non-zero: The number of bytes that could not be copied.
4.copy_to_user()
This function copies a block of data from kernel space to user space.
unsigned long copy_to_user(void __user *to, const void *from, unsigned long n);
Arguments
1.to: The destination buffer in user space.
2.from: The source buffer in kernel space.
3.n: The number of bytes to copy.
Returns the number of bytes that could not be copied. On success, this will be zero.
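Neither copy function knows how large the driver's own buffer is, so a read or write handler has to limit the length itself before copying. The sketch below models that bookkeeping in user space, with memcpy() standing in for copy_to_user() and copy_from_user(). The 16-byte buffer and the toy_ names are assumptions made for this illustration.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MEM_SIZE 16          /* deliberately tiny so that the limits are easy to hit */

static char toy_buffer[TOY_MEM_SIZE];

/* Model of a bounded read: copies at most the bytes that remain after *off. */
long toy_bounded_read(char *dst, size_t len, long *off)
{
    if (*off < 0 || *off >= TOY_MEM_SIZE)
        return 0;                               /* nothing left: report end-of-file */
    if (len > (size_t)(TOY_MEM_SIZE - *off))
        len = (size_t)(TOY_MEM_SIZE - *off);    /* clamp to what is left */
    memcpy(dst, toy_buffer + *off, len);        /* a driver would call copy_to_user() here */
    *off += (long)len;                          /* the next read continues from here */
    return (long)len;
}

/* Model of a bounded write: never accepts more than the buffer can hold. */
long toy_bounded_write(const char *src, size_t len)
{
    if (len > TOY_MEM_SIZE)
        len = TOY_MEM_SIZE;
    memcpy(toy_buffer, src, len);               /* a driver would call copy_from_user() here */
    return (long)len;
}
```

Returning 0 once the offset reaches the end is what lets a tool such as cat stop reading. In a real driver the kernel helper simple_read_from_buffer() packages the same clamping and offset update for read handlers.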
Kernel space code:
/***************************************************************************//**
* \file driver.c
*
* \details Simple Linux device driver (Real Linux Device Driver)
*
* \author Vicky
*
* \Tested with Linux Beaglebone black*
*******************************************************************************/
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kdev_t.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include<linux/slab.h> // kmalloc() for memory allocation
#include<linux/uaccess.h> // copy_to/from_user() for data transfer
#include <linux/err.h>
#define mem_size 1024 // Memory Size for the kernel buffer
dev_t dev = 0; // Declare the device number
static struct class *dev_class; // Declare a pointer to class
static struct cdev etx_cdev; // Declare the character device structure
uint8_t *kernel_buffer; // Declare a pointer for the kernel buffer
/*
** Function Prototypes for the operations in the driver
*/
static int __init etx_driver_init(void);
static void __exit etx_driver_exit(void);
static int etx_open(struct inode *inode, struct file *file);
static int etx_release(struct inode *inode, struct file *file);
static ssize_t etx_read(struct file *filp, char __user *buf, size_t len, loff_t *off);
static ssize_t etx_write(struct file *filp, const char __user *buf, size_t len, loff_t *off);
/*
** File Operations structure that defines how the driver interacts with the device
*/
static struct file_operations fops =
{
.owner = THIS_MODULE, // Defines module ownership
.read = etx_read, // Read operation
.write = etx_write, // Write operation
.open = etx_open, // Open operation
.release = etx_release, // Release operation
};
/*
** This function will be called when we open the Device file
*/
static int etx_open(struct inode *inode, struct file *file)
{
pr_info("Device File Opened...!!!\n");
return 0; // Return 0 if open is successful
}
/*
** This function will be called when we close the Device file
*/
static int etx_release(struct inode *inode, struct file *file)
{
pr_info("Device File Closed...!!!\n");
return 0; // Return 0 if close is successful
}
/*
** This function will be called when we read the Device file
*/
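static ssize_t etx_read(struct file *filp, char __user *buf, size_t len, loff_t *off)
{
    /* Nothing left to read: report end-of-file */
    if (*off >= mem_size)
        return 0;
    /* Never copy past the end of the kernel buffer */
    if (len > mem_size - *off)
        len = mem_size - *off;
    // Copy the data from kernel space to user space
    if( copy_to_user(buf, kernel_buffer + *off, len) )
    {
        pr_err("Data Read: Error in copying data to user space!\n");
        return -EFAULT;
    }
    *off += len;           // Advance the file position
    pr_info("Data Read: Done!\n");
    return len;            // Return the number of bytes read
}
/*
** This function will be called when we write to the Device file
*/
static ssize_t etx_write(struct file *filp, const char __user *buf, size_t len, loff_t *off)
{
    /* Never copy more than the kernel buffer can hold */
    if (len > mem_size)
        len = mem_size;
    // Copy the data from user space to kernel space (each write starts at the beginning of the buffer)
    if( copy_from_user(kernel_buffer, buf, len) )
    {
        pr_err("Data Write: Error in copying data from user space!\n");
        return -EFAULT;
    }
    pr_info("Data Write: Done!\n");
    return len;            // Return the number of bytes written
}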
/*
** Module Init function - Called when the module is loaded into the kernel
*/
static int __init etx_driver_init(void)
{
    /* Allocating a Major number for the device */
    if((alloc_chrdev_region(&dev, 0, 1, "etx_Dev")) < 0){
        pr_err("Cannot allocate major number\n");
        return -1;
    }
    pr_info("Major = %d Minor = %d \n", MAJOR(dev), MINOR(dev));
    /* Creating cdev structure */
    cdev_init(&etx_cdev, &fops);
    /* Adding the character device to the system */
    if((cdev_add(&etx_cdev, dev, 1)) < 0){
        pr_err("Cannot add the device to the system\n");
        goto r_class;
    }
    /* Creating struct class for the device */
    if(IS_ERR(dev_class = class_create(THIS_MODULE, "etx_class"))){
        pr_err("Cannot create the struct class\n");
        goto r_cdev;
    }
    /* Creating the device */
    if(IS_ERR(device_create(dev_class, NULL, dev, NULL, "etx_device"))){
        pr_err("Cannot create the device\n");
        goto r_device;
    }
    /* Allocating zeroed memory for the kernel buffer, so that a read never returns stale kernel data */
    if((kernel_buffer = kzalloc(mem_size , GFP_KERNEL)) == NULL){
        pr_err("Cannot allocate memory in kernel\n");
        goto r_buffer;
    }
    strcpy(kernel_buffer, "Hello_World"); // Initialize the buffer with a string
    pr_info("Device Driver Insert: Done!!!\n");
    return 0; // Return success
r_buffer:
    device_destroy(dev_class, dev);    // Cleanup if the buffer allocation failed
r_device:
    class_destroy(dev_class);          // Cleanup if device creation failed
r_cdev:
    cdev_del(&etx_cdev);               // Undo cdev_add()
r_class:
    unregister_chrdev_region(dev, 1);  // Unregister the device number
    return -1;                         // Return failure
}
/*
** Module Exit function - Called when the module is unloaded from the kernel
*/
static void __exit etx_driver_exit(void)
{
kfree(kernel_buffer); // Free the allocated memory
device_destroy(dev_class, dev); // Destroy the device
class_destroy(dev_class); // Destroy the class
cdev_del(&etx_cdev); // Delete the character device
unregister_chrdev_region(dev, 1); // Unregister the device number
pr_info("Device Driver Remove: Done!!!\n");
}
module_init(etx_driver_init); // Register the module init function
module_exit(etx_driver_exit); // Register the module exit function
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Vicky"); // Updated author name
MODULE_DESCRIPTION("Simple Linux device driver (Real Linux Device Driver)");
MODULE_VERSION("1.4");
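The read handler in the listing above does not use the file offset. A common convention is that a read returns at most the requested number of bytes, advances the offset, and returns 0 once the data is exhausted, so that tools such as cat know when to stop. The following is a minimal user-space sketch of that convention, not driver code; toy_read is a hypothetical name introduced only for this illustration, and memcpy stands in for copy_to_user.

```c
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Toy model of an offset-aware read: copy at most len bytes starting at *off. */
ssize_t toy_read(char *dst, size_t len, long *off, const char *src, size_t avail)
{
    if (*off < 0)
        return -1;                      /* invalid position */
    if ((size_t)*off >= avail)
        return 0;                       /* end of data: tells the caller to stop */
    if (len > avail - (size_t)*off)
        len = avail - (size_t)*off;     /* never read past the buffer */
    memcpy(dst, src + *off, len);       /* a driver would use copy_to_user here */
    *off += (long)len;                  /* the next call continues from here */
    return (ssize_t)len;
}
```

In kernel code the existing helper simple_read_from_buffer() follows the same pattern, so a driver does not have to write this logic by hand.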
Instructions:
1. Load the driver first, then run the user-space application.
2. To check that the device node was created, enter ls /dev/etx_device.
User-space application:
/***************************************************************************//**
* \file test_app.c
*
* \details Userspace application to test the Device driver
*
* \author Vicky // Updated author name
*
* \Tested with Linux raspberrypi 5.10.27-v7l-embetronicx-custom+
*
*******************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
char write_buf[1024]; // Buffer for writing data to the device
char read_buf[1024]; // Buffer for reading data from the device
int main()
{
int fd;
char option;
// Display initial information
printf("*********************************\n");
printf("*******[Link]*******\n");
// Open the device file for reading and writing
fd = open("/dev/etx_device", O_RDWR);
if(fd < 0) {
// If the device file can't be opened, display an error message
printf("Cannot open device file...\n");
return 0;
}
// Main menu loop
while(1) {
// Display options for the user
printf("****Please Enter the Option******\n");
printf(" 1. Write \n");
printf(" 2. Read \n");
printf(" 3. Exit \n");
printf("*********************************\n");
// Get the user's option
scanf(" %c", &option);
printf("Your Option = %c\n", option);
switch(option) {
case '1':
// Write data to the driver
printf("Enter the string to write into driver :");
scanf(" %1023[^\t\n]", write_buf); // Read a string with spaces, at most 1023 characters
printf("Data Writing ...");
write(fd, write_buf, strlen(write_buf)+1); // Write to the device, including the terminating '\0'
printf("Done!\n");
break;
case '2':
// Read data from the driver
printf("Data Reading ...");
read(fd, read_buf, 1024); // Read from the device
printf("Done!\n\n");
printf("Data = %s\n\n", read_buf); // Display the read data
break;
case '3':
// Close the file descriptor and exit the program
close(fd);
exit(0); // Normal termination
break;
default:
// If the user enters an invalid option
printf("Enter Valid option = %c\n", option);
break;
}
}
// Close the file descriptor (this line will never be reached due to exit in option 3)
close(fd);
}
Load the driver, run the user-space application, and observe the output of both. The driver's
messages can be viewed with dmesg.
IOCTL in linux (I/O control)
The operating system divides memory into two main areas: kernel space and user space.
● Kernel Space: This is where the core parts of the operating system, such as the kernel
and device drivers, run. It is protected to ensure system stability and security, meaning
user applications cannot directly access this area.
● User Space: This is where user applications and programs run, such as text editors,
browsers, and games. The OS can swap data from this area to disk when more memory
is needed.
Communication between these two spaces happens through various mechanisms, allowing
user applications to interact with the kernel or device drivers. The methods covered in these
notes are IOCTL, procfs, and sysfs.
These methods enable user applications to control, query, or communicate with the kernel,
hardware, or the underlying OS, providing flexibility in managing device interactions,
configurations, and network tasks.
Introduction to IOCTL
IOCTL (Input/Output Control) is a system call used to communicate with device drivers in
Linux. It is widely used when specific device operations cannot be handled by standard
system calls like read() or write(). IOCTL allows user applications to send commands to
devices and perform operations that require kernel-level interaction.
An IOCTL command number is built with one of the _IO, _IOW, _IOR, or _IOWR macros. Each macro
takes the following arguments:
● 'magic': A unique identifier for your IOCTL commands (often a character or number).
● 'command': A number to distinguish different commands.
● 'data type': The type of data that the IOCTL command will use.
There are four macros for defining IOCTL commands:
● _IO: Command with no parameters.
● _IOW: Command with data to write to the driver (copy_from_user).
● _IOR: Command that reads data from the driver (copy_to_user).
● _IOWR: Command that both reads and writes data.
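To make the encoding concrete, the sketch below decodes a command number back into its fields from user space. It assumes the kernel's user-space header linux/ioctl.h is installed, which provides the _IOC_DIR, _IOC_TYPE, _IOC_NR, and _IOC_SIZE decoding macros. The helper describe_cmd is a hypothetical name used only for this illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <linux/ioctl.h>   /* _IOW, _IOR and the _IOC_* decoding macros */

/* Example commands: magic 'a', command numbers 'a' and 'b' (arbitrary choices). */
#define WR_VALUE _IOW('a', 'a', int32_t *)
#define RD_VALUE _IOR('a', 'b', int32_t *)

/* Print the four fields that are packed into an ioctl command number. */
void describe_cmd(const char *name, unsigned int cmd)
{
    printf("%s: dir=%u magic='%c' nr=%u size=%u\n", name,
           (unsigned)_IOC_DIR(cmd), (char)_IOC_TYPE(cmd),
           (unsigned)_IOC_NR(cmd), (unsigned)_IOC_SIZE(cmd));
}
```

Calling describe_cmd("WR_VALUE", WR_VALUE) prints the direction, magic, number, and size of the command. Note that the size field is sizeof of the type given as the third macro argument, so passing int32_t * records the size of a pointer rather than the size of the value.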
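Example:
#define WR_VALUE _IOW('a', 'a', int32_t*)
#define RD_VALUE _IOR('a', 'b', int32_t*)
The macros are available through the header:
#include <linux/ioctl.h>
The driver then implements an IOCTL handler. Older kernels used the prototype
int ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long
arg)
Current kernels use the unlocked_ioctl form shown in the handler below, which drops the inode
argument and returns long.
Parameters:
● file: The open file (device) on which the call was made.
● cmd: The IOCTL command sent from user space.
● arg: The argument passed from user space, often a pointer to the data.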
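static long etx_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
switch (cmd) {
case WR_VALUE:
// Write value from user to driver
if (copy_from_user(&value, (int32_t __user *)arg, sizeof(value)))
return -EFAULT; // The user pointer was not accessible
break;
case RD_VALUE:
// Read value from driver to user
if (copy_to_user((int32_t __user *)arg, &value, sizeof(value)))
return -EFAULT; // The user pointer was not accessible
break;
default:
pr_info("Invalid command\n");
return -ENOTTY; // Conventional error code for an unsupported command
}
return 0;
}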
In the file operations structure, you associate the IOCTL function with the driver's unlocked_ioctl
field:
static struct file_operations fops = {
.unlocked_ioctl = etx_ioctl,
// Other file operations like read, write, etc.
};
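The userspace application must define the same commands, so that both sides agree on the
command numbers:
#define WR_VALUE _IOW('a', 'a', int32_t*)
#define RD_VALUE _IOR('a', 'b', int32_t*)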
4. Use IOCTL System Call in Userspace
Finally, in the userspace application, we call the IOCTL system call to interact with the device. This
is how we send commands to the driver from userspace.
#include <sys/ioctl.h>
int ioctl(int fd, unsigned long request, ...);
Where:
● fd: The file descriptor of the device (opened with open()).
● request: The IOCTL command to be executed (e.g., WR_VALUE, RD_VALUE).
● The optional third argument (arg): The argument passed to the IOCTL command, usually a pointer.
Example:
ioctl(fd, WR_VALUE, (int32_t*)&number); // Write data to driver
ioctl(fd, RD_VALUE, (int32_t*)&value); // Read data from driver
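The calls above ignore the return value. As a hedged sketch of how an application might check it, the wrapper below returns 0 on success or a negative errno value on failure. The name send_value is hypothetical, and the WR_VALUE definition is repeated only so that the block is self-contained.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

#define WR_VALUE _IOW('a', 'a', int32_t *)

/* Send one value to the driver; returns 0 on success or a negative errno. */
int send_value(int fd, int32_t number)
{
    if (ioctl(fd, WR_VALUE, &number) < 0)
        return -errno;   /* e.g. -EBADF when fd is not an open descriptor */
    return 0;
}
```

With a valid descriptor for the device, send_value(fd, 42) would pass 42 to the driver's WR_VALUE case. With an invalid descriptor the call fails immediately, which makes the error path easy to exercise without the driver loaded.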
Summary
Thorough Testing: Test extensively, including edge cases and concurrent calls.
Follow Standards: Adhere to Linux kernel coding guidelines for clarity and maintainability.
Kernel Space Code:
/***************************************************************************//**
* \file new_driver.c
*
* \details Enhanced Linux device driver (IOCTL)
*
* \author YourName
*
* \Tested with Linux kernel 5.15.0-embedded-custom
*
*******************************************************************************/
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kdev_t.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/slab.h> // kmalloc()
#include <linux/uaccess.h> // copy_to/from_user()
#include <linux/ioctl.h>
#define WR_VALUE _IOW('b','x',int32_t*)
#define RD_VALUE _IOR('b','y',int32_t*)
int32_t value = 0;
dev_t dev = 0;
static struct class *dev_class;
static struct cdev my_cdev;
/* Function Prototypes */
static int __init my_driver_init(void);
static void __exit my_driver_exit(void);
static int my_open(struct inode *inode, struct file *file);
static int my_release(struct inode *inode, struct file *file);
static ssize_t my_read(struct file *filp, char __user *buf, size_t len,loff_t * off);
static ssize_t my_write(struct file *filp, const char __user *buf, size_t len, loff_t * off);
static long my_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
/* File operations structure */
static struct file_operations fops =
{
.owner = THIS_MODULE,
.read = my_read,
.write = my_write,
.open = my_open,
.unlocked_ioctl = my_ioctl,
.release = my_release,
};
/* Device open function */
static int my_open(struct inode *inode, struct file *file)
{
pr_info("My Device File Opened\n");
return 0;
}
/* Device release function */
static int my_release(struct inode *inode, struct file *file)
{
pr_info("My Device File Closed\n");
return 0;
}
/* Device read function */
static ssize_t my_read(struct file *filp, char __user *buf, size_t len, loff_t *off)
{
pr_info("My Read Function\n");
return 0;
}
/* Device write function */
static ssize_t my_write(struct file *filp, const char __user *buf, size_t len, loff_t *off)
{
pr_info("My Write Function\n");
return len;
}
/* IOCTL function */
static long my_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
switch(cmd) {
case WR_VALUE:
if (copy_from_user(&value, (int32_t __user *)arg, sizeof(value))) {
pr_err("Error writing data\n");
return -EFAULT; // Do not report success after a failed copy
}
pr_info("Value Written: %d\n", value);
break;
case RD_VALUE:
if (copy_to_user((int32_t __user *)arg, &value, sizeof(value))) {
pr_err("Error reading data\n");
return -EFAULT; // Do not report success after a failed copy
}
break;
default:
pr_info("Invalid Command\n");
return -ENOTTY; // Conventional error code for an unsupported command
}
return 0;
}
/* Module initialization */
static int __init my_driver_init(void)
{
/* Allocate Major and Minor numbers */
if ((alloc_chrdev_region(&dev, 0, 1, "my_device")) < 0) {
pr_err("Failed to allocate major number\n");
return -1;
}
pr_info("Major: %d, Minor: %d\n", MAJOR(dev), MINOR(dev));
/* Create cdev structure */
cdev_init(&my_cdev, &fops);
/* Add cdev to the system */
if ((cdev_add(&my_cdev, dev, 1)) < 0) {
pr_err("Failed to add cdev\n");
goto r_region;
}
/* Create struct class */
if (IS_ERR(dev_class = class_create(THIS_MODULE, "my_class"))) {
pr_err("Failed to create class\n");
goto r_cdev;
}
/* Create device */
if (IS_ERR(device_create(dev_class, NULL, dev, NULL, "my_device"))) {
pr_err("Failed to create device\n");
goto r_class;
}
pr_info("Device Driver Inserted Successfully\n");
return 0;
/* Error path: undo the completed steps in reverse order */
r_class:
class_destroy(dev_class);
r_cdev:
cdev_del(&my_cdev);
r_region:
unregister_chrdev_region(dev, 1);
return -1;
}
/* Module cleanup */
static void __exit my_driver_exit(void)
{
device_destroy(dev_class, dev);
class_destroy(dev_class);
cdev_del(&my_cdev);
unregister_chrdev_region(dev, 1);
pr_info("Device Driver Removed Successfully\n");
}
module_init(my_driver_init);
module_exit(my_driver_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Vicky");
MODULE_DESCRIPTION("Enhanced Linux Device Driver with IOCTL");
MODULE_VERSION("2.0");
User Space Code:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <stdint.h>
/* These must match the definitions in the driver exactly */
#define WR_VALUE _IOW('b','x',int32_t*)
#define RD_VALUE _IOR('b','y',int32_t*)
int main()
{
int fd;
int32_t value, number;
printf("Opening Device...\n");
fd = open("/dev/my_device", O_RDWR);
if (fd < 0) {
perror("Cannot open device");
return -1;
}
printf("Enter the value to send: ");
scanf("%d", &number);
printf("Writing value to the device...\n");
if (ioctl(fd, WR_VALUE, &number) < 0) {
perror("IOCTL Write Error");
close(fd);
return -1;
}
printf("Reading value from the device...\n");
if (ioctl(fd, RD_VALUE, &value) < 0) {
perror("IOCTL Read Error");
close(fd);
return -1;
}
printf("Value received from device: %d\n", value);
printf("Closing Device...\n");
close(fd);
return 0;
}
Steps to Create and Run the User Space Code
Load the Kernel Module:
sudo insmod my_driver.ko
dmesg | tail
Run the User Space Program:
./user_ioctl
Follow the prompts to send and receive data via the ioctl interface.
Verify Operation:
dmesg | tail
Clean Up:
After testing, unload the module. The exit function calls device_destroy(), which also removes
/dev/my_device, so the device file does not need to be deleted by hand:
sudo rmmod my_driver
Procfs In Linux
procfs (the Process File System) is a runtime interface to the kernel.
It is a virtual filesystem in Linux that provides an interface to kernel data structures. Its files
do not correspond to physical files on disk; they are created dynamically in memory while the
system is running.
procfs is typically mounted at /proc and allows users and applications to query and sometimes
modify system and process-specific information.
Key Points:
procfs is a virtual filesystem in Linux that shows information about the system and running processes.
It is not stored on disk; instead, it is created in memory while the system is running.
It is mounted at /proc.
Examples:
Some files allow writing to adjust kernel parameters dynamically without requiring a reboot.
For example, the file /proc/meminfo gives the details of the memory used in the system. To view
it, type the following command:
cat /proc/meminfo
More examples:
cat /proc/modules gives information about all modules that are currently loaded into the kernel.
The lsmod command presents the same information in a formatted table, showing the status of
every loaded module.
Most of the other files inside /proc are read-only. Some of them are:
● /proc/devices — registered character and block major numbers
● /proc/iomem — on-system physical RAM and bus device addresses
● /proc/ioports — on-system I/O port addresses (especially for x86 systems)
● /proc/interrupts — registered interrupt request numbers
● /proc/softirqs — registered soft IRQs
● /proc/swaps — currently active swaps
● /proc/kallsyms — running kernel symbols, including from loaded modules
● /proc/partitions — currently connected block devices and their partitions
● /proc/filesystems — currently active filesystem drivers
● /proc/cpuinfo — information about the CPU(s) on the system.
In some cases, proc files can also be written to.
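Because the files under /proc are plain text, a program can read them with ordinary file I/O. The sketch below looks up one field of /proc/meminfo. The function name meminfo_kb is hypothetical, and the code assumes the usual "Key:   value kB" line layout of that file.

```c
#include <stdio.h>
#include <string.h>

/* Return the value of a /proc/meminfo field such as "MemTotal", or -1 if not found. */
long meminfo_kb(const char *key)
{
    char line[256];
    long value = -1;
    size_t klen = strlen(key);
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        /* Match the key followed directly by a colon */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            if (sscanf(line + klen + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

For example, meminfo_kb("MemTotal") returns the total memory that cat /proc/meminfo reports on the first line.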
Main features:
The proc file system is also very useful when we want to debug a kernel module. While debugging
we might want to know the values of various variables in the module or maybe the data that the
module is handling. In such situations, we can create a proc entry for ourselves and dump whatever
data we want to look into in the entry.
Data written from user space into the kernel through a proc entry is temporary: it is lost when
the system reboots. Depending on whether writing is supported, there are two kinds of proc entries:
1. An entry that only reads data from kernel space.
2. An entry that both reads data from and writes data into kernel space.
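Creating a Directory under /proc
A directory can be created under /proc with proc_mkdir():
struct proc_dir_entry *proc_mkdir(const char *name, struct proc_dir_entry *parent);
Parameters:
● name: Name of the directory.
● parent: Parent directory under /proc. If NULL, the directory is created at the root of /proc.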
Creating Proc Files
This involves adding specific files under /proc (or its subdirectories) to expose or interact with your
kernel module's information.
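struct proc_dir_entry *proc_create(const char *name, umode_t mode, struct proc_dir_entry
*parent, const struct file_operations *proc_fops);
Since Linux 5.6 the last parameter is a const struct proc_ops * instead of a
const struct file_operations *.
Parameters:
● name: Name of the proc entry.
● mode: Access permissions of the entry.
● parent: Parent directory under /proc. If NULL, the entry is created at the root of /proc.
● proc_fops: The operations (open, read, write, release) that the entry supports.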
File Operations
For proc entries, the handlers are collected in a struct file_operations (a struct proc_ops from
Linux 5.6 onwards). Handler for opening the proc file:
static int open_proc(struct inode *inode, struct file *file) {
printk(KERN_INFO "Proc file opened\n");
return 0;
}
Handler for releasing the proc file:
static int release_proc(struct inode *inode, struct file *file) {
printk(KERN_INFO "Proc file released\n");
return 0;
}
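Write Operation
Data can be written to the kernel using copy_from_user:
static ssize_t write_proc(struct file *filp, const char __user *buff, size_t len, loff_t *off) {
printk(KERN_INFO "Proc file write\n");
if (len > 20)
len = 20; // etx_array is assumed to hold 20 bytes, as in read_proc
if (copy_from_user(etx_array, buff, len))
return -EFAULT; // The user pointer was not accessible
return len;
}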
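Read Operation
Data can be read from the kernel using copy_to_user. The helper simple_read_from_buffer() wraps
copy_to_user, honours the requested length and offset, and returns 0 at the end of the data so
that the reader stops:
static ssize_t read_proc(struct file *filp, char __user *buffer, size_t length, loff_t *offset)
{
// etx_array is assumed to hold 20 bytes
return simple_read_from_buffer(buffer, length, offset, etx_array, 20);
}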
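Removing Proc Entries
A single proc entry is removed with remove_proc_entry(), which takes the entry name and its
parent directory. For example:
remove_proc_entry("etx_proc", NULL);
To remove an entire directory, including the entries below it:
proc_remove(parent);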
Wait Queues in Linux
Wait Queues are a kernel mechanism used to put a process to sleep until a certain condition becomes
true. This allows the CPU to perform other tasks while a process waits for an event to occur. Once the
event happens, the process is woken up and resumes its operation.
Key Concepts:
● To efficiently handle situations where a process cannot proceed until an event occurs.
● To prevent busy waiting, where the CPU is unnecessarily consumed by repeatedly checking for
an event.
Example:
1. Inter-Process Communication (IPC)
● What happens? One process wants to send data to another process but must wait for the other
process to read the current data first.
● How does it work? The sender sleeps while waiting. Once the receiver processes the data, the
sender wakes up to send more data.
2. Multithreading in Kernel
● What happens? Multiple threads are sharing a resource (like a memory buffer). A thread may
need to wait if the resource is busy.
● How does it work? The thread sleeps until the resource is free. Once space becomes available, the
thread is woken up to use the resource.
In these examples, wait queues help processes avoid wasting CPU time by sleeping until they get a
signal that the event they are waiting for has occurred. This makes the system more efficient.
There are three key steps involved in using waitqueues:
1. Initializing a Waitqueue
2. Queuing (Putting Tasks to Sleep)
3. Waking Up Queued Tasks
1. Initializing a Waitqueue
Before using a waitqueue in a Linux kernel, you need to create and initialize it. This sets up the
waitqueue structure so that processes can be added to it or woken up later.
Waitqueues are used in the Linux kernel to manage processes that need to wait for certain events to
occur. Proper initialization is crucial for their use. There are two types of waitqueue initialization:
static initialization and dynamic initialization.
To use a waitqueue, it must first be initialized. Include the header file:
#include <linux/wait.h>
1. Static Initialization
Static initialization sets up the waitqueue at the time of declaration. It is simple and the waitqueue is
ready for use immediately after declaration.
DECLARE_WAIT_QUEUE_HEAD(wq);
● Declares a waitqueue named wq and initializes it in one step.
2. Dynamic Initialization
Dynamic initialization declares the waitqueue first and initializes it later, at run time.
wait_queue_head_t wq; // Step 1: Declare
init_waitqueue_head(&wq); // Step 2: Initialize
1. wait_queue_head_t wq;
● Declares a waitqueue named wq without initializing it.
2. init_waitqueue_head(&wq);
● Initializes the declared waitqueue wq.
● Makes it ready for use.
Use Case
● Used when the waitqueue is needed only conditionally or later in the code.
● Preferred for dynamically created or allocated waitqueues.
Visual Analogy
● Static Initialization: Like a pre-assembled, ready-to-use tool that doesn’t require setup.
● Dynamic Initialization: Like a modular tool you need to assemble before using.
Queuing (Putting Tasks to Sleep) in Linux Kernel
Queuing involves making a process sleep on a waitqueue until a specific condition becomes true.
This mechanism allows efficient CPU usage by avoiding busy waiting.
Linux provides several macros to implement this functionality based on the requirements. Each
macro has specific behavior and return values.
a) wait_event
Purpose
● The process sleeps until the condition changes.
● Puts the process to sleep in a TASK_UNINTERRUPTIBLE state until a specified condition
evaluates to true.
Syntax
wait_event(wq, condition);
where:
● wq: The waitqueue the process will sleep on.
● condition: A boolean expression. The process sleeps until this condition evaluates to true.
Key Features
● The condition is checked every time the wait queue is woken up.
● The process cannot be interrupted by signals while sleeping.
Use Case
When the process should wake up only after the event occurs and must not be interrupted by signals
while it sleeps. The processor is free to run other processes in the meantime.
Example
DECLARE_WAIT_QUEUE_HEAD(wq);
static int condition_met = 0; // Shared flag, set to 1 by the code that calls wake_up(&wq)
void example_function(void) {
// Wait until the condition becomes true
wait_event(wq, condition_met == 1);
// Continue execution when condition_met becomes true
}
b) wait_event_timeout
Purpose
● Puts the process to sleep in a TASK_UNINTERRUPTIBLE state until the condition becomes
true or a timeout occurs.
Syntax
wait_event_timeout(wq, condition, timeout);
Return Values
● 0: The condition was false, and the timeout occurred.
● 1: The condition became true after the timeout elapsed.
● Remaining jiffies: The condition became true before the timeout.
Use Case
When the process should wake up either on event occurrence or after a specific timeout.
Example:
DECLARE_WAIT_QUEUE_HEAD(wq);
static int condition_met = 0; // Shared flag, set to 1 by the code that calls wake_up(&wq)
void example_function(void) {
long timeout = 100; // Timeout in jiffies
// Wait until condition_met becomes true or timeout occurs
long remaining = wait_event_timeout(wq, condition_met == 1,
timeout);
if (remaining > 0)
printk("Condition met before timeout.\n");
else
printk("Timeout occurred.\n");
}
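The timeout in this example is given in jiffies, so its real duration depends on the kernel's tick rate HZ: 100 jiffies last 1 second at HZ=100 but only 0.4 seconds at HZ=250. The arithmetic can be sketched in plain C as follows. The function ms_to_ticks is a hypothetical user-space helper that rounds up; in kernel code the existing msecs_to_jiffies() should be used instead.

```c
/* Convert milliseconds to timer ticks for a given tick rate (hz), rounding up. */
unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}
```

For instance, a 400 ms timeout corresponds to 100 ticks at a tick rate of 250, and any non-zero duration is rounded up to at least one tick.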
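c) wait_event_cmd
Purpose
Puts the process to sleep in a TASK_UNINTERRUPTIBLE state until a condition is true, executing
specified commands before and after sleeping.
Syntax
wait_event_cmd(wq, condition, cmd1, cmd2);
where:
● cmd1: The command executed before going to sleep.
● cmd2: The command executed after waking up.
Use Case
When additional setup or cleanup operations are needed around the sleep.
Example
DECLARE_WAIT_QUEUE_HEAD(wq);
static int condition_met = 0; // Shared flag, set to 1 by the code that calls wake_up(&wq)
void example_function(void) {
wait_event_cmd(wq, condition_met == 1,
printk("Preparing to sleep...\n"),
printk("Woke up!\n"));
}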
d) wait_event_interruptible
Purpose
Puts the process to sleep in a TASK_INTERRUPTIBLE state, allowing it to be interrupted by signals.
Syntax
wait_event_interruptible(wq, condition);
Return Values
0: The condition became true.
-ERESTARTSYS: The process was interrupted by a signal.
Use Case
When the process must remain responsive to user signals while waiting for an event.
Example:
DECLARE_WAIT_QUEUE_HEAD(wq);
static int condition_met = 0; // Shared flag, set to 1 by the code that calls wake_up_interruptible(&wq)
void example_function(void) {
int ret = wait_event_interruptible(wq, condition_met == 1);
if (ret == -ERESTARTSYS)
printk("Interrupted by signal.\n");
else
printk("Condition met.\n");
}
e) wait_event_interruptible_timeout
Purpose
● Puts the process to sleep in a TASK_INTERRUPTIBLE state until the condition
becomes true, a timeout occurs, or the process is interrupted.
Syntax
● wait_event_interruptible_timeout(wq, condition, timeout);
● timeout: Timeout duration, specified in jiffies.
Return Values
● 0: The condition was false, and the timeout occurred.
● 1: The condition became true after the timeout elapsed.
● Remaining jiffies: The condition became true before the timeout.
● -ERESTARTSYS: The process was interrupted by a signal.
Use Case
When both timeout and interruptibility are required.
Example:
DECLARE_WAIT_QUEUE_HEAD(wq);
static int condition_met = 0; // Shared flag, set to 1 by the code that calls wake_up_interruptible(&wq)
void example_function(void) {
long timeout = 100; // Timeout in jiffies
long ret = wait_event_interruptible_timeout(wq, condition_met == 1, timeout);
if (ret == -ERESTARTSYS)
printk("Interrupted by signal.\n");
else if (ret == 0)
printk("Timeout occurred.\n");
else
printk("Condition met before timeout.\n");
}
f) wait_event_killable
Purpose
Puts the process to sleep in a TASK_KILLABLE state, allowing it to be killed by certain
signals.
Syntax
wait_event_killable(wq, condition);
Return Values
0: The condition became true.
-ERESTARTSYS: The process was interrupted by a kill signal.
Use Case
When the sleep should be interrupted only by fatal signals (e.g., SIGKILL) and not by ordinary ones.
Example
DECLARE_WAIT_QUEUE_HEAD(wq);
static int condition_met = 0; // Shared flag, set to 1 by the code that calls wake_up(&wq)
void example_function(void) {
int ret = wait_event_killable(wq, condition_met == 1);
if (ret == -ERESTARTSYS)
printk("Killed by signal.\n");
else
printk("Condition met.\n");
}
3. Waking Up Queued Tasks
When a task is waiting for an event to happen (sleeping), it can be woken up using specific functions.
Here’s how each function works:
a) wake_up
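● What it does: Wakes up tasks sleeping on the waitqueue in either the TASK_INTERRUPTIBLE or the
TASK_UNINTERRUPTIBLE state. All non-exclusive waiters are woken, plus at most one exclusive waiter.
● When to use: As the general-purpose wake-up call, in particular for tasks that sleep with
wait_event() and therefore cannot be interrupted by signals.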
Example:
wake_up(&wq);
If you have a task waiting for an event using wait_event(wq, condition); and the condition becomes
true, this will wake up the task.
b) wake_up_all
● What it does: Wakes up all tasks that are sleeping on the waitqueue, including every exclusive waiter.
● When to use: When you want to wake up all tasks waiting on the same event.
Example:
wake_up_all(&wq);
This will wake up every task waiting on the wq waitqueue, allowing all of them to proceed once the
event happens.
c) wake_up_interruptible
● What it does: Wakes up only tasks that are sleeping in the TASK_INTERRUPTIBLE state, meaning
tasks whose sleep can also be interrupted by signals. Tasks in the TASK_UNINTERRUPTIBLE state
are left asleep.
● When to use: When the waiting tasks sleep with one of the interruptible variants, such as
wait_event_interruptible().
Example:
wake_up_interruptible(&wq);
If a task was waiting using wait_event_interruptible(wq, condition);, this function will wake it up
once the condition is true.
d) wake_up_sync and wake_up_interruptible_sync
wake_up_sync(&wq);
wake_up_interruptible_sync(&wq);
These wake up the tasks, but the CPU does not immediately reschedule to them, allowing the current
task to finish some additional work first.
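The difference between the wake-up calls comes down to two parameters: which sleep states they match and how many exclusive waiters they release. The following is a toy user-space model of that selection logic, offered only as an illustration and not as the kernel implementation. All names (toy_waiter, toy_wake, the TOY_ constants) are hypothetical. In this model wake_up corresponds to toy_wake(q, n, TOY_NORMAL, 1), wake_up_all to toy_wake(q, n, TOY_NORMAL, 0), and wake_up_interruptible to toy_wake(q, n, TOY_INTERRUPTIBLE, 1).

```c
#include <stddef.h>

/* Toy task states; the names mirror the kernel's, the values are only for this model. */
enum { TOY_RUNNING = 0, TOY_INTERRUPTIBLE = 1, TOY_UNINTERRUPTIBLE = 2 };
#define TOY_NORMAL (TOY_INTERRUPTIBLE | TOY_UNINTERRUPTIBLE)

struct toy_waiter {
    int state;      /* how the task is sleeping */
    int exclusive;  /* 1 if queued as an exclusive waiter */
};

/* Wake every waiter whose state matches mode; stop after nr_exclusive
 * exclusive waiters have been woken (0 means no limit). Returns the count. */
int toy_wake(struct toy_waiter *q, size_t n, int mode, int nr_exclusive)
{
    int woken = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(q[i].state & mode))
            continue;                   /* sleeping in a state we do not wake */
        q[i].state = TOY_RUNNING;
        woken++;
        if (q[i].exclusive && nr_exclusive > 0 && --nr_exclusive == 0)
            break;                      /* exclusive quota used up */
    }
    return woken;
}
```

The model shows why an uninterruptible sleeper is skipped by the interruptible variant, and why a second exclusive waiter stays asleep unless the "wake all" form is used.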
—-----------------------------------------------------------------------------------------------------------------------
1. What is sysfs?
sysfs is a special filesystem in Linux that the kernel uses to communicate information about devices,
drivers, and kernel objects to user space. Think of it as a bridge between the kernel and user
programs, allowing you to access information and control devices through files in the /sys directory.
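● struct kobject: In the kernel, a kobject is represented by the structure struct kobject. It holds
important information about the object, such as its name, its parent, and a reference count. Each
kobject registered with sysfs appears as a directory under /sys.
A directory is created with kobject_create_and_add(name, parent):
● name: The name of the directory to be created (this will appear under /sys/).
● parent: The parent directory for the new directory (could be kernel_kobj for /sys/kernel/).
Example: kobject_create_and_add("etx_sysfs", kernel_kobj) creates the directory /sys/kernel/etx_sysfs
and returns a pointer to the new kobject (kobj_ref in the code below).
● After the task is done, you can free the kobject memory using kobject_put(kobj_ref).
The files inside the directory are described by struct kobj_attribute:
struct kobj_attribute {
struct attribute attr; // Basic file information
ssize_t (*show)(struct kobject *kobj, struct kobj_attribute
*attr, char *buf);
ssize_t (*store)(struct kobject *kobj, struct kobj_attribute
*attr, const char *buf, size_t count);
};
● show function: This function is called when the sysfs file is read.
● store function: This function is used to store data when writing to the sysfs file.
A sysfs file is created with sysfs_create_file():
if (sysfs_create_file(kobj_ref, &etx_attr.attr)) {
printk(KERN_INFO "Cannot create sysfs file...\n");
goto r_sysfs;
}
It is removed again with sysfs_remove_file():
sysfs_remove_file(kobj_ref, &etx_attr.attr);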
8. Complete Example
Let’s put all of it together in an example.
Driver Code:
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>
static int etx_value = 0; // The value exposed through sysfs
static struct kobject *kobj_ref; // Shared by the init and exit functions
// The show function: called when the sysfs file is read
static ssize_t sysfs_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) {
return sprintf(buf, "%d\n", etx_value); // Display the value
}
// The store function: called when the sysfs file is written
static ssize_t sysfs_store(struct kobject *kobj, struct kobj_attribute *attr, const char *buf,
size_t count) {
if (sscanf(buf, "%d", &etx_value) != 1) // Store the new value
return -EINVAL; // Input was not a number
return count;
}
// Define the sysfs attribute (after the handlers it refers to)
static struct kobj_attribute etx_attr = __ATTR(etx_value, 0660, sysfs_show, sysfs_store);
// The init function to create sysfs file
static int __init sysfs_driver_init(void) {
// Create a directory under /sys/kernel/
kobj_ref = kobject_create_and_add("etx_sysfs", kernel_kobj);
if (!kobj_ref)
return -ENOMEM;
// Create sysfs file for etx_value
if (sysfs_create_file(kobj_ref, &etx_attr.attr)) {
printk(KERN_INFO "Cannot create sysfs file...\n");
kobject_put(kobj_ref); // Undo the directory creation
return -ENOMEM;
}
return 0;
}
// The exit function to remove sysfs file
static void __exit sysfs_driver_exit(void) {
// Remove the sysfs file from the directory it was created in
sysfs_remove_file(kobj_ref, &etx_attr.attr);
kobject_put(kobj_ref); // Free memory
}
module_init(sysfs_driver_init);
module_exit(sysfs_driver_exit);
MODULE_LICENSE("GPL");
Summary:
● sysfs provides a way for user space to interact with the kernel and devices via files.
● Kernel objects (kobjects) represent various kernel-managed entities and are the basis of sysfs.
● Attributes in sysfs are represented as files and allow you to read/write kernel data from user
space.
● Functions like kobject_create_and_add and sysfs_create_file are used to create directories and
files in sysfs, while show and store functions handle read/write operations.
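From user space, a sysfs attribute behaves like a small text file: writing a number triggers the store function, and reading triggers the show function. The sketch below shows one way an application might do this with standard C file I/O. The helper names attr_write_int and attr_read_int are hypothetical, and the path is passed in by the caller, so the same code works on any text file.

```c
#include <stdio.h>

/* Write an integer to an attribute file; returns 0 on success, -1 on error. */
int attr_write_int(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%d\n", value) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Read an integer from an attribute file; returns 0 on success, -1 on error. */
int attr_read_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

With the example driver loaded, the path would be /sys/kernel/etx_sysfs/etx_value. Because the attribute is created with mode 0660, the program needs root privileges to open it.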
INTERRUPTS IN A LINUX KERNEL
Definition:
An interrupt is a signal sent to the processor that temporarily halts the current execution of code, allowing the
processor to handle a specific event. Once the event is handled, the processor resumes its previous activity.
Purpose:
Interrupts are used to handle events that require immediate attention, such as hardware signals (keyboard
presses, mouse clicks) or software requests (system calls). They help in managing asynchronous events
efficiently.
3. Interrupt Mechanism
● When an interrupt occurs, the processor stops its current execution.
● It saves the state of the current execution so that it can resume later.
● It then transfers control to a specific function known as an Interrupt Service Routine (ISR) or Interrupt
Handler, which handles the event.
● After the ISR completes its task, control is returned to the interrupted process.
5. Polling vs Interrupts
Polling:
The CPU continuously checks each device to see if it needs service.
It is resource-intensive as it consumes CPU cycles even when no event occurs.
Example: A salesperson knocking on every door to check if someone needs something.
Interrupts:
The CPU responds only when a device signals that it needs attention, freeing up CPU time for other tasks.
Example: A shopkeeper waiting for customers to approach when they need something.
Key Takeaways:
● Interrupts improve efficiency by allowing the CPU to handle events as they occur rather than
continuously checking for them.
● The use of ISRs ensures that each interrupt is handled quickly and appropriately.
● Polling can be wasteful, whereas interrupts provide a more efficient way to handle asynchronous events.
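The interrupt-driven style can be imitated in user space with signals, which is only an analogy: a signal is not a hardware interrupt, but the handler likewise runs asynchronously and should do as little as possible. In the sketch below the handler merely sets a flag, and the normal program flow picks the event up later. The names on_event, install_handler, and take_event are hypothetical.

```c
#include <signal.h>

static volatile sig_atomic_t event_pending = 0;

/* Plays the role of the ISR: runs only when the event is delivered. */
static void on_event(int signo)
{
    (void)signo;
    event_pending = 1;          /* keep the handler short: just record the event */
}

/* Register the handler for SIGINT; returns 0 on success, -1 on error. */
int install_handler(void)
{
    return signal(SIGINT, on_event) == SIG_ERR ? -1 : 0;
}

/* Called from the normal program flow: returns 1 once per recorded event. */
int take_event(void)
{
    if (!event_pending)
        return 0;
    event_pending = 0;
    return 1;
}
```

After install_handler(), the program can do other work and call take_event() when convenient; no CPU time is spent asking the event source whether something happened. A signal can be generated for testing with raise(SIGINT).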
Interrupts and Exceptions
Interrupts:
● Asynchronous: They occur independently of the processor's current instruction cycle.
● Generated by hardware: Devices like keyboards, mice, or network cards send interrupt signals to
the processor to indicate they need attention.
Exceptions:
● Synchronous: They occur in sync with the processor’s instruction cycle, meaning they happen as
a direct result of executing an instruction.
● Generated by the processor: They are triggered by certain events during instruction execution,
such as errors or special conditions.
Comparison Between Interrupts and Exceptions
Timing:
Interrupts happen asynchronously, meaning they can occur at any time, regardless of what the CPU is
doing.
Exceptions are synchronous, meaning they happen precisely when a specific instruction is being
executed.
Source:
Interrupts come from external hardware.
Exceptions are caused by the processor itself while executing instructions.
Examples of Exceptions
Abnormal Conditions:
Example: A page fault, which occurs when a program accesses a portion of memory that is not
currently mapped to physical memory. The kernel needs to handle this by loading the required page
into memory.
Handling Mechanism
Both interrupts and exceptions are handled similarly in the kernel:
● When an interrupt or exception occurs, the processor stops its current task and jumps to a
specific routine in the kernel to handle it.
● This routine could be an Interrupt Service Routine (ISR) for interrupts or an exception handler
for exceptions.
System Calls
● System Calls are a specific type of exception.
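● On the x86 architecture, system calls were historically implemented using a software interrupt
(int 0x80); modern x86-64 kernels enter through the dedicated syscall instruction instead.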
● A software interrupt is issued when a program requests a service from the kernel, such as file
operations or process control.
● The software interrupt triggers a trap into the kernel, leading to the execution of a system call
handler.
Further Classification
Interrupts and exceptions can be further classified based on their types:
Function in Linux
● Interrupt handlers are ordinary C functions that follow a specific prototype, which ensures the
kernel can pass the necessary information to the handler in a standard way.
● What sets interrupt handlers apart from other kernel functions is:
○ They are invoked by the kernel in response to interrupts.
○ They run in a special context called interrupt context (or atomic context), where blocking
operations are not allowed.
Efficiency Considerations
● For hardware: The operating system must service interrupts promptly to ensure hardware can
continue its operations without bottlenecks.
● For the system: The interrupt handler should execute as quickly as possible to minimize
disruption to the interrupted code and maintain system performance.
Process Context and Interrupt Context
Context refers to the state or environment in which a program, process, or part of the operating
system (such as the kernel) operates.
Process Context
Definition: Kernel code that services system calls issued by user applications runs in the process
context.
Preemptibility: Kernel code in this context is preemptible, meaning it can be interrupted to run other
code, including interrupt handlers or higher-priority tasks.
Capabilities:
Interrupt Context
Definition: Interrupt handlers execute in the interrupt context, triggered asynchronously by hardware
events.
Non-preemptible: Code in interrupt context is not preemptible and must run to completion before
the CPU can handle other tasks.
Restrictions:
● Interrupt handlers must run quickly to prevent blocking other interrupts and to maintain system
performance.
● Blocking other interrupts: While a high-priority ISR runs, other interrupts are blocked.
● Missed interrupts: If the ISR for a particular interrupt type takes too long, subsequent
interrupts of that type might be missed.
Top Halves and Bottom Halves
We use top halves and bottom halves to ensure quick response to interrupts by handling urgent tasks
immediately in the top half, while deferring non-urgent processing to the bottom half to maintain
system efficiency and responsiveness.
Top Halves
Definition: The top half is the part of the interrupt handler that runs immediately when an
interrupt is received.
Purpose: It handles time-critical tasks that must be done right away, like acknowledging the
interrupt or resetting the hardware.
Example: Imagine a network card that receives packets. When it gets a packet, it triggers an
interrupt. The top half would quickly acknowledge this interrupt and prepare the card for more
packets.
Reason: The top half must run quickly because it responds to hardware signals, and delaying could
cause the hardware (like the network card) to miss new data or events.
Bottom Halves
Definition: The bottom half processes less urgent tasks that can be deferred to a later time when
the system is less busy.
Purpose: It takes over the non-time-critical work, so the top half can return quickly and handle
new interrupts without delay.
Example: Continuing with the network card example, after the top half acknowledges the packet, the
bottom half processes the packet data (like checking its destination or handling errors).
Reason: By splitting the work into top and bottom halves, the system remains responsive to new
interrupts and can handle multiple tasks efficiently.
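Workqueue:
Description: Defers work to a kernel thread that runs in process context, so the deferred work is
allowed to sleep.
Why Use It: It can perform more extensive processing that the top half cannot handle.
Example: A file system might use a workqueue to handle disk I/O operations initiated by an interrupt.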
Threaded IRQs:
Description: Allows interrupt handlers to run as kernel threads, making them preemptible and
capable of blocking.
Why Use It: Provides more flexibility, as threaded IRQs can be prioritized or scheduled like
regular threads.
Example: A device driver that must talk to its hardware over a slow bus such as I2C or SPI, where
each transfer can sleep, may use a threaded IRQ.
Softirq:
Description: A mechanism for handling high-priority tasks that don't need to run immediately
but should run soon.
Why Use It: Balances between quick response and deferred execution, handling tasks like
networking or block device processing.
Example: The networking stack uses softirqs to process incoming packets after they are copied into
main memory by the top half.
Tasklets:
Description: Similar to softirqs but designed for simpler tasks that don’t require complex
processing.
Why Use It: Provides a lightweight mechanism for handling quick deferred tasks.
Example: A mouse driver might use a tasklet to update the cursor position on the screen after
processing input data from an interrupt.
Key Takeaways:
● Top Half: Handles urgent, minimal tasks that cannot wait, ensuring the system remains
responsive.
● Bottom Half: Defers non-urgent tasks to a more convenient time, preventing the system from
missing new interrupts.
● Mechanisms: Different mechanisms (Workqueue, Threaded IRQs, Softirq, Tasklets) provide
flexibility in handling deferred tasks based on their complexity and urgency.
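The split between the two halves can be sketched in ordinary user-space C. This is only a model, not kernel code: `event_ring`, `top_half`, `bottom_half` and `RING_SIZE` are hypothetical names chosen for illustration. The top half does the minimum (it records the event and returns), and the bottom half later drains everything that was queued.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8   /* hypothetical queue depth; must be a power of two */

/* Events captured by the top half, waiting for the bottom half. */
struct event_ring {
    int    slots[RING_SIZE];
    size_t head;              /* next slot the top half writes */
    size_t tail;              /* next slot the bottom half reads */
};

/* Top half: record the event and return immediately.
 * Returns 0 on success, -1 if the ring is full and the event is dropped. */
static int top_half(struct event_ring *r, int raw_event)
{
    assert(r != NULL);
    if (r->head - r->tail == RING_SIZE)
        return -1;                          /* bottom half fell behind */
    r->slots[r->head % RING_SIZE] = raw_event;
    r->head++;
    return 0;
}

/* Bottom half: runs later and does the slow processing for every
 * queued event. Here "processing" is just summing the payloads. */
static long bottom_half(struct event_ring *r)
{
    long total = 0;

    assert(r != NULL);
    while (r->tail != r->head) {
        total += r->slots[r->tail % RING_SIZE];
        r->tail++;
    }
    return total;
}
```

The full-ring case mirrors the "missed interrupts" problem: if deferred processing cannot keep up, the quick path has nowhere to put new events.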
Functions Related to Interrupt Handling
1. request_irq
Description: This function is used to register an interrupt request (IRQ) line and associate it
with an interrupt handler function.
Syntax:
int request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags, const char *name, void *dev_id);
Parameters:
● irq: IRQ number that needs to be allocated.
● handler: The interrupt handler function (irq_handler_t) to be invoked whenever the
interrupt occurs. It should return IRQ_HANDLED when it has serviced an interrupt from its
device, and IRQ_NONE when the interrupt did not come from its device (this matters on shared lines).
● flags: Flags that can modify the behavior of the interrupt. Important flags include:
○ IRQF_DISABLED: Disables all interrupts when the handler runs (obsolete; removed from
current kernels).
○ IRQF_SAMPLE_RANDOM: Uses the interrupt as a source of entropy for random number
generation (obsolete; removed from current kernels).
○ IRQF_SHARED: Allows multiple interrupt handlers to share the same IRQ line.
○ IRQF_TIMER: Specifies that the handler deals with system timer interrupts.
● name: The device name that uses this IRQ, visible in /proc/interrupts.
● dev_id: A unique identifier (device structure pointer) for the interrupt handler. Used for shared interrupt
lines to differentiate between different handlers.
● Return value:
● Returns 0 on success.
● Returns a negative error code on failure (for example, -EBUSY if the line is already in use and
cannot be shared).
Important note: request_irq() cannot be called from an interrupt context (i.e., within another
interrupt handler), as it may block, causing system issues.
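How dev_id and the handler's return value work together on a shared line can be illustrated with a small user-space model. Nothing here is the kernel's API: `model_dev`, `model_action`, `model_isr`, `model_dispatch` and the `MODEL_IRQ_*` codes are hypothetical stand-ins, and the `pending` field plays the role of a device's interrupt status register.

```c
#include <assert.h>
#include <stddef.h>

/* Model return codes; the kernel's equivalents are IRQ_NONE and IRQ_HANDLED. */
enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

typedef enum model_irqreturn (*model_handler_t)(int irq, void *dev_id);

/* Hypothetical device: 'pending' stands in for its interrupt status register. */
struct model_dev {
    int pending;
    int serviced;
};

/* One registration on a shared line: the handler plus its dev_id cookie. */
struct model_action {
    model_handler_t handler;
    void *dev_id;
};

/* Handler used by both devices; dev_id tells it which device to check. */
static enum model_irqreturn model_isr(int irq, void *dev_id)
{
    struct model_dev *dev = dev_id;

    (void)irq;
    assert(dev != NULL);
    if (!dev->pending)
        return MODEL_IRQ_NONE;      /* not our device: let the others try */
    dev->pending = 0;               /* acknowledge the device */
    dev->serviced++;
    return MODEL_IRQ_HANDLED;
}

/* Call every handler registered on the line; return how many claimed it. */
static int model_dispatch(int irq, struct model_action *actions, size_t n)
{
    int handled = 0;

    for (size_t i = 0; i < n; i++)
        if (actions[i].handler(irq, actions[i].dev_id) == MODEL_IRQ_HANDLED)
            handled++;
    return handled;
}
```

Because every handler on the line is called, each one has to check its own device first and decline quickly when the interrupt belongs to someone else.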
2. free_irq
Description: This function releases an interrupt handler that was previously registered with
request_irq(). The free_irq function removes the interrupt handler from the system and disables
the IRQ line.
Syntax:
void free_irq(unsigned int irq, void *dev_id);
Parameters:
● irq: The IRQ number to release.
● dev_id: The device identifier (same as used in request_irq()).
Behavior:
● If the interrupt line is not shared, the function removes the handler and disables the IRQ line.
● If the interrupt line is shared, the handler identified by dev_id is removed, but the IRQ line is disabled only when the last
handler is removed.
● Important note: free_irq() must be called from process context, because it may sleep while
waiting for handlers that are still running to finish.
3. enable_irq
Description: The enable_irq function ensures that the interrupt is enabled and can be serviced by the interrupt handler.
Syntax:
void enable_irq(unsigned int irq);
4. disable_irq
Description: Disables the specified IRQ line and waits for any handler currently running on it to
finish before returning. Because it waits, it must not be called from the handler of that same IRQ.
Syntax:
void disable_irq(unsigned int irq);
5. disable_irq_nosync
Description: Disables the specified IRQ line and returns immediately, without waiting for a
handler that is already running to complete.
Syntax:
void disable_irq_nosync(unsigned int irq);
6. in_irq
Description: Returns true if the current execution is inside a hardware interrupt handler (hard IRQ context).
Newer kernels provide in_hardirq() for the same check.
Syntax:
bool in_irq(void);
Interrupt Flags:
1. IRQF_DISABLED
Description: When set, all interrupts are disabled while the interrupt handler is executing.
Note: This flag is obsolete. It was generally avoided because disabling all interrupts increases
interrupt latency, and current kernels no longer define it.
2. IRQF_SAMPLE_RANDOM
Description: When set, the timing of the interrupts generated by the device is added to the kernel entropy pool
for random number generation. This flag should be used only with devices that generate interrupts at
non-deterministic times (like hardware random number generators or sensors). It has also been removed
from current kernels.
3. IRQF_TIMER
Description: This flag specifies that the interrupt handler is responsible for handling interrupts from the
system timer.
4. IRQF_SHARED
Description: Allows multiple interrupt handlers to share the same IRQ line.
Interrupt Handler Execution:
When an interrupt occurs, the interrupt handler is responsible for handling the interrupt and making sure the
system responds quickly. The handler should not perform time-consuming tasks, as it must return as soon as
possible to avoid blocking other interrupts.
Top Half:
What is the Top Half?
● The top half is the actual interrupt handler function that is executed immediately when an interrupt
occurs.
● It handles the time-critical tasks that must be done quickly.
● The primary responsibility of the top half is to acknowledge the interrupt and prepare the hardware
for the next interrupt (e.g., by clearing flags, resetting devices, etc.).
Bottom Half:
What is the Bottom Half?
● The bottom half is used to defer time-consuming tasks that don't need to be executed
immediately after the interrupt.
● It allows the system to continue handling interrupts efficiently without blocking for too long.
The bottom half runs later, at a more convenient time, usually after the interrupt handler has
completed its critical work.
● The bottom half can be executed using mechanisms like softirqs, tasklets, or workqueues.
● These mechanisms defer the work and allow the kernel to process other interrupts in the
meantime.
Workqueue in Linux Kernel
A workqueue is a mechanism in the Linux kernel used to defer work (i.e., tasks) that needs to be
done after an interrupt or other event has been handled. Instead of processing everything right away
in the interrupt handler (which can block other interrupts), we can defer the work to a kernel thread
that will run in process context.
Deferred work in the context of the Linux kernel refers to tasks or operations that are delayed or
postponed and not executed immediately when an interrupt or event occurs.
Process Context: Work deferred to a workqueue is executed by a kernel worker thread, so it is
scheduled like a regular task. This is important because it allows the work to perform longer
operations such as waiting (sleeping) or allocating resources, which cannot be done in interrupt
context.
There are two main methods to use work queues in the kernel:
Global Workqueue (Static/Dynamic)
Global workqueues are pre-existing work queues provided by the kernel. We can simply submit work
to these queues without having to create a custom one.
Work items for the global workqueue can be set up in two ways: statically, at compile time with
DECLARE_WORK, or dynamically, at runtime with INIT_WORK.
Advantages of Workqueues:
#include <linux/workqueue.h>
#include <linux/module.h>
#include <linux/init.h>
// Runs later in process context, on a kernel worker thread
static void my_work_function(struct work_struct *work) {
    printk(KERN_INFO "Workqueue function executed\n");
}
// Declare and initialize a work item named my_work at compile time
static DECLARE_WORK(my_work, my_work_function);
static int __init my_module_init(void) {
    printk(KERN_INFO "Module loaded\n");
    // Queue my_work on the kernel-global workqueue
    schedule_work(&my_work);
    return 0;
}
static void __exit my_module_exit(void) {
    // Make sure the work is neither queued nor running before the module code goes away
    cancel_work_sync(&my_work);
    printk(KERN_INFO "Module unloaded\n");
}
module_init(my_module_init);
module_exit(my_module_exit);
MODULE_LICENSE("GPL");
2. Scheduling Work to the Workqueue
Once the work item is initialized, we can schedule it using different functions depending on
our needs.
2.1 schedule_work
Schedules a work item to the global workqueue for immediate execution.
Syntax:
int schedule_work(struct work_struct *work);
Example:
schedule_work(&my_workqueue);
● This adds my_workqueue to the global workqueue for execution as soon as possible.
2.2 schedule_delayed_work
Schedules a work item to be executed after a specified delay.
Syntax:
int schedule_delayed_work(struct delayed_work *dwork, unsigned long delay);
Example:
DECLARE_DELAYED_WORK(my_delayed_workqueue, workqueue_fn);
schedule_delayed_work(&my_delayed_workqueue, 10); // Delay of 10 jiffies
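Because the delay is given in jiffies, its real duration depends on the kernel's tick rate (HZ). The kernel provides msecs_to_jiffies() for the conversion; the user-space sketch below only mirrors the arithmetic so the numbers can be checked by hand. `model_ms_to_jiffies` and `model_jiffies_to_ms` are hypothetical helper names, and the tick rate is passed in as a parameter instead of being read from the kernel configuration.

```c
#include <assert.h>

/* Convert milliseconds to timer ticks for a tick rate of 'hz' ticks per
 * second, rounding up so the delay is never shorter than requested.
 * Note: ms * hz can overflow for very large inputs; this sketch ignores that. */
static unsigned long model_ms_to_jiffies(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return (ms * hz + 999UL) / 1000UL;
}

/* Convert ticks back to whole milliseconds (rounds down). */
static unsigned long model_jiffies_to_ms(unsigned long ticks, unsigned long hz)
{
    assert(hz > 0);
    return ticks * 1000UL / hz;
}
```

With these helpers, a delay of 10 jiffies works out to 10 ms at a tick rate of 1000, 40 ms at 250, and 100 ms at 100, which is why drivers usually convert from milliseconds rather than hard-coding a jiffies count.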
2.3 schedule_work_on
Schedules a work item to run on a specific CPU.
Syntax:
int schedule_work_on(int cpu, struct work_struct *work);
Example:
schedule_work_on(1, &my_workqueue); // Schedule on CPU 1
2.4 schedule_delayed_work_on
Schedules delayed work on a specific CPU.
Syntax:
int schedule_delayed_work_on(int cpu, struct delayed_work *dwork, unsigned long delay);
Example:
schedule_delayed_work_on(1, &my_delayed_workqueue, 10); // Run on CPU 1 after 10 jiffies
3 Deleting Work from Workqueue
We can remove or flush work from the workqueue using specific functions.
3.1 flush_work
Waits for a specific work item to complete.
Syntax:
int flush_work(struct work_struct *work);
Example:
flush_work(&my_workqueue);
3.2 flush_scheduled_work
Waits for all scheduled work items in the global workqueue to finish.
Syntax:
void flush_scheduled_work(void);
Example:
flush_scheduled_work();
4.2 cancel_delayed_work_sync
Cancels delayed work if it hasn’t started, or waits for its completion.
Syntax:
int cancel_delayed_work_sync(struct delayed_work *dwork);
Example:
cancel_delayed_work_sync(&my_delayed_workqueue);
5. Checking the Workqueue
We can check whether work is pending in the queue using the following functions.
5.1 work_pending
Checks if a work item is pending.
Syntax:
bool work_pending(struct work_struct *work);
Example:
if (work_pending(&my_workqueue)) {
printk(KERN_INFO "Work is pending\n");
}
5.2 delayed_work_pending
Checks if delayed work is pending.
Syntax:
bool delayed_work_pending(struct delayed_work *work);
Example:
if (delayed_work_pending(&my_delayed_workqueue)) {
printk(KERN_INFO "Delayed work is pending\n");
}
Static Code:
In this example, we define a simple kernel module that uses a workqueue to defer the execution of a task. The
workqueue task is defined by a function, workqueue_fn, which prints a message when executed. This function is
associated with a work item, my_work, which is declared using the struct work_struct structure.
Upon loading the module, the my_module_init function is called. Inside this function, we initialize the work item with
INIT_WORK and associate it with our workqueue function, workqueue_fn. We then schedule this work to be executed
by the global workqueue using schedule_work. Immediately after scheduling, we check if the work is pending using the
work_pending function, which returns a status indicating whether the work is queued but not yet executed.
The my_module_exit function handles the module cleanup when it is unloaded. Before the module exits, we attempt to
cancel the work using cancel_work_sync. This function cancels the work if it hasn't been executed yet and waits for its
completion if it is already running. We then check again if the work is still pending. Finally, a message is printed to
indicate that the module is being unloaded.
This flow illustrates how to initialize, schedule, cancel, and check the status of a work item using workqueues in the
Linux kernel. It provides a clear lifecycle of a workqueue task, from creation to cleanup, within a kernel module
context.
#include <linux/module.h>
#include <linux/init.h>
#include <linux/workqueue.h>
#include <linux/delay.h>
static struct work_struct my_work; // Declare a work_struct
// Workqueue function to be executed
static void workqueue_fn(struct work_struct *work) {
    printk(KERN_INFO "Workqueue function executed\n");
}
// Initialize the module
static int __init my_module_init(void) {
    printk(KERN_INFO "Module loaded\n");
    // Initialize the workqueue item with the workqueue function
    INIT_WORK(&my_work, workqueue_fn);
    // Schedule the work to be executed by the global workqueue
    schedule_work(&my_work);
    // Check if the work is pending (usually true right after scheduling,
    // unless a worker thread has already picked it up)
    if (work_pending(&my_work)) {
        printk(KERN_INFO "Work is pending\n");
    } else {
        printk(KERN_INFO "Work is not pending\n");
    }
    return 0;
}
// Exit function to cleanup the module
static void __exit my_module_exit(void) {
    printk(KERN_INFO "Module unloading\n");
    // Cancel the work if it hasn't been executed yet
    if (cancel_work_sync(&my_work)) {
        printk(KERN_INFO "Work was pending and now cancelled\n");
    } else {
        printk(KERN_INFO "Work was already completed or not pending\n");
    }
    // Check again if the work is still pending (should return false now)
    if (work_pending(&my_work)) {
        printk(KERN_INFO "Work is still pending\n");
    } else {
        printk(KERN_INFO "Work is not pending\n");
    }
    printk(KERN_INFO "Module unloaded\n");
}
module_init(my_module_init);
module_exit(my_module_exit);
MODULE_LICENSE("GPL");
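The meaning of "pending" in this lifecycle can be made concrete with a small user-space model. It is a sketch under simplifying assumptions (one queue, one thread, no locking), and every name in it (`model_work`, `model_queue`, `model_schedule`, `model_run_all`, `model_cancel`, `model_fn`) is hypothetical. A work item is pending from the moment it is queued until a worker picks it up; scheduling an already pending item does nothing, and cancelling reports whether the item was still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_work {
    void (*fn)(struct model_work *w);
    bool pending;                 /* queued but not yet started */
    struct model_work *next;
};

struct model_queue { struct model_work *head, *tail; };

static int model_runs;            /* how many times the work function ran */
static void model_fn(struct model_work *w) { (void)w; model_runs++; }

/* Queue the item unless it is already pending; returns true if queued. */
static bool model_schedule(struct model_queue *q, struct model_work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
    return true;
}

/* Worker: for each item, clear pending first, then run its function. */
static int model_run_all(struct model_queue *q)
{
    int ran = 0;

    while (q->head) {
        struct model_work *w = q->head;

        q->head = w->next;
        if (!q->head)
            q->tail = NULL;
        w->pending = false;
        w->fn(w);
        ran++;
    }
    return ran;
}

/* Remove a pending item from the queue; returns true if it was pending. */
static bool model_cancel(struct model_queue *q, struct model_work *w)
{
    struct model_work **pp = &q->head, *prev = NULL;

    if (!w->pending)
        return false;
    while (*pp != w) {
        assert(*pp != NULL);      /* a pending item must be on this queue */
        prev = *pp;
        pp = &(*pp)->next;
    }
    *pp = w->next;
    if (q->tail == w)
        q->tail = prev;
    w->pending = false;
    return true;
}
```

The model leaves out the part of the real cancel operation that waits for a work function that is already running; it only covers the pending bookkeeping.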
Workqueue in Linux - Dynamic Method
In the dynamic method, workqueues are initialized and managed dynamically at runtime. This
provides flexibility in creating and scheduling tasks in workqueues. The dynamic method primarily
uses the INIT_WORK macro and other related functions.
INIT_WORK
The INIT_WORK macro is used to initialize a work item dynamically. This macro sets up a
work_struct with a specific function that will be executed when the work item is processed.
Syntax:
INIT_WORK(struct work_struct *work, void (*work_fn)(struct work_struct *));
● work: The work item to be initialized, typically a work_struct structure.
● work_fn: The function to be executed when the work item is scheduled.
Example:
struct work_struct my_work;
void work_fn(struct work_struct *work) {
printk(KERN_INFO "Work function executed\n");
}
INIT_WORK(&my_work, work_fn);
2. Scheduling Work
Once initialized, work items can be scheduled using various functions, depending on when and where
you want the work to be executed.
2.1 schedule_work
This function schedules a work item to be executed in the kernel-global workqueue.
Syntax:
int schedule_work(struct work_struct *work);
● work: The work item to be scheduled.
Example:
schedule_work(&my_work);
2.2 schedule_delayed_work
This function schedules a work item to be executed after a specified delay.
Syntax:
int schedule_delayed_work(struct delayed_work *dwork, unsigned long delay);
● dwork: The delayed work item to be scheduled.
● delay: The number of jiffies to wait before executing the work.
Example:
struct delayed_work my_delayed_work;
INIT_DELAYED_WORK(&my_delayed_work, work_fn);
schedule_delayed_work(&my_delayed_work, 100); // Delays for 100 jiffies
2.3 schedule_work_on
This function schedules a work item to be executed on a specific CPU.
Syntax:
int schedule_work_on(int cpu, struct work_struct *work);
● cpu: The CPU on which to run the work.
● work: The work item to be scheduled.
Example:
schedule_work_on(1, &my_work); // Schedule on CPU 1
2.4 schedule_delayed_work_on
Similar to schedule_delayed_work, but allows specifying the CPU on which the work should be
executed after a delay.
Syntax:
int schedule_delayed_work_on(int cpu, struct delayed_work *dwork, unsigned long delay);
Example:
schedule_delayed_work_on(1, &my_delayed_work, 100); // Delays for 100 jiffies on CPU 1
3 Deleting and Canceling Work
3.1 flush_work
This function blocks until the specified work item has finished executing.
Syntax:
int flush_work(struct work_struct *work);
Example:
flush_work(&my_work);
3.2 flush_scheduled_work
Flushes all work items in the global workqueue.
Syntax:
void flush_scheduled_work(void);
3.3 cancel_work_sync
This function cancels a work item if it is not currently executing, or waits for it to finish if it is
already running.
Syntax:
int cancel_work_sync(struct work_struct *work);
Example:
cancel_work_sync(&my_work);
3.4 cancel_delayed_work_sync
Cancels a delayed work item in a similar fashion.
Syntax:
int cancel_delayed_work_sync(struct delayed_work *dwork);
Example:
cancel_delayed_work_sync(&my_delayed_work);
4. Checking Work Status
4.1 work_pending
Checks if a work item is pending.
Syntax:
int work_pending(const struct work_struct *work);
Example:
if (work_pending(&my_work)) {
printk(KERN_INFO "Work is pending\n");
}
4.2 delayed_work_pending
Checks if a delayed work item is pending.
Syntax:
bool delayed_work_pending(struct delayed_work *work);
Example:
if (delayed_work_pending(&my_delayed_work)) {
    printk(KERN_INFO "Delayed work is pending\n");
}
Linked List in Linux Kernel
Introduction to Linked List
A linked list is a fundamental data structure comprising a sequence of nodes, where each node
consists of two main components: the data field (which stores the actual data) and the reference field
(a pointer pointing to the next node in the sequence).
Each node in a linked list is termed an element, and the list is tracked using a head pointer that
always points to the first element. The elements in a linked list do not need to occupy contiguous
memory locations; each node connects to the next through pointers, forming a chain-like structure.
● Building Other Structures: Linked lists are like building blocks for more complicated data
structures such as stacks and queues.
● Faster for Some Tasks: In some cases, linked lists can be faster than arrays because you can
directly add or remove items without shifting others, which saves time.
● Extra Memory Usage: Each node requires additional memory for pointers, leading to potential
memory overhead.
● Sequential Access: Elements must be accessed sequentially, unlike arrays that allow random access.
● Difficult Reverse Traversal: Traversing backward through the list is complicated, especially in
singly linked lists.
Applications of Linked Lists
● Used in the implementation of stacks, queues, and graph representations.
● No need to define the size in advance, making them more flexible compared to arrays.
Types of Linked Lists
While there are multiple types, the primary categories are:
● Singly Linked List: Each node points to the next node in the sequence.
● Doubly Linked List: Each node has two pointers, one pointing to the next node and another
pointing to the previous node, allowing bidirectional traversal.
● Circular Linked List: The last node points back to the first node, forming a circle.
INIT_LIST_HEAD(&new_node.list);
new_node.data = 10;
4. Adding Nodes to the Linked List
4.1 Add After the Head: Use list_add() to add a node after the head, useful for stack-like behavior
(LIFO).
Syntax:
list_add(&new_node.list, &etx_linked_list);
4.2 Add Before the Head: Use list_add_tail() to add a node before the head, useful for queue-like
behavior (FIFO).
Syntax:
list_add_tail(&new_node.list, &etx_linked_list);
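The difference between the two insert operations is easiest to see by running them. The kernel's list implementation lives in <linux/list.h> and cannot be included from user space, so the sketch below re-implements just enough of it for a demonstration. The function names mirror the kernel's, but this is an illustrative copy: `list_insert_between`, `list_to_array` and the `struct my_list` layout are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

/* Link n between two nodes that are currently adjacent. */
static void list_insert_between(struct list_head *n,
                                struct list_head *prev, struct list_head *next)
{
    next->prev = n;
    n->next = next;
    n->prev = prev;
    prev->next = n;
}

/* Insert right after the head: newest element first (stack, LIFO). */
static void list_add(struct list_head *n, struct list_head *head)
{
    list_insert_between(n, head, head->next);
}

/* Insert right before the head, i.e. at the tail: oldest first (queue, FIFO). */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
    list_insert_between(n, head->prev, head);
}

struct my_list { int data; struct list_head list; };

/* Copy the data values into out[] in list order; returns the count. */
static int list_to_array(struct list_head *head, int *out, int max)
{
    int n = 0;

    assert(head != NULL && out != NULL);
    for (struct list_head *p = head->next; p != head && n < max; p = p->next)
        out[n++] = container_of(p, struct my_list, list)->data;
    return n;
}
```

Adding 1, 2, 3 with list_add() yields the order 3, 2, 1 when walking from the head, while list_add_tail() keeps the insertion order 1, 2, 3.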
6. Replacing Nodes
6.1 Replace a Node: Use list_replace() to swap an old node with a new one.
list_replace(&old_node.list, &new_node.list);
6.2 Replace and Reinitialize: Use list_replace_init() to replace a node and reinitialize the old one.
list_replace_init(&old_node.list, &new_node.list);
10.2 Safe Traversal: Use list_for_each_entry_safe() for safe traversal, especially when removing
nodes.
struct my_list *pos, *n;
list_for_each_entry_safe(pos, n, &etx_linked_list, list) {
// Process each node
}
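Why a "safe" variant is needed when removing nodes can be shown with a user-space sketch. It is a simplified re-implementation and not the kernel header: `list_init`, `remove_odd` and `list_len` are hypothetical helpers, and this `list_del` sets the removed node's pointers to NULL where the kernel writes poison values. The key point is the second cursor `n`, which remembers the successor before the current node is unlinked.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct my_list { int data; struct list_head list; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink an entry; its own pointers become unusable afterwards. */
static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = NULL;
}

/* Remove every node whose data is odd. 'n' caches the successor before
 * 'pos' is unlinked, so the walk never reads a removed node's pointers. */
static int remove_odd(struct list_head *head)
{
    int removed = 0;
    struct list_head *pos, *n;

    assert(head != NULL);
    for (pos = head->next, n = pos->next; pos != head; pos = n, n = pos->next) {
        struct my_list *node = container_of(pos, struct my_list, list);

        if (node->data % 2) {
            list_del(pos);
            removed++;
        }
    }
    return removed;
}

static int list_len(struct list_head *head)
{
    int len = 0;

    for (struct list_head *p = head->next; p != head; p = p->next)
        len++;
    return len;
}
```

A plain traversal would advance through `pos->next` after the node was unlinked, which in this model is a NULL pointer and in the kernel is a poisoned one.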
Kernel Thread – Linux Device Driver
Process vs. Thread in Linux Kernel
Process
● Definition: An executing instance of a program.
● Terminology: Some operating systems use the term ‘task’ to refer to a program that is
being executed.
Characteristics:
● Heavyweight Process: Consumes more resources.
● Context Switch: Switching between processes is time-consuming due to the need to save
and load context (CPU registers, program counter, etc.).
Threads
● Definition: An independent flow of control within the same address space as other threads
in the same process.
Characteristics:
Advantages:
● Easier communication between threads due to shared address space.
● Faster creation and context switching compared to processes.
Disadvantages:
● Requires synchronization to handle concurrency issues.
● User-Level Threads:
○ Managed by the user-level thread library.
○ The kernel is unaware of these threads.
○ Library Functions: Create, destroy, schedule threads, and handle context
switches in user space.
● Kernel-Level Threads:
○ Managed by the OS kernel.
○ Thread operations are implemented in the kernel code.
○ No thread management code in the user application.
Thread Management
Thread Management: Managing threads involves creating, destroying, scheduling, and handling
their execution state (program counter, CPU registers).
Execution State: Consists of the program counter (next instruction to execute) and CPU registers
(hold execution arguments).
Returns: A pointer to the task_struct of the new thread or an error pointer on failure.
Parameters:
● k: The task_struct of the thread to bind.
● cpu: The CPU to bind the thread to.
Example:
kthread_bind(my_thread, 0);
Parallel Execution:
Non-preemptive Scheduling:
● Tasklets are executed one after another in the order they are scheduled.
● There are two priorities for scheduling tasklets: normal and high.
Atomic Nature:
● Tasklets run in an atomic context, meaning they cannot be interrupted by other tasklets.
Because of this, tasklets can't use functions that sleep or block, like msleep(), or use
sleeping synchronization primitives like mutexes or semaphores.
● Use spinlocks if you need to protect data that might be accessed by other parts of the
kernel.
Points to Remember
1. Atomic Context: No sleeping or waiting. Use spinlocks if necessary.
2. Single CPU: A tasklet runs on the CPU that scheduled it, not on multiple CPUs.
3. Concurrency Control: While different tasklets can run on different CPUs, a single tasklet
won’t run concurrently on multiple CPUs.
4. Priorities: Tasklets can be scheduled with normal or high priority.
Creation of Tasklets
Tasklets can be created in two ways:
● Static Method: Static tasklets are created at compile time using predefined macros, such as
DECLARE_TASKLET or DECLARE_TASKLET_DISABLED.
● Dynamic Method: Dynamic tasklets are created at runtime using functions like
tasklet_init.
Structure of tasklet:
The tasklet_struct is the core data structure used to define a tasklet.
struct tasklet_struct {
    struct tasklet_struct *next;  // Next tasklet in line for scheduling
    unsigned long state;          // Tasklet state: TASKLET_STATE_SCHED or TASKLET_STATE_RUN
    atomic_t count;               // Nonzero if disabled, 0 if enabled
    void (*func)(unsigned long);  // Pointer to the function to execute
    unsigned long data;           // Data to pass to the function
};
Parameters:
● next: Points to the next tasklet in the queue.
● state: Indicates the tasklet's state, either scheduled or running.
● count: Holds the value indicating if the tasklet is enabled (0) or disabled (nonzero).
● func: The main function that the tasklet will execute.
● data: Data passed to the function func.
1 Creating Tasklets
1.1 DECLARE_TASKLET
This macro is used to create a tasklet and initialize its parameters. The tasklet is in the enabled state
by default.
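Function Prototype (older three-argument form; recent kernels define DECLARE_TASKLET with only a
name and a callback that receives the tasklet pointer):
DECLARE_TASKLET(name, func, data);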
Parameters:
● name: The name of the tasklet structure.
● func: Pointer to the function that will be executed.
● data: Data to pass to the function.
Example
DECLARE_TASKLET(tasklet, tasklet_fn, 1);
This creates a tasklet structure with the name tasklet and assigns the parameters. The structure will
look like:
struct tasklet_struct tasklet = { NULL, 0, 0, tasklet_fn, 1 };
1.2 DECLARE_TASKLET_DISABLED
This macro creates a tasklet in a disabled state. It must be enabled using tasklet_enable before it can
run.
2.1 tasklet_enable
● Enabling a tasklet means making it eligible to be scheduled and executed. When a tasklet is
enabled, it can be placed in the queue and run by the CPU.
● This function enables a previously disabled tasklet.
2.2 tasklet_disable
● Disabling a tasklet means preventing it from being scheduled and executed. A disabled tasklet
won't run even if it is placed in the queue until it is explicitly enabled again.
● This function disables a tasklet and waits for its current operation to complete.
2.3 tasklet_disable_nosync
● This function disables a tasklet immediately without waiting for its current operation to complete.
Note: If a tasklet is disabled, it can still be added to the queue but will not run until enabled.
The count field tracks the number of times a tasklet is disabled and must be enabled the same number
of times.
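The nesting behaviour of the count field can be checked with a small user-space model. The names (`model_tasklet`, `tl_disable`, `tl_enable`, `tl_schedule`, `tl_run`) are hypothetical and the model is single-threaded; it only captures the rule that a scheduled tasklet runs once the disable count has dropped back to zero.

```c
#include <assert.h>
#include <stdbool.h>

struct model_tasklet {
    int  count;       /* 0 = enabled, >0 = disabled that many times */
    bool scheduled;   /* waiting in the queue */
    int  runs;        /* how many times the tasklet function has run */
};

static void tl_disable(struct model_tasklet *t) { t->count++; }

static void tl_enable(struct model_tasklet *t)
{
    assert(t->count > 0);   /* every enable must match an earlier disable */
    t->count--;
}

static void tl_schedule(struct model_tasklet *t) { t->scheduled = true; }

/* One pass of the tasklet runner: a scheduled tasklet runs only when its
 * count is zero; otherwise it stays queued. Returns true if it ran. */
static bool tl_run(struct model_tasklet *t)
{
    if (!t->scheduled || t->count != 0)
        return false;
    t->scheduled = false;
    t->runs++;
    return true;
}
```

After two disables, a single enable is not enough: the tasklet stays queued until the second enable brings the count to zero.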
3 Scheduling Tasklets
When a tasklet is scheduled, it is placed in one of two queues based on its priority. Each CPU has its
own queue.
3.1 tasklet_schedule
This function schedules a tasklet with normal priority.
3.2 tasklet_hi_schedule
This function schedules a tasklet with high priority.
3.3 tasklet_hi_schedule_first
This function schedules a tasklet with high priority without affecting other tasklets.
Example
/* Kill the Tasklet */
tasklet_kill(&tasklet);
4.2 tasklet_kill_immediate
This function is used to delete a tasklet immediately when a CPU is in a dead state.
void tasklet_kill_immediate(struct tasklet_struct *t, unsigned int
cpu);
When tasklet_init is called, the function and data are assigned to the tasklet structure, the tasklet's
state is cleared to 0 (neither TASKLET_STATE_SCHED nor TASKLET_STATE_RUN is set), and its count is
initialized to 0, indicating it is enabled.
Mutex
A mutex ensures mutual exclusion, allowing only one thread to access a resource at a time.
The thread that locks a mutex must also unlock it.
Initializing Mutex
We can initialize Mutex in two ways:
Static Method
● A static mutex is a mutex that is declared and initialized at compile time. It is part of the static or
global memory of the kernel module.
● It exists for the entire lifetime of the module, meaning it is always present and does not need to be
allocated or deallocated explicitly.
Dynamic Method
● A dynamic mutex is a mutex that is allocated and initialized at runtime. It is typically a part of a
dynamically allocated structure or used in cases where the mutex's lifetime is tied to the resource
it protects.
● Dynamic mutexes are useful when you have multiple instances of a resource that require
synchronization, or when the resource's lifetime is limited and determined at runtime.
1. Static Method:
DEFINE_MUTEX(name): Used for global mutexes.
Example:
DEFINE_MUTEX(my_mutex);
2. Dynamic Method:
Mutex Locking
1. mutex_lock
Purpose: This function locks the mutex for the current thread.
Behavior:
● If the mutex is already locked by another thread, the calling thread will block (sleep) until
the mutex becomes available.
● Once the mutex is available, it gets locked by the current thread.
● Usage: This is used in situations where the thread needs exclusive access to a resource and
is willing to wait until it gets the lock.
Prototype:
void mutex_lock(struct mutex *lock);
2. mutex_lock_interruptible
Example:
if (mutex_lock_interruptible(&my_mutex)) {
// Handle signal interruption
} else {
// Critical section code here
mutex_unlock(&my_mutex);
}
3. mutex_trylock
Purpose: Attempts to lock the mutex without waiting.
Behavior:
● If the mutex is already locked, this function returns immediately with a failure.
● If the mutex is not locked, it locks the mutex and returns success.
● This is useful for non-blocking scenarios where you want to try to acquire the lock but don't
want to wait if it's not available.
Return Value:
● 1 if the mutex was successfully locked.
● 0 if the mutex was already locked by another thread.
Prototype:
int mutex_trylock(struct mutex *lock);
Example:
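if (mutex_trylock(&my_mutex)) {
    // Critical section code here
    mutex_unlock(&my_mutex);
} else {
    // Mutex was already locked, handle this case
}
4. mutex_unlock
Purpose: Releases a mutex that the current thread locked earlier.
Prototype:
void mutex_unlock(struct mutex *lock);
Example:
mutex_lock(&my_mutex);
// Critical section code here
mutex_unlock(&my_mutex);
5. mutex_is_locked
Purpose: Reports whether the mutex is currently locked, without acquiring it.
Example:
if (mutex_is_locked(&my_mutex)) {
    printk(KERN_INFO "Mutex is locked\n");
} else {
    printk(KERN_INFO "Mutex is not locked\n");
}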
What is Spinlock?
● Spinlock is a type of lock used to protect shared data in a multi-threaded environment. If a
thread cannot acquire the spinlock, it will "spin" (keep trying in a loop) until the lock is
available.
● Two States: Locked or Unlocked.
● Used where the waiting time is expected to be short, avoiding the overhead of sleep and
wake-up mechanisms.
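The "spin until available" behavior can be sketched in user space with a C11 atomic flag. This is a minimal illustration of the idea, not the kernel's spinlock implementation; sketch_spinlock_t and the sketch_spin_* functions are hypothetical names chosen for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for spinlock_t; not the kernel type. */
typedef struct {
    atomic_flag locked;   /* set = Locked, clear = Unlocked */
} sketch_spinlock_t;

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once: returns true if the lock was free and is now held by the caller. */
bool sketch_spin_trylock(sketch_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-wait until the lock is acquired; the caller never sleeps. */
void sketch_spin_lock(sketch_spinlock_t *l)
{
    while (!sketch_spin_trylock(l))
        ;   /* spin */
}

/* Release the lock so that a spinning thread can take it. */
void sketch_spin_unlock(sketch_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Because the waiter burns CPU cycles instead of sleeping, this pattern only pays off when the critical section is very short.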
Initialization Methods
1. Static Method
Use DEFINE_SPINLOCK(name); to declare and initialize the spinlock at compile time.
2. Dynamic Method
Use spin_lock_init(spinlock_t *lock); to initialize.
Example:
spinlock_t etx_spinlock;
spin_lock_init(&etx_spinlock); // Dynamically initializing etx_spinlock.
Usage Approaches
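1. Locking in User Context (Kernel Threads)
Lock: spin_lock(spinlock_t *lock);
Unlock: spin_unlock(spinlock_t *lock);
3. Locking Between User Context and Bottom Halves
Lock: spin_lock_bh(spinlock_t *lock); (Also disables bottom halves on the local CPU).
Unlock: spin_unlock_bh(spinlock_t *lock); (Re-enables bottom halves).
Example: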
spin_lock_bh(&etx_spinlock);
// Critical section
spin_unlock_bh(&etx_spinlock);
5. Alternative to Approach 4
Lock: spin_lock_irqsave(spinlock_t *lock, unsigned long flags); (Saves
interrupt state).
Unlock: spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags);
(Restores interrupt state).
Example:
unsigned long flags;
spin_lock_irqsave(&etx_spinlock, flags);
// Critical section
spin_unlock_irqrestore(&etx_spinlock, flags);
Read-Write Spinlock
Reader Threads:
● If one reader thread is already in the critical section, other reader threads can also enter.
● However, a writer thread must wait until all existing reader threads have exited the critical
section before it can acquire the write lock and enter.
Reader Priority: The described behavior generally prioritizes reader threads. Once a reader
thread has entered the critical section, subsequent reader threads can enter without being blocked
by the writer thread.
Writer Thread:
● If a writer thread is in the critical section, neither reader nor writer threads can enter. The
writer has exclusive access.
Writer Priority (Seqlock): In Linux, the seqlock mechanism is designed to prioritize writer
threads over reader threads.
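The reader and writer rules above can be modeled with a single atomic counter. The sketch below is a user-space illustration under stated assumptions, not the kernel's rwlock_t: a positive state counts active readers, 0 means free, and -1 means a writer holds the lock. All sketch_* names are hypothetical, and the functions are "try" variants that return immediately instead of spinning.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: state > 0 = number of readers, 0 = free, -1 = writer. */
typedef struct {
    atomic_int state;
} sketch_rwlock_t;

/* Readers may enter as long as no writer holds the lock. */
bool sketch_read_trylock(sketch_rwlock_t *l)
{
    int s = atomic_load(&l->state);
    while (s >= 0) {
        /* On failure, s is refreshed with the current value and re-checked. */
        if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
            return true;
    }
    return false;
}

void sketch_read_unlock(sketch_rwlock_t *l)
{
    atomic_fetch_sub(&l->state, 1);
}

/* A writer may enter only when there are no readers and no other writer. */
bool sketch_write_trylock(sketch_rwlock_t *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

void sketch_write_unlock(sketch_rwlock_t *l)
{
    atomic_store(&l->state, 0);
}
```

In this model a steady stream of readers can keep the state above zero indefinitely, which is the reader-priority behavior described above.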
Initialization Methods
1. Static Method:
● Declares and initializes the lock in one step at compile time: DEFINE_RWLOCK(etx_rwlock);
2. Dynamic Method:
● Declares an rwlock_t variable: rwlock_t etx_rwlock;
● Initializes the variable using rwlock_init(&etx_rwlock); at runtime.
● Offers more flexibility as initialization can be delayed or conditional.
Choosing a Method:
● Use the static method for simplicity if the spinlock needs to exist throughout the module's
lifetime.
● Use the dynamic method if you need more control over when the spinlock is initialized (e.g.,
during module initialization or based on specific conditions).
Key Points:
● rwlock_t: This is the data structure used to represent a Read-Write Spinlock in the Linux
kernel.
● DEFINE_RWLOCK(): This macro simplifies the static initialization process.
● rwlock_init(): This function is used for dynamic initialization of a Read-Write Spinlock.
Example:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/spinlock.h>   // provides rwlock_t; <linux/rwlock.h> must not be included directly
// Static initialization
static DEFINE_RWLOCK(my_rwlock);
// Dynamic initialization
static rwlock_t my_dynamic_rwlock;
static int __init my_module_init(void)
{
    rwlock_init(&my_dynamic_rwlock);
    // ... rest of your module initialization
    return 0;
}
static void __exit my_module_exit(void)
{
    // ... module cleanup
}
module_init(my_module_init);
module_exit(my_module_exit);
MODULE_LICENSE("GPL");
Usage Approaches
1. Locking in User Context
Mechanism:
● read_lock(): Acquires a read lock. If another thread already holds the write lock, it will spin
(busy-wait) until the write lock is released.
● read_unlock(): Releases the read lock.
● write_lock(): Acquires a write lock. If any thread (reader or writer) holds the lock, it will spin.
● write_unlock(): Releases the write lock.
Usage: Suitable when you need to protect shared data accessed by multiple kernel threads or processes
within the user context. No special interrupt handling is required.
Example:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/spinlock.h>   // provides rwlock_t; do not include <linux/rwlock.h> directly
#include <linux/kthread.h>
#include <linux/delay.h>
static DEFINE_RWLOCK(my_rwlock);   // statically initialized before first use
static int etx_global_variable = 0;
// Writer thread: increments the shared variable once per second
static int thread_function1(void *pv)
{
    while (!kthread_should_stop()) {
        write_lock(&my_rwlock);
        etx_global_variable++;
        write_unlock(&my_rwlock);
        msleep(1000);   // sleep only after the lock is released
    }
    return 0;
}
// Reader thread: prints the shared variable once per second
static int thread_function2(void *pv)
{
    while (!kthread_should_stop()) {
        read_lock(&my_rwlock);
        printk(KERN_INFO "Read value: %d\n", etx_global_variable);
        read_unlock(&my_rwlock);
        msleep(1000);
    }
    return 0;
}
// ... (rest of your module code)
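3. Locking Between User Context and Bottom Halves
Mechanism: read_lock_bh(), read_unlock_bh(), write_lock_bh(), and write_unlock_bh() behave like the
plain variants but also disable bottom halves (soft interrupts) on the local CPU while the lock is held.
Usage: Crucial when you want to prevent soft interrupts from interfering with the critical section,
ensuring data consistency.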
Example:
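#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/spinlock.h>   // provides rwlock_t
#include <linux/interrupt.h>
#include <linux/kthread.h>
#include <linux/delay.h>
static DEFINE_RWLOCK(my_rwlock);
static int etx_global_variable = 0;
static void tasklet_fn(unsigned long arg);   // forward declaration for DECLARE_TASKLET
// Three-argument form of older kernels; recent kernels take (name, callback) instead
static DECLARE_TASKLET(my_tasklet, tasklet_fn, 0);
static int thread_function(void *pv)
{
    while (!kthread_should_stop()) {
        write_lock_bh(&my_rwlock);   // keeps the tasklet off this CPU while we write
        etx_global_variable++;
        write_unlock_bh(&my_rwlock);
        msleep(1000);
    }
    return 0;
}
static void tasklet_fn(unsigned long arg)
{
    read_lock_bh(&my_rwlock);
    printk(KERN_INFO "Tasklet Function: %d\n", etx_global_variable);
    read_unlock_bh(&my_rwlock);
}
// ... (rest of your module code)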
4. Locking Between Hard IRQ and Bottom Halves
Scenario: When we need to synchronize access between hardware interrupt service routines (ISRs)
and bottom halves.
Mechanism:
● read_lock_irq(): Disables interrupts on the local CPU before acquiring the read lock. This is the
most stringent form of locking.
● read_unlock_irq(): Releases the read lock and unconditionally re-enables interrupts on the local CPU.
● write_lock_irq(): Same as read_lock_irq(), but acquires a write lock.
● write_unlock_irq(): Same as read_unlock_irq().
Usage: Essential for scenarios where you need to prevent any interrupt from interrupting the critical
section, such as when dealing with hardware interrupts.
Example:
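#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/spinlock.h>   // provides rwlock_t
#include <linux/interrupt.h>
static DEFINE_RWLOCK(my_rwlock);
static int etx_global_variable = 0;
static void tasklet_fn(unsigned long arg);   // forward declaration for DECLARE_TASKLET
// Three-argument form of older kernels; recent kernels take (name, callback) instead
static DECLARE_TASKLET(my_tasklet, tasklet_fn, 0);
static irqreturn_t irq_handler(int irq, void *dev_id)
{
    // The handler already runs with local interrupts disabled, so the plain
    // variant is enough here; read_unlock_irq() would re-enable them too early.
    read_lock(&my_rwlock);
    printk(KERN_INFO "ISR Function: %d\n", etx_global_variable);
    read_unlock(&my_rwlock);
    tasklet_schedule(&my_tasklet);
    return IRQ_HANDLED;
}
static void tasklet_fn(unsigned long arg)
{
    // The bottom half must block the ISR on this CPU while it holds the lock
    write_lock_irq(&my_rwlock);
    etx_global_variable++;
    write_unlock_irq(&my_rwlock);
}
// ... (rest of your module code)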
5. Locking Between Hard IRQ and Bottom Halves (IRQ Save/Restore)
Scenario: Similar to Approach 4, but allows you to save the current interrupt state before disabling
interrupts and restore it later. This is useful when you need to maintain the previous interrupt state.
Mechanism:
● read_lock_irqsave(): Saves the current interrupt state (enabled/disabled), disables interrupts, and
acquires the read lock.
● read_unlock_irqrestore(): Releases the read lock and restores the previously saved interrupt state.
● write_lock_irqsave(): Same as read_lock_irqsave(), but acquires a write lock.
● write_unlock_irqrestore(): Same as read_unlock_irqrestore().
Example:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/spinlock.h>   // provides rwlock_t
#include <linux/interrupt.h>
static DEFINE_RWLOCK(my_rwlock);
static int etx_global_variable = 0;
static void tasklet_fn(unsigned long arg);   // forward declaration for DECLARE_TASKLET
// Three-argument form of older kernels; recent kernels take (name, callback) instead
static DECLARE_TASKLET(my_tasklet, tasklet_fn, 0);
static irqreturn_t irq_handler(int irq, void *dev_id)
{
    unsigned long flags;
    read_lock_irqsave(&my_rwlock, flags);
    printk(KERN_INFO "ISR Function: %d\n", etx_global_variable);
    read_unlock_irqrestore(&my_rwlock, flags);
    tasklet_schedule(&my_tasklet);
    return IRQ_HANDLED;
}
static void tasklet_fn(unsigned long arg)
{
    unsigned long flags;
    write_lock_irqsave(&my_rwlock, flags);
    etx_global_variable++;
    write_unlock_irqrestore(&my_rwlock, flags);
}
// ... (rest of your module code)
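To see why saving the interrupt state matters, the difference between the two unlock styles can be modeled in user space with a single boolean that stands for one CPU's interrupt-enable flag. This is only a model under that assumption; nothing here touches real interrupts, and every model_* name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of one CPU's interrupt-enable flag (not kernel code). */
static bool irqs_enabled = true;

/* Model of the _irq pair: disable on lock, enable unconditionally on unlock. */
void model_lock_irq(void)   { irqs_enabled = false; }
void model_unlock_irq(void) { irqs_enabled = true; }

/* Model of the _irqsave pair: remember the prior state, then disable. */
void model_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

/* Restore whatever state the caller had before taking the lock. */
void model_unlock_irqrestore(bool flags)
{
    irqs_enabled = flags;
}

/* Helpers for inspecting and presetting the modeled state. */
bool model_irqs_enabled(void) { return irqs_enabled; }
void model_set_irqs(bool on)  { irqs_enabled = on; }
```

If the caller entered with interrupts already disabled, the _irq pair leaves them enabled afterwards, while the _irqsave pair leaves them disabled as the caller expects.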
6. Locking Between Hard IRQs
● Interrupt Handling: The choice of locking functions depends heavily on the interrupt handling
requirements.
● For simple cases, *_bh() might suffice.
● For scenarios involving hardware interrupts, *_irq() or *_irqsave() are necessary to prevent
unexpected interrupts.
Performance: Disabling interrupts has performance implications. Use it only when necessary and for
the shortest possible time.
Example:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/spinlock.h>   // provides rwlock_t
#include <linux/interrupt.h>
static DEFINE_RWLOCK(my_rwlock);
static int etx_global_variable = 0;
static irqreturn_t irq_handler1(int irq, void *dev_id)
{
    unsigned long flags;
    // irqsave/irqrestore leaves the handler's interrupt state unchanged on unlock
    read_lock_irqsave(&my_rwlock, flags);
    printk(KERN_INFO "ISR Handler 1: %d\n", etx_global_variable);
    read_unlock_irqrestore(&my_rwlock, flags);
    return IRQ_HANDLED;
}
static irqreturn_t irq_handler2(int irq, void *dev_id)
{
    unsigned long flags;
    write_lock_irqsave(&my_rwlock, flags);
    etx_global_variable++;
    write_unlock_irqrestore(&my_rwlock, flags);
    return IRQ_HANDLED;
}
// ... (rest of your module code)
Signals in Linux
Definition: In Linux, signals are software interrupts. They are asynchronous messages sent to a
process to notify it of an event.
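Purpose: Signals notify a process of events such as errors (for example an invalid memory access),
user requests (for example Ctrl+C), and notifications from other processes or from the kernel.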
Sending Signals:
● User Space: Processes can send signals to other processes using system calls like kill().
● Kernel Space:
● Drivers: Device drivers can send signals to user-space processes using functions like
send_sig_info().
● Kernel Threads: Kernel threads can also send signals to user-space processes.
Receiving Signals:
Processes can handle signals in various ways:
Default action: If the process installs no handler, the kernel applies the signal's default action
(e.g., terminate the process).
Key Points:
● Asynchronous: Signals are asynchronous, meaning they can interrupt the normal execution flow
of a process at any time.
● Interrupts: Signals are similar to hardware interrupts in that they cause a change in the normal
execution flow of a process.
Sending Signal from Linux Device Driver to User Space
Example:
#define SIGETX 44
Registration Methods:
Example:
To prevent unintended signal deliveries, unregister the user-space application when it is no longer
interested in receiving signals from the driver.
Unregistration Methods:
Device File Close: Unregister the application when the device file is closed (the driver's release()
file operation, which runs when the last close() on the file happens).
Example:
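As a user-space illustration of the receiving side, the sketch below installs a handler for signal 44 (the SIGETX value used in these notes) and reads the integer payload delivered with it. Since no driver is available here, sigqueue() to the own process stands in for the driver as the sender; that substitution and the function names install_sigetx_handler, send_sigetx_to_self, and last_sigetx_value are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define SIGETX 44   /* same number as in the notes; a real-time signal on Linux */

static volatile sig_atomic_t got_signal;
static volatile sig_atomic_t got_value;

/* SA_SIGINFO handler: si_int carries the integer payload sent with the signal. */
static void sigetx_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    if (sig == SIGETX) {
        got_value = info->si_int;
        got_signal = 1;
    }
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_sigetx_handler(void)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    act.sa_sigaction = sigetx_handler;
    return sigaction(SIGETX, &act, NULL);
}

/* Stand-in for the driver: queue SIGETX with a payload to this process. */
int send_sigetx_to_self(int value)
{
    union sigval sv;

    sv.sival_int = value;
    return sigqueue(getpid(), SIGETX, sv);
}

/* Returns the last payload received, or -1 if no signal has arrived yet. */
int last_sigetx_value(void)
{
    return got_signal ? (int)got_value : -1;
}
```

The handler only stores the payload in sig_atomic_t variables, since very little else is safe to do inside a signal handler.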
Timers
Key Characteristics:
● Measurement: Timers are used to measure the duration of events or the time elapsed between
events.
● Control: Timers can be used to trigger events or actions after a specific time interval.
● Flexibility: Timers can be configured to count up (stopwatches) or count down (countdown
timers).
● Versatility: Timers are used in a wide range of applications, from everyday household appliances
(microwaves, washing machines) to complex industrial systems and scientific experiments.
Types of Timers:
Stopwatches:
● Measure the elapsed time between the start and stop signals.
● Used to measure the duration of events like races, sports activities, or cooking times.
Countdown Timers:
● Count down from a pre-set time interval to zero.
● Used for setting alarms, scheduling events, and controlling processes with time limits.
Timer Interrupts:
● The foundation of timekeeping in the Linux kernel lies in timer interrupts.
● These interrupts are generated periodically by the system's hardware timer (often a dedicated
hardware device).
● The frequency of these interrupts is set by the kernel configuration value HZ (commonly 100, 250,
or 1000), which gives a period between 10 ms and 1 ms.
● Each timer interrupt increments a system-wide counter called jiffies.
Jiffies Counter:
● The jiffies counter represents the number of timer interrupts (jiffies) that have occurred
since the system boot.
Kernel Timers:
● Kernel timers provide a mechanism to schedule the execution of a function (called a timer
function) at a specific time in the future.
● They are implemented as a data structure (struct timer_list) that holds information about the timer, such as:
● The timer function to be executed.
● The time at which the timer should expire, expressed in jiffies.
● Kernel timers are one-shot. A repeating timer is built by re-arming the timer from its own callback.
● When a timer expires, the kernel executes the associated timer function.
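Because expiry times are expressed in jiffies, two small pieces of arithmetic come up constantly: converting milliseconds to ticks, and comparing tick values safely when the counter wraps around. The user-space sketch below mirrors the idea behind the kernel's msecs_to_jiffies() and time_after() helpers; SKETCH_HZ is an assumed tick rate and the sketch_* functions are hypothetical stand-ins, not the kernel implementations.

```c
#include <stdbool.h>

#define SKETCH_HZ 250UL   /* assumed tick rate; the real value comes from the kernel config */

/* Round milliseconds up to whole ticks so that a timeout never fires early. */
unsigned long sketch_msecs_to_jiffies(unsigned long ms)
{
    return (ms * SKETCH_HZ + 999UL) / 1000UL;
}

/* True if tick value a is later than b, even if the counter wrapped in between. */
bool sketch_time_after(unsigned long a, unsigned long b)
{
    /* The signed difference stays correct while a and b are less than
     * half the counter range apart. */
    return (long)(b - a) < 0;
}
```

A plain `expires > now` comparison gives the wrong answer once the counter wraps past zero, which is why the signed-difference form is used.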
1. timer_setup: Initializes a kernel timer by setting up its callback function and flags. It's used in newer
kernel versions.
Function:
void timer_setup(struct timer_list *timer, void (*function)(struct timer_list *), unsigned
int flags);
2. mod_timer: Modifies the expiration time of an active timer or starts it if it's inactive. It's an efficient
way to update the timer's timeout.
Function:
int mod_timer(struct timer_list *timer, unsigned long expires);
1. Callback Function:
Purpose: This function is the core of a kernel timer. It's the code that gets executed when the timer
expires.
Example program:
#include <linux/module.h>
#include <linux/timer.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/init.h>
#define TIMER_TIMEOUT 5 // Timeout in seconds
static struct timer_list my_timer;
static void timer_callback(struct timer_list *timer) {
    pr_info("Timer callback function executed.\n");
    // Re-schedule the timer so that it fires again after the same interval
    mod_timer(&my_timer, jiffies + msecs_to_jiffies(TIMER_TIMEOUT * 1000));
}
static int __init my_module_init(void) {
    pr_info("Module loaded. Setting up timer.\n");
    // Initialize the timer
    timer_setup(&my_timer, timer_callback, 0);
    // Start the timer with an initial timeout
    mod_timer(&my_timer, jiffies + msecs_to_jiffies(TIMER_TIMEOUT * 1000));
    return 0;
}
static void __exit my_module_exit(void) {
    pr_info("Module unloaded. Deactivating timer.\n");
    // Deactivate the timer and wait for a running callback to finish
    // (recent kernels name this timer_delete_sync())
    del_timer_sync(&my_timer);
}
module_init(my_module_init);
module_exit(my_module_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Vicky");
MODULE_DESCRIPTION("A simple example of using kernel timers.");
High Resolution Timers (HRT)
HRTs are a specialized timer mechanism within the Linux kernel designed to address the limitations of
traditional kernel timers in terms of resolution and performance.
Need for HRT:
Limitations of Kernel Timers: Kernel timers are bound to jiffies, which represent the number of timer
interrupts. This granularity (resolution) might not be sufficient for applications requiring precise
timing, such as:
● Multimedia applications: Audio/video processing, gaming, etc., where precise timing is crucial for
smooth playback and synchronization.
● Networking: High-performance networking applications that require precise timing for packet
scheduling and synchronization.
● Real-time systems: Applications with strict timing requirements.
HRT Features:
● Higher Resolution: HRTs provide much finer time resolution than kernel timers, typically in
nanoseconds.
● 64-bit Timestamps: HRTs use 64-bit timestamps for greater precision and support for longer time
intervals.
Enabling HRT:
● Kernel Configuration: HRTs are enabled by default in most modern Linux kernels. However, you
can check the kernel configuration file (typically /boot/config-$(uname -r)) for the
CONFIG_HIGH_RES_TIMERS option.
● /proc/timer_list: This file provides information about the timer subsystem. Look for .resolution
with a value in nanoseconds and event_handler as hrtimer_interrupt to confirm HRT support.
● clock_getres() system call: This system call can be used to obtain the resolution of the system's
clock.
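The clock_getres() check can be done from a few lines of user-space C. The sketch below queries CLOCK_MONOTONIC and converts the result to nanoseconds; the helper name monotonic_resolution_ns is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Returns the resolution of CLOCK_MONOTONIC in nanoseconds, or -1 on error. */
long long monotonic_resolution_ns(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

A result of 1 ns typically indicates that high resolution timers are active, while a result equal to one tick period (for example 4000000 ns at HZ=250) suggests they are not.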
1 Header Files
#include <linux/hrtimer.h>
#include <linux/ktime.h>
2 struct hrtimer
Fields:
● struct rb_node node: Node for red-black tree insertion based on time order.
● ktime_t expires: Absolute expiry time in the internal representation of HR timers.
● enum hrtimer_restart (*function)(struct hrtimer *): Callback function called when the timer expires.
It returns HRTIMER_NORESTART for a one-shot timer or HRTIMER_RESTART to re-arm the timer.
● struct hrtimer_base *base: Pointer to the timer base (specific to the CPU and clock).
3 ktime_t Datatype
Purpose: Stores time values with nanosecond precision.
Conversion Function:
ktime_t ktime_set(const s64 secs, const unsigned long nanosecs);
Parameters:
● secs: Seconds to set.
● nanosecs: Nanoseconds to set.
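4 Initializing the HR Timer
void hrtimer_init(struct hrtimer *timer, clockid_t clock_id, enum hrtimer_mode mode);
Parameters: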
● timer: Pointer to the HR timer to initialize.
● clock_id: Clock to use (e.g., CLOCK_MONOTONIC, CLOCK_REALTIME).
● mode: Timer mode (absolute HRTIMER_MODE_ABS or relative HRTIMER_MODE_REL).
Forwarding the HR Timer
u64 hrtimer_forward_now(struct hrtimer *timer, ktime_t interval);
Parameters:
● timer: Timer to forward.
● interval: Interval to forward.
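The arithmetic behind building a time value and forwarding a timer can be modeled in user space. The sketch below assumes a time value is a signed 64-bit nanosecond count and that forwarding moves the expiry past the current time in whole intervals, returning the number of intervals added. The sketch_* names are hypothetical, and this is a model of the behavior as described above rather than the kernel source.

```c
#include <stdint.h>

typedef int64_t sketch_ktime_t;              /* nanoseconds */
#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Build a time value from seconds plus nanoseconds. */
sketch_ktime_t sketch_ktime_set(int64_t secs, unsigned long nanosecs)
{
    return secs * SKETCH_NSEC_PER_SEC + (int64_t)nanosecs;
}

/*
 * Move *expires forward in steps of interval until it lies after now.
 * Returns how many intervals were added (0 if the timer had not expired yet).
 */
uint64_t sketch_forward(sketch_ktime_t *expires, sketch_ktime_t now,
                        sketch_ktime_t interval)
{
    sketch_ktime_t delta = now - *expires;

    if (delta < 0 || interval <= 0)
        return 0;

    uint64_t overruns = (uint64_t)(delta / interval) + 1;
    *expires += (sketch_ktime_t)overruns * interval;
    return overruns;
}
```

A return value greater than 1 means the callback ran late enough to miss one or more periods, which a periodic timer can use to detect overruns.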
1. Validating GPIO
Before using a GPIO pin, it's crucial to validate whether the GPIO number is valid for the platform.
bool gpio_is_valid(int gpio_number);
Parameters:
● gpio_number: The GPIO number to validate.
Returns:
● true if the GPIO number is valid, false otherwise.
2. Requesting GPIO
We must request a GPIO before using it to ensure exclusive access.
int gpio_request(unsigned gpio, const char *label);
Parameters:
● gpio: The GPIO number to request.
● label: A string label for the GPIO, visible in /sys/kernel/debug/gpio.
Returns: 0 on success, a negative number on failure.
2.1 Request one GPIO with flags:
int gpio_request_one(unsigned gpio, unsigned long flags, const char
*label);
3. Exporting GPIO
To debug or manipulate GPIOs from user space, you can export a GPIO to sysfs.
int gpio_export(unsigned int gpio, bool direction_may_change);
Parameters:
● gpio: The GPIO number to export.
● direction_may_change: Allows user space to change the direction if true.
Returns:
0 on success, an error code otherwise.
4. Unexporting GPIO
To remove a GPIO from sysfs after it has been exported:
void gpio_unexport(unsigned int gpio);
Parameters:
● gpio: The GPIO number to unexport.
5. Setting GPIO Direction
5.1 Set as Input:
int gpio_direction_input(unsigned gpio);
Parameters:
● gpio: The GPIO number to set as input.
Returns:
0 on success, an error code otherwise.
5.2 Set as Output:
int gpio_direction_output(unsigned gpio, int value);
Parameters:
● gpio: The GPIO number to set as output.
● value: Initial value for the output (0 for low, 1 for high).
Returns:
0 on success, an error code otherwise.
int gpio_to_irq(unsigned gpio);
Returns:
The IRQ number associated with the GPIO, or a negative error code on failure.
9. Releasing GPIO
To release a previously requested GPIO:
void gpio_free(unsigned gpio);
Variants:
Setting the debounce time:
int gpio_set_debounce(unsigned gpio, unsigned debounce);
Parameters:
● gpio: The GPIO number.
● debounce: The debounce time in microseconds.
Returns:
● 0 on success, an error code otherwise.