Platform Architecture Lab
USB Performance Analysis of Bulk Traffic
Brian Leete
[email protected]
Introduction
Bulk Traffic:
- Designed for reliable, highly variable data transfer
- No guarantees are made in the specification for throughput
- Scheduled last, after isochronous, interrupt, and control transfers
- Throughput is dependent on many factors
Introduction
We will look at bulk throughput from the following aspects:
- Distribution of throughput for various packet sizes and endpoints
- Low bandwidth performance
- Small endpoint performance
- NAK performance
- CPU utilization
- PCI bus utilization
Test Environment -- Hardware
- PII 233 (8522px) with 512 KB cache
- Atlanta motherboard with 440LX (PIIX4A) chipset
- 32 MB memory
- Symbios OHCI controller (for OHCI measurements)
- Intel Lava card as test device
Test Environment -- Software
- Custom driver and application
- Test started by IOCTL
- IOCTL allocates static memory structures, submits IRP to USBD
- Completion routine resubmits the next buffer (sketched below)
- All processing done at ring 0, at DISPATCH_LEVEL IRQL
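The slides do not include the test driver's source; the following is a minimal sketch of the resubmit-on-completion pattern described above, assuming a WDM-style driver. The names TEST_CONTEXT, BulkTestResubmitCompletion, and BuildNextBulkUrb are hypothetical, not the lab's actual code.

/* Sketch only: completion routine that keeps a bulk pipe streaming by
 * rebuilding the URB and resubmitting the same IRP to USBD, so the whole
 * cycle stays at ring 0 in the completion path (DISPATCH_LEVEL). */
#include <wdm.h>

typedef struct _TEST_CONTEXT {
    PDEVICE_OBJECT UsbdDeviceObject;   /* next-lower (USBD) device object */
    PIRP           Irp;                /* IRP reused for every buffer */
    /* statically allocated data buffer and URB would also live here */
} TEST_CONTEXT, *PTEST_CONTEXT;

VOID BuildNextBulkUrb(PTEST_CONTEXT Ctx);  /* hypothetical: refills the URB and IRP stack location */

NTSTATUS
BulkTestResubmitCompletion(
    PDEVICE_OBJECT DeviceObject,
    PIRP           Irp,
    PVOID          Context
    )
{
    PTEST_CONTEXT ctx = (PTEST_CONTEXT)Context;
    UNREFERENCED_PARAMETER(DeviceObject);
    UNREFERENCED_PARAMETER(Irp);

    /* Account for the finished buffer, rebuild the transfer, and send the
     * IRP straight back down to USBD without returning to user mode. */
    BuildNextBulkUrb(ctx);
    IoSetCompletionRoutine(ctx->Irp, BulkTestResubmitCompletion,
                           ctx, TRUE, TRUE, TRUE);
    IoCallDriver(ctx->UsbdDeviceObject, ctx->Irp);

    /* The driver owns this IRP; stop further completion processing. */
    return STATUS_MORE_PROCESSING_REQUIRED;
}

This keeps the per-buffer software cost down to the completion interrupt plus the resubmit work, which is the software latency shown in the interrupt-delay diagram later in the deck.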
Terminology
- A Packet is a single packet of data on the bus. Its size is determined by the Max Packet Size of the device; valid sizes are 8, 16, 32, and 64 bytes.
- A Buffer is the amount of data sent to USBD in a single IRP. In this presentation, buffers range from 8 bytes to 64K bytes.
- Unless otherwise specified, most data was taken at a 64 byte Max Packet Size with 15 endpoints configured in the system.
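For example, a 512 byte buffer sent to a 64 byte Max Packet Size endpoint goes out on the bus as 8 packets, as the frame diagrams later in the deck show.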
Host Controller Operation (UHCI)
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints (UHCI)
[Chart: Total throughput (bytes per second) vs. buffer size (bytes) and number of endpoints. Annotations: oscillations at 256 and 512 byte buffers; flat throughput at 512 and 1024 byte buffers; single endpoint throughput; small buffer throughput.]
Small Buffer Throughput
- For buffer sizes < Max Packet Size, the Host Controller sends 1 buffer per frame
- No ability to look ahead and schedule another IRP even though time remains in the frame
- Why is this?
Interrupt Delay
[Timing diagram: last packet of buffer 'n', then interrupt delay and software latency following the Start of Frame interrupt, leaving the remainder of the frame unused; the first packet of buffer 'n+1' starts in the next frame.]
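To put numbers on this limit (a back-of-the-envelope check against the table that follows, not an additional measurement): if only one buffer is serviced per 1 ms frame, throughput is capped at roughly the buffer size times 1000 frames per second. An 8 byte buffer therefore tops out near 8,000 B/s (8,071 B/s measured) even though the frame could have carried 960 bytes of bulk data (15 packets of 64 bytes).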
Single Endpoint Graph
- Flat throughput at 512 and 1024 byte buffers
- Single endpoint throughput for 64K byte buffers is below the theoretical max of 1,216,000 bytes per second
- Both are explained by looking at the number of packets per frame
Maximum Packets per Frame
(A frame carries at most 15 bulk packets of 64 bytes = 960 bytes.)

Buffer Size  Max Bytes  Frames for    Bytes      Total   Expected     Measured
(Bytes)      per Frame  Bulk of Data  Left Over  Frames  Throughput   Throughput
                                                         (B/s)        (B/s)
8            960        1             0          1       8,000        8,071
16           960        1             0          1       16,000       16,082
32           960        1             0          1       32,000       32,293
64           960        1             0          1       64,000       64,264
128          960        1             0          1       128,000      129,186
256          960        1             0          1       256,000      255,667
512          960        1             0          1       512,000      512,017
1024         960        1             64         2       512,000      515,515
2048         960        2             128        3       682,666      682,803
4096         960        4             256        5       819,200      819,200
8192         960        8             512        9       910,222      910,131
16384        960        17            64         18      910,222      910,404
32768        960        34            128        35      936,228      936,072
65536        960        68            256        69      949,797      948,087
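The expected-throughput column can be reproduced from the frame math alone: a frame carries at most 960 bulk bytes (15 packets of 64 bytes), leftover bytes cost one extra frame, and frames arrive at 1000 per second. A minimal, illustrative C sketch (not part of the original test code):

#include <stdio.h>

/* Recomputes the "Expected Throughput" column of the table above:
 * 960 bulk bytes per 1 ms frame, any leftover bytes take one more frame. */
int main(void)
{
    const unsigned long bytes_per_frame = 15UL * 64UL;   /* 960 */
    const unsigned long frames_per_sec  = 1000UL;        /* 1 ms USB frames */
    unsigned long buf;

    for (buf = 8; buf <= 65536UL; buf *= 2) {
        unsigned long frames   = buf / bytes_per_frame +
                                 (buf % bytes_per_frame ? 1 : 0);
        unsigned long expected = buf * frames_per_sec / frames;
        printf("%6lu byte buffer -> %2lu frame(s) -> %7lu B/s expected\n",
               buf, frames, expected);
    }
    return 0;
}

The measured numbers track these expected values to within about one percent, which is what ties the flat spots at 512 and 1024 byte buffers (and the 64K ceiling near 950,000 B/s) to the packets-per-frame limit.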
Throughput for Multiple Endpoints, 512 Byte Buffers
[Chart: Throughput (bytes per second) vs. number of endpoints (1 to 15) for 512 byte buffers.]
512 Byte Buffers, 1 Endpoint
[Frame diagram: packets from endpoint 1 following the SOF, with inter-packet delay and ending time shown in bit times.]
- 8 packets total per frame
- 8 packets * 64 bytes per packet * 1000 frames/s = 512,000 B/s
- 511,986 B/s measured
512 Byte Buffers, 2 Endpoints
[Frame diagram: packets from endpoints 1 and 2 interleaved following the SOF, with inter-packet delay and ending time shown in bit times.]
- 16 packets total per frame
- 16 packets * 64 bytes per packet * 1000 frames/s = 1,024,000 B/s
- 1,022,067 B/s measured
- Notice that interrupt delay is not a factor here!
512 Byte Buffers, 3 Endpoints
[Frame diagrams for frames N and N+1: packets from endpoints 1, 2, and 3 following the SOF, with inter-packet delay and ending times shown in bit times.]
- Frame N: 15 packets total in this frame
- Frame N+1: 9 packets total in this frame
- 24 packets * 64 bytes / 2 frames * 1000 frames/s = 768,000 B/s
- 776,211 B/s measured
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints (OHCI)
[Chart: Total throughput (bytes per second) vs. buffer size (bytes) and number of endpoints. Annotations: single endpoint throughput of 900,000 vs. 950,000 B/s; high end throughput of 18 vs. 17 packets per frame; flat throughput at 512 and 1024 byte buffers; oscillations at 256 and 512 byte buffers; small buffer throughput.]
Minimal Endpoint Configuration
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, Minimal Endpoint Configuration (UHCI)
[Chart: Total throughput (bytes per second) vs. buffer size (bytes) and number of endpoints. Annotation: higher single endpoint throughput, 17 vs. 15 packets per frame.]
Host Controller Operation (UHCI)
Throughput of a Single Endpoint in Single and Multiple Endpoint Configurations (UHCI)
[Chart: Throughput (bytes per second) vs. buffer size (bytes, 8 to 65536), comparing single-endpoint and multiple-endpoint configurations.]
Results
- We are working with Microsoft to remove unused endpoints from the Host Controller data structures.
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, Minimal Endpoint Configuration (OHCI)
[Chart: Total throughput (bytes per second) vs. buffer size (bytes) and number of endpoints. Annotations: higher single endpoint throughput; more endpoints get 18 packets per frame.]
Distribution of Throughput across Endpoints
Throughput by Endpoint vs. Number of Endpoints, 64K Byte Buffers (UHCI)
[Chart: Throughput (bytes per second) for each endpoint number (1 to 15) vs. number of endpoints.]
Results
- We are working with Microsoft to get the Host Controller driver to start sending packets at the next endpoint rather than starting over at the beginning of the frame.
Throughput by Endpoint vs. Number of Endpoints, 64K Byte Buffers (OHCI)
[Chart: Throughput (bytes per second) for each endpoint number (1 to 15) vs. number of endpoints.]
Limited Bandwidth Operation
Throughput by Endpoint vs. Number of Endpoints, 1023 Bytes/Frame Isochronous Traffic (UHCI)
[Chart: Throughput (bytes per second) for each endpoint number vs. number of endpoints, with 1023 bytes per frame of isochronous traffic also running.]
Throughput by Endpoint vs. Number of Endpoints, 768 Bytes/Frame Isochronous Traffic (OHCI)
[Chart: Throughput (bytes per second) for each endpoint number vs. number of endpoints, with 768 bytes per frame of isochronous traffic also running.]
Small Endpoint Performance
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 8 Byte Max Packet Size (UHCI)
[Chart: Total throughput (bytes per second) vs. buffer size (bytes) and number of endpoints.]
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 8 Byte Max Packet Size (OHCI)
[Chart: Total throughput (bytes per second) vs. buffer size (bytes) and number of endpoints.]
Total Throughput for a Single Endpoint for Various Packet Sizes (OHCI)
[Chart: Throughput (bytes per second) vs. buffer size (bytes, 8 to 65536), with one series per Max Packet Size (8, 16, 32, 64).]
Throughput by Endpoint vs. Number of Endpoints, Mixed 64 and 8 Byte Endpoints (UHCI)
[Chart: Throughput (bytes per second) for each endpoint number vs. number of endpoints.]
If you care about throughput:
- Use 64 byte Max Packet Size endpoints
- Use large buffers
NAK Performance
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, with 1 Endpoint NAKing 64 Bytes OUT (OHCI)
[Chart: Total throughput (bytes per second) vs. buffer size (bytes) and number of endpoints.]
Single Endpoint Throughput with a 64 Byte Endpoint NAKing on the Bus (OHCI)
[Chart: Throughput (bytes per second) vs. buffer size (bytes), comparing the NAK and No NAK cases. Annotation: 45% drop in total throughput.]
Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 14 Endpoints OUT, 1 Endpoint NAKing IN (UHCI)
[Chart: Total throughput (bytes per second) vs. buffer size (bytes) and number of endpoints.]
Single Endpoint Throughput, One Endpoint NAKing IN
[Chart: Throughput (bytes per second) vs. buffer size (bytes, 8 to 65536), comparing the NAK and No NAK cases.]
CPU Utilization
- Idle process incrementing a counter in main memory, designed to simulate a heavily CPU-bound load (sketched below)
- Numbers indicate how much work the CPU could accomplish after servicing USB traffic; higher numbers are better
- Small buffers and large numbers of endpoints take more overhead (software stack navigation)
- Endpoint 0 is the control case -- no USB traffic running
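The deck does not show the measurement software; the following is a minimal user-mode sketch of the idle-counter idea, under the assumption that a lowest-priority thread simply increments a counter in memory while the USB test runs. The setup and names are illustrative, not the lab's code, and the original used a separate idle process rather than a thread.

#include <stdio.h>
#include <windows.h>

/* Sketch only: a lowest-priority thread spins incrementing a counter in main
 * memory; comparing the count over a fixed interval with and without USB
 * traffic shows how much CPU the traffic consumed (higher is better). */
static volatile unsigned long long g_idleCount = 0;

static DWORD WINAPI IdleCounter(LPVOID unused)
{
    (void)unused;
    for (;;)
        g_idleCount++;      /* memory-resident counter, as described above */
}

int main(void)
{
    HANDLE thread = CreateThread(NULL, 0, IdleCounter, NULL, 0, NULL);
    SetThreadPriority(thread, THREAD_PRIORITY_IDLE);

    Sleep(1000);            /* run (or do not run) the USB test during this window */
    printf("Idle count after 1 second: %llu\n", g_idleCount);
    return 0;
}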
CPU Utilization (UHCI)
[Chart: Idle count vs. buffer size (bytes, 2048 to 65536) and number of endpoints.]
CPU Utilization (OHCI)
[Chart: Idle count vs. buffer size (bytes, 2048 to 65536) and number of endpoints.]
PCI Utilization
PCI Utilization (UHCI)
[Chart: PCI bus utilization (%) vs. buffer size (bytes, 2048 to 65536) and number of endpoints.]
PCI Utilization (UHCI), 15 Endpoint Configuration
- For low numbers of active endpoints, the Host Controller must poll memory for each unused endpoint, causing relatively high utilization.
- Removing unused endpoints will lower single-endpoint PCI utilization for this configuration.
Conclusions
- The UHCI Host Controller Driver needs a few tweaks:
  - Need to get the Host Controller to start sending packets where it last left off rather than at endpoint 1
  - Needs to remove unused endpoints from the list
- Performance recommendations:
  - Use 64 byte Max Packet Size endpoints
  - Large buffers are better than small buffers
  - Reduce NAK'd traffic
  - Use fast devices if possible
Future Research Topics
- Multiple IRPs per pipe
- USB needs to control throughput to the slow device
- Small endpoints aren't good
- Small buffers aren't good
- NAKing isn't good