0% found this document useful (0 votes)
220 views18 pages

Sparse Matrix Optimization Guide

This document discusses sparse matrix techniques for representing and performing operations on sparse matrices efficiently. It introduces common sparse matrix storage formats like compressed row storage (CRS) and compressed column storage (CCS) and describes how to perform sparse matrix-vector multiplication (SpMV) using each format. It also discusses other storage schemes like jagged diagonal storage and segmented sum and optimizations like matrix reordering and register blocking. The performance of different schemes on various hardware is compared.

Uploaded by

narakrib
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
220 views18 pages

Sparse Matrix Optimization Guide

This document discusses sparse matrix techniques for representing and performing operations on sparse matrices efficiently. It introduces common sparse matrix storage formats like compressed row storage (CRS) and compressed column storage (CCS) and describes how to perform sparse matrix-vector multiplication (SpMV) using each format. It also discusses other storage schemes like jagged diagonal storage and segmented sum and optimizations like matrix reordering and register blocking. The performance of different schemes on various hardware is compared.

Uploaded by

narakrib
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Sparse Matrix Techniques

(Tutorial)
[Link]
LawrenceBerkeleyNationalLab
Math290/CS298,UCB
Jan.31,2007

Outline
PartI
Computerrepresentationsofsparsematrices
Sparsematrixvectormultiplywithvariousstorages
Performanceoptimizations
PartII
Techniquesforsparsefactorizations
(e.g.,SuperLUsolver)

01/31/07

Math290/CS298

SparseStorageSchemes
Notation
Ndimension
NNZnumberofnonzeros
Assumearbitrarysparsitypattern
tripletsformat({i,j,val})isnotsufficient...
Storage:2*NNZintegers,NNZreals
Noteasytorandomlyaccessoneroworcolumn
Linkedlistformatprovidesflexibility,butnotfriendlyonmodern
architectures...
CannotcallBLASdirectly

01/31/07

Math290/CS298

CompressedRowStorage(CRS)
Storenonzerosrowbyrowcontiguously
Example:N=7,NNZ=19
3arrays:
Storage:NNZreals,NNZ+N+1integers

2
c d
e

4
h

f
5
g
i 6 j
l
7

135811131720

nzval1a2bcd3e4f5ghi6jkl7
colind1425123245574567357
rowptr135811131720

01/31/07

Math290/CS298

SpMV(y=Ax)withCRS
135811131720

nzval1a2bcd3e4f5ghi6jkl7
colind1425123245574567357
rowptr135811131720

doi=1,N...rowiofA
sum=0.0
doj=rowptr(i),rowptr(i+1)1
sum=sum+nzval(j)*x(colind(j))
enddo
y(i)=sum
enddo

dotproduct
Nolocalityforx
Vectorlengthusuallyshort
Memorybound:3reads,2flops

01/31/07

Math290/CS298

CompressedColumnStorage(CCS)
AlsoknownasHarwellBoeingformat
Storenonzeroscolumnwisecontiguously
3arrays:
Storage:NNZreals,NNZ+N+1integers

2
c d
e

4
h

f
5
g
i 6 j
l
7

nzval1c2de3ka4hbf5il6gj7
rowind1323437146245676567
colptr136811161720

01/31/07

Math290/CS298

SpMV(y=Ax)withCCS
nzval1c2de3ka4hbf5il6gj7
rowind1323437146245676567
colptr136811161720

y(i)=0.0,i=1N
doj=1,N...columnjofA
t=x(j)
doi=colptr(j),colptr(j+1)1
y(rowind(i))=y(rowind(i))+nzval(i)*t
enddo
enddo

SAXPY
Nolocalityfory
Vectorlengthusuallyshort
Memorybound:3reads,1write,2flops

01/31/07

Math290/CS298

JaggedDiagonalStorage(JDS)
AlsoknownasITPACK,orEllpackstorage[Saad,Kincaidetal.]
Forceallrowstohavethesamelengthasthelongestrow,
thencolumnsarestoredcontiguously

1
c

a
2
d
e

b
3
4
h
k

f
5
i
l

1
2
c
e
g
5
j
h
7
k

a
b
d
4
g
i
l

0
0
3
f
0
6
7

0
0
0
0
0
j
0

2arrays:nzval(N,L)andcolind(N,L),whereL=maxrowlength
N*Lreals,N*Lintegers
UsuallyL<<N
01/31/07

Math290/CS298

SpMVwithJDS
y(i)=0.0,i=1N
doj=1,L
doi=1,N
y(i)=y(i)+nzval(i,j)*x(colind(i,j))
enddo
enddo


1
2
c
e
5
h
k

a
b
d
4
g
i
l

0
0
3
f
0
6
7

0
0
0
0
0
j
0

NeitherdotnorSAXPY
Goodforvectorprocessor:longvectorlength(N)
Extramemory,flopsforpaddedzeros,especiallybadifrowlengths
varyalot

01/31/07

Math290/CS298

SegmentedSum[Blellochetal.]
DatastructureisanaugmentedformofCRS
ComputationalstructureissimilartoJDS
Eachrowistreatedasasegmentinalongvector
Underlinedelementsdenotethebeginningofeachsegment
(i.e.,arowinA)
Dimension:S*L~NNZ,whereLischosentoapproximatethe
hardwarevectorlength

01/31/07

1
2
c d
e

a
b
3
4
h
k

f
5
i
l

1 d
a 3
2 e
g
b 4
6 j
c f
7

Math290/CS298

5
g
h
i
6

j
k
l
7

10

SpMVwithSegmentedSum

1
2
c d
e

a
b
3
4
h
k

f
5
i
l

1 d
a 3
2 e
g
b 4
6 j
c f
7

5 j
g k
h l
i 7
6

doi=S,1
doj=1,L
...
enddo
enddo

2arrays:nzval(S,L)andcolind(S,L),whereS*L~NNZ
NNZreals,NNZintegers
Goodforvectorprocessors
SpMVisperformedbottomup,witheachrowsum(dot)ofAxstoredin
thebeginningofeachsegment
SimilartoJDS,butwithmorecontrollogicininnerloop
01/31/07

Math290/CS298

11

Performance(megafloprate)[Gaekeetal.]
Testmatrix:N=10000,NNZ=177782,randompattern
~18nonzerosperrowonaverage
JDSdoes4.6xmoreoperations
machine

Ultra2i

Pentium4

VIRAM

Clockrate

333MHz

1.5GHz

200MHz

Peakfloprate

667Mflops

1.5Gflops

1.6Gflops

CRS

29

209

110

JDS
(effective)

27
6

17
4

632
137

SegSum

29

165

01/31/07

Math290/CS298

12

OptimizationTechniques
Matrixreordering
ForCRSSpMV,canimprovexvectorlocalitybyreducingthe
bandwidthofmatrixA
Example:reverseCuthillMcKee(breadthfirstsearch)

Observed23ximprovement[Toledo,etal.]
01/31/07

Math290/CS298

13

OptimizationTechniques
Registerblocking
FinddenseblocksofsizerbycinA
(Ifneeded,allowsomezerostobefilledin)
A*xisproceededblockbyblock
keepcelementsofxandrelementsofyinregisters
xelementreusedrtimes,yelementreusedctimes
Amountofindexedloadandstoreisreduced
Obtainedupto2.5ximprovement[Vuducetal.]

01/31/07

Math290/CS298

14

SPARSITY[Im,Yelick]

01/31/07

Math290/CS298

15

PerformanceImprovement[Vuducetal.]

01/31/07

Math290/CS298

16

OtherRepresentations
Blockentryformats(e.g.,multipledegreesoffreedomare
associatedwithasinglephysicallocation)
Constantblocksize(BCRS)
Varyingblocksizes(VBCRS)
Skyline(orprofile)storage(SKS)
Lowertrianglestoredrowbyrow
Uppertrianglestoredcolumnbycolumn
Ineachrow(column),firstnonzero
definesaprofile
Allentrieswithintheprofile
(somemaybezeros)arestored

01/31/07

Math290/CS298

17

References
Templatesforthesolutionoflinearsystems
Barrett,etal.,SIAM,1994
BeBOP:[Link]
SparseBLASstandard:
[Link]

01/31/07

Math290/CS298

18

You might also like