
Avoiding Implementation Pitfalls of “Matrix Capsules with EM Routing” by Hinton et al.

Ashley Daniel Gritzman [0000-0002-9949-157X]

IBM Research, Johannesburg, South Africa
[Link]@[Link]

arXiv:1907.00652v1 [[Link]] 1 Jul 2019

Abstract. The recent progress on capsule networks by Hinton et al. has generated considerable excitement in the machine learning community. The idea behind a capsule is inspired by a cortical minicolumn in the brain, whereby a vertically organised group of around 100 neurons receive common inputs, have common outputs, are interconnected, and may well constitute a fundamental computation unit of the cerebral cortex. However, Hinton’s paper on “Matrix Capsules with EM Routing” was unfortunately not accompanied by a release of source code, which left interested researchers attempting to implement the architecture and reproduce the benchmarks on their own. This has certainly slowed the progress of research building on this work. While writing our own implementation, we noticed several common mistakes in other open source implementations that we came across. In this paper we share some of these learnings, specifically focusing on three implementation pitfalls and how to avoid them: (1) parent capsules with only one child; (2) normalising the amount of data assigned to parent capsules; (3) parent capsules at different positions compete for child capsules. While our implementation is a considerable improvement over currently available implementations, it still falls slightly short of the performance reported by Hinton et al. (2018). The source code for this implementation is available on GitHub at the following URL: [Link]

Keywords: Capsules, EM routing, Hinton, CNN.

1 Introduction
Geoffrey Hinton has been talking about “capsule networks” for a long time, so
when his team published their recent progress on this topic, it naturally created
quite a stir in the machine learning community. The idea behind a capsule is
inspired by a cortical minicolumn in the brain, whereby a vertically organised
group of around 100 neurons receive common inputs, have common outputs, are
interconnected, and may well constitute a fundamental computation unit of the
cerebral cortex [2]. In the context of machine learning, a capsule is a group of
neurons whose outputs represent not only the probability that an entity exists,
but also different properties of the same entity. Capsules may encode information
such as orientation, scale, velocity, and colour. Layers in a capsule network learn
to assemble these entities to form parts of a larger whole.

In computer graphics, a scene is represented in abstract form comprising


objects and their corresponding instantiation parameters (e.g. x, y location, and
angle). A rendering function then converts this abstract representation into an
image. Hinton argues that the brain does ‘inverse graphics’ [3], which essentially
means deconstructing visual information received by the eyes into a hierarchical
representation of the world, and then trying to match it with already learned
patterns and relationships stored by the brain. A capsule network is basically a
neural network that tries to perform inverse graphics.
Hinton et al. [4] first introduced the concept of capsule networks in 2011 when
they used a transformation matrix in a “transforming auto-encoder” that learned
to transform a stereo pair of images into a stereo pair from a slightly different
viewpoint. But it was only towards the end of 2017 that Sabour et al. [13] pub-
lished a capsule network architecture featuring dynamic routing-by-agreement
that managed to reach state-of-the-art performance on MNIST [9], and consider-
ably better results than CNNs on MultiMNIST [13] (a variant with overlapping
pairs of different digits). Then in 2018, Hinton et al. [6] published “Matrix Cap-
sules with EM Routing” to address some of the deficiencies of Sabour et al. [13],
and reported a reduction in the test error on smallNORB [10] of 45% compared
to state-of-the-art.
The matrix capsule version of a capsule network is described as follows [6]:
“each capsule has a logistic unit to represent the presence of an entity and a
4x4 matrix which could learn to represent the relationship between that entity
and the viewer (the pose). A capsule in one layer votes for the pose matrix of
many different capsules in the layer above by multiplying its own pose matrix by
trainable viewpoint-invariant transformation matrices that could learn to repre-
sent part-whole relationships. Each of these votes is weighted by an assignment
coefficient. These coefficients are iteratively updated for each image using the
Expectation-Maximization algorithm such that the output of each capsule is
routed to a capsule in the layer above that receives a cluster of similar votes.”
Sabour et al. [13] made their code for “Dynamic Routing between Capsules”
available on GitHub [12], but unfortunately Hinton et al. [6] did not do the
same for “Matrix Capsules with EM Routing”, which may somewhat explain
the slower progress in research building on this work. While implementing this
work ourselves, we noticed several common mistakes in other implementations
that we came across, and also discovered a couple of pitfalls which may prevent
the network from operating as intended.
In this paper we share some of these learnings, specifically focusing on three
implementation pitfalls and how to avoid them: (1) parent capsules with only
one child; (2) normalising the amount of data assigned to parent capsules; (3)
parent capsules at different positions compete for child capsules.
To make this paper slightly easier to consume for readers not entirely familiar with the work, we try to simplify the terminology where possible; for example, we use the term “child capsules” to refer to capsules in the lower layer L, and “parent capsules” to refer to capsules in the higher layer L+1. However, we do assume that the reader is already familiar with Hinton’s paper [6], which is necessary in order to understand the discussion on the pitfalls that follows below.

2 Understanding and Avoiding Pitfalls


2.1 Parent Capsules with Only One Child
The EM routing algorithm is a form of cluster finding which iteratively adjusts
the assignment probabilities between child capsules and parent capsules. Fig. 1
illustrates this process: at the start of EM routing, the output of each child
capsule is evenly distributed to all of the parent capsules. As the EM algorithm
proceeds, the affinity of parent capsules for particular child capsules increases,
and eventually each child capsule may contribute to only one parent capsule.
This phenomenon does not cause a problem if each parent capsule comprises
multiple child capsules, however the situation may arise whereby a parent capsule
comprises only one child capsule. This situation is similar to a clustering scenario
whereby a cluster has only one data point. Since the EM routing algorithm fits a Gaussian distribution to each of the parent capsules, it is necessary to calculate the mean $\mu_j^h$ and variance $(\sigma_j^h)^2$ of each parent capsule. If the parent capsule comprises only one child capsule, then the variance of the parent capsule is 0. This causes numerical instability when calculating the activation cost in Eq. (1), as $\log(\sigma_j^h)$ is undefined at $\sigma_j^h = 0$. Furthermore, a Gaussian distribution with a variance of 0 is the unit impulse centered at the mean $\mu_j^h$, so $p_j(\mu_j^h) = \infty$, which causes numerical overflow.

$$\mathrm{cost}^h \leftarrow \left(\beta_u + \log(\sigma_j^h)\right)\sum_i R_{ij} \qquad (1)$$


Fig. 1. Illustration of assignment probabilities between two capsule layers over three
iterations of EM routing. At iteration 3 each parent capsule receives input from only
one child capsule.

This problem of parent capsules having only one child capsule occurs more frequently as the number of routing iterations increases, whereby the assignment probabilities tend to either 0 or 1. In our experiments with the smaller capsule network configuration of A=64, B=8, C=D=16, the problem did not occur during training with one or two iterations, but occurred consistently for iterations ≥ 3.
Furthermore, the occurrence of this problem also depends on the ratio of child
capsules to parent capsules. If the ratio is high, meaning many child capsules
feeding fewer parent capsules, then the problem only occurs at higher routing
iterations. Whereas, if the ratio is low and approaches 1:1, or even lower (i.e.
more parent capsules than child capsules), then the problem starts occurring at
a lower number of routing iterations.
To address this problem in our implementation, we impose a lower bound on the variance $(\sigma_j^h)^2$ by adding $\epsilon = 10^{-4}$.
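For illustration, a minimal numpy sketch of the M-step statistics with this lower bound applied (the shapes and variable names are our own, and this is not the actual TensorFlow implementation):

    import numpy as np

    EPSILON = 1e-4  # lower bound added to the variance to avoid log(0) and overflow

    def m_step_statistics(R, a_child, V):
        # R:       (n_child, n_parent) assignment probabilities
        # a_child: (n_child,) activations of the child capsules
        # V:       (n_child, n_parent, H) votes, H = 16 pose dimensions
        R = R * a_child[:, None]                        # weight assignments by child activations
        R_sum = R.sum(axis=0, keepdims=True)            # amount of data per parent capsule
        mu = (R[:, :, None] * V).sum(axis=0) / R_sum.T  # (n_parent, H) means
        var = (R[:, :, None] * (V - mu[None]) ** 2).sum(axis=0) / R_sum.T
        # Pitfall 1: a parent with a single child has var -> 0, so log(sigma) diverges;
        # adding a small epsilon keeps the activation cost and probabilities finite.
        var = var + EPSILON
        return mu, var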

2.2 Normalising the Amount of Data Assigned to Parent Capsules


In Eq. (1) computing the activation cost, $\sum_i R_{ij}$ is the amount of data assigned to parent capsule $j$ from all child capsules.
For the convolutional capsule layers, each child capsule within the kernel
feeds to one spatial location in the parent layer containing O types of capsules.
If all child capsules within a convolutional kernel assign their data to only one
parent capsule type, then the maximum data that a parent capsule can receive
is given by:

$$\mathrm{max\ data} = K \times K \times I \qquad (2)$$
where K is the kernel size, and I is the number of input capsule types.
The mean data received by parent capsules (assuming that all child capsules
are active) is the total number of child capsules divided by the total number of
parent capsules:

$$\mathrm{mean\ data} = \frac{\mathrm{child}_W \times \mathrm{child}_H \times I}{\mathrm{parent}_W \times \mathrm{parent}_H \times O} \qquad (3)$$

where W and H denote the spatial width and height of the tensors containing
the capsules, and O is the number of output capsule types.
For the final output layer denoted class caps, the spatial dimensions of the
child tensor are flattened such that the child capsules are fully connected to the
class capsules.
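As a worked check of Eqs. (2) and (3), a small Python sketch using the layer shapes listed in Table 1 below (the variable names are our own):

    # conv_caps1: child layer primary_caps is 16 x 16 with I = 8 capsule types,
    # parent layer is 7 x 7 with O = 16 capsule types, kernel K = 3.
    K, I, O = 3, 8, 16
    child_W, child_H = 16, 16
    parent_W, parent_H = 7, 7

    max_data = K * K * I                                             # = 72
    mean_data = (child_W * child_H * I) / (parent_W * parent_H * O)  # ~= 2.61

    print(max_data, round(mean_data, 2))  # 72 2.61, matching Table 1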
Table 1 shows a summary of the layers in the smaller capsule network configuration. For capsule layers connected with EM routing, the maximum and mean assignment data is calculated with equations (2) and (3). The mean assignment data is similar for both the conv_caps1 and conv_caps2 layers at 2.61 and 1.96 respectively, however notice that the mean data of the class_caps layer is ≈30× larger at 80.0. The larger assignment data for the class_caps layer occurs since each parent capsule in this layer is fully connected to all child capsules in the conv_caps2 layer. Therefore, the 5 × 5 × 16 = 400 capsules in the conv_caps2 layer feed to just 5 capsules in the class_caps layer.

Table 1. Summary of layers in the smaller capsule network configuration (A=64, B=8, C=D=16) showing the maximum and mean assignment data between layers with EM routing. K is the kernel size, S is the stride, Ch is the number of channels in a regular convolution, I is the number of input capsule types, O is the number of output capsule types, W and H are the spatial width and height. The Output shape shows the dimensions of the tensor containing the activations (batch size, W, H, Ch or O); the tensor containing the poses would have additional dimensions 4x4 for the pose matrix.

    Layer          Details            Output shape      Max data   Mean data
    input                             (?, 32, 32, 1)
    relu_conv1     K=5, S=2, Ch=64    (?, 16, 16, 64)
    primary_caps   K=1, S=1, Ch=8     (?, 16, 16, 8)
    conv_caps1     K=3, S=2, O=16     (?, 7, 7, 16)     72         2.61
    conv_caps2     K=3, S=1, O=16     (?, 5, 5, 16)     144        1.96
    class_caps     flatten, O=5       (?, 1, 1, 5)      400        80.0

We now consider the effects of the large discrepancy in assignment data between the class_caps layer and the other layers. While the mean data was calculated under the assumption that all child capsules are active (which is unlikely), nevertheless the actual data assigned to a parent capsule $\sum_i R_{ij}$ will be proportional to the mean value of the layer.
Consider the computation of the parent activations from the M-step of Procedure 1 in [6]:

$$a_j \leftarrow \mathrm{logistic}\left(\lambda\left(\beta_a - \sum_h \left(\beta_u + \log(\sigma_j^h)\right)\sum_i R_{ij}\right)\right)$$

Since $\beta_u$ is per capsule type and does not depend on $h$:

$$a_j \leftarrow \mathrm{logistic}\left(\lambda\left(\beta_a - \sum_i R_{ij}\left(H\beta_u + \sum_h \log(\sigma_j^h)\right)\right)\right) \qquad (4)$$

Finally:

$$a_j \leftarrow \mathrm{logistic}\left(\lambda\beta_a - \lambda\sum_i R_{ij}\left(H\beta_u + \sum_h \log(\sigma_j^h)\right)\right) \qquad (5)$$

Consider the second term in Eq. (4): notice that the cost of activating a parent capsule is scaled by the total amount of data received by that capsule, $\sum_i R_{ij}$. In Eq. (5) the first term $\lambda\beta_a$ sets the operating point on the logistic curve, and the second term determines the perturbations about this point. If the $\sum_i R_{ij}$ scaling is too small, then the output activations will not deviate from the operating point. But if the $\sum_i R_{ij}$ scaling is too large, then all the parent capsules will either be fully active or inactive. The most desirable situation occurs when the input to the logistic function is nicely distributed over a useful range (e.g. $[-5, 5]$), so the output of the logistic function is nicely distributed over the range $[0, 1]$.
The impact of the ≈30× difference in assignment data is that the range of the input to the logistic function will not be distributed over a useful range for all capsule layers. The values $\beta_a$ and $\beta_u$ are learned discriminatively for each layer, and can to some extent compensate for this effect. However, since $\beta_a$ and $\beta_u$ are initialised randomly, the problem will be more pronounced at the start of training and may prevent the network from learning.
This problem can be addressed to an extent by carefully initialising $\beta_a$ and $\beta_u$, which will need to be different for the class_caps layer and the preceding conv_caps1 and conv_caps2 layers. It will also be necessary to ensure that the $\lambda$ scaling value yields a useful range for the input to the logistic function at every layer. In our implementation we adopt a different approach, and instead scale the amount of data assigned to a parent capsule ($\sum_i R_{ij}$) relative to the mean data in a particular layer (see Table 1 for scaling values). We find this approach to be more robust to the initial values of $\beta_a$ and $\beta_u$.
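To make the scaling concrete, a minimal sketch of the parent activation in Eq. (5) with the data term normalised by the layer's mean data from Eq. (3); the function and variable names are our own, and the exact placement of the normalisation in our TensorFlow code may differ:

    import numpy as np

    def parent_activations(R, log_sigma, beta_a, beta_u, lambda_, mean_data):
        # R:         (n_child, n_parent) assignments, already weighted by child activations
        # log_sigma: (n_parent, H) log standard deviations of the parent Gaussians
        # mean_data: scalar from Eq. (3), e.g. 2.61 for conv_caps1 or 80.0 for class_caps
        H = log_sigma.shape[1]
        data = R.sum(axis=0) / mean_data   # pitfall 2: scale sum_i R_ij relative to the layer mean
        cost = data * (H * beta_u + log_sigma.sum(axis=1))
        return 1.0 / (1.0 + np.exp(-(lambda_ * beta_a - lambda_ * cost)))  # logistic of Eq. (5)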

2.3 Parent Capsules at Different Positions Compete for Child Capsules

Consider the case of a 1D convolutional capsule layer, with a kernel size of 3 and
a stride of 1, and where both child and parent layers contain only 1 capsule type.
In the M-step shown in Fig. 2, the kernel slides over the child capsules resulting
in each parent receiving votes from 3 child capsules. In the E-step, child capsules
towards the edges receive feedback from fewer parent capsules, while capsules
towards the center receive feedback from up to K parent capsules. In the example
of Fig. 2, $\{C_1, C_5\}$ receive feedback from one parent capsule each, $\{C_2, C_4\}$ each receive feedback from two parent capsules, and $\{C_3\}$ receives feedback from three parent capsules.
It is clear from Fig. 2 that child capsules receive feedback from parent capsules
at different spatial positions, and therefore these parent capsules must compete
for the vote of the child capsule. The competition happens in the update of
the assignment probabilities in the E-step, where we normalise across all parent
capsules competing for a particular child capsule. This point was further clarified
by the paper authors in response to a question on [Link] [1].
We reviewed several open source implementations on GitHub, and found
that incorrect normalisation in the E-step is a common mistake. In particular,
the implementations normalise only across parent capsule types, and not across
parent capsule positions. This has the unintended effect of preventing parent
capsules at different positions from competing for child capsules. The correct
method is to normalise across all parent capsules that receive input from a
particular child capsule, which will include normalising across parent capsule
types and parent capsule positions.
We found this important detail somewhat tricky to implement, so we describe
our implementation below.


Fig. 2. Connectivity between parent capsules and child capsules resulting from 1D convolution with a kernel size of 3 and a stride of 1. In the M-step, all parent capsules receive input from 3 child capsules; in the E-step, child capsules towards the edges receive feedback from fewer parent capsules, while capsules towards the center receive feedback from up to 3 parent capsules.

With reference to Fig. 3, the first step in computing the votes is tiling the
child capsules according to the convolution kernel. The subsequent mapping
between child capsules and parent capsules is stored in a 2D binary matrix called
the spatial routing map. The tiled representation is then multiplied by a tensor
containing K transformation matrices (3 in our example), which are learned
discriminatively with backpropagation during training. The tiled child capsules
at each spatial location are multiplied by the same transformation matrices. The
votes are then scaled by the corresponding assignment probabilities Rij and used
in the M-step to compute the mean µj , standard deviation σj , and activation aj
of each parent capsule.
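A minimal numpy sketch of this vote computation for the 1D example (variable names are our own; the actual implementation is vectorised over batches and uses the full 4x4 pose matrices as described):

    import numpy as np

    def compute_votes_1d(child_pose, W, routing_map):
        # child_pose:  (n_child, 4, 4) pose matrices of the child capsules
        # W:           (K, 4, 4) transformation matrices shared across spatial positions
        # routing_map: (n_parent, n_child) binary spatial routing map from the convolution
        K = W.shape[0]
        n_parent = routing_map.shape[0]
        votes = np.zeros((n_parent, K, 4, 4))
        for j in range(n_parent):
            children = np.nonzero(routing_map[j])[0]   # the K children inside parent j's kernel
            for k, i in enumerate(children):
                votes[j, k] = child_pose[i] @ W[k]     # vote V_ij = pose_i x W_k
        return votes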
Our implementation of the E-step is shown in Fig. 4. The probability density
pij of vote Vij is computed from the Gaussian distribution of parent capsule
Pj . The probability densities are then converted to sparse representation using
the spatial routing map that was stored during the convolution operation. In the
sparse representation the probability densities of each child capsule are aligned in
one column, thereby enabling us to normalise over all parent capsules competing
for a child capsule Ci .
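The E-step normalisation can then be sketched as follows (again a simplified numpy version with our own variable names): the dense probability densities are scattered into the sparse layout using the spatial routing map, and the assignments are normalised over the columns, i.e. over all parent capsule types and positions competing for each child capsule.

    import numpy as np

    def e_step_update(p_dense, a_parent, routing_map):
        # p_dense:     (n_parent, K) Gaussian densities p_ij aligned by kernel position
        # a_parent:    (n_parent,) parent capsule activations
        # routing_map: (n_parent, n_child) binary spatial routing map
        n_parent, n_child = routing_map.shape
        p_sparse = np.zeros((n_parent, n_child))
        for j in range(n_parent):
            children = np.nonzero(routing_map[j])[0]
            p_sparse[j, children] = a_parent[j] * p_dense[j]
        # Pitfall 3: normalise over every parent competing for a child (all types and
        # positions), i.e. down the columns, not only over parent capsule types.
        R = p_sparse / (p_sparse.sum(axis=0, keepdims=True) + 1e-9)
        return R.T                                     # (n_child, n_parent) assignments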

Extending from 1D to 2D Convolution. The above description of our implementation refers to the case of 1D capsule convolution, with a stride of 1. In
order to extend to 2D convolution, we unroll the spatial dimension of the child
capsules and parent capsules. Fig. 5 shows an example of the spatial routing map
produced from 2D capsule convolution with a 3x3 kernel and of stride 2. The
rows of the spatial routing map correspond to parent capsules, and the columns
correspond to child capsules.
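A small sketch of how such a spatial routing map can be constructed (our own helper, written to reproduce the 4 x 25 map of Fig. 5 for a 5x5 child grid, 3x3 kernel and stride 2):

    import numpy as np

    def spatial_routing_map_2d(child_h, child_w, kernel, stride):
        # Rows are unrolled parent positions, columns are unrolled child positions;
        # a 1 marks a child capsule that falls inside that parent's kernel.
        parent_h = (child_h - kernel) // stride + 1
        parent_w = (child_w - kernel) // stride + 1
        routing_map = np.zeros((parent_h * parent_w, child_h * child_w), dtype=np.int32)
        for ph in range(parent_h):
            for pw in range(parent_w):
                for kh in range(kernel):
                    for kw in range(kernel):
                        child_idx = (ph * stride + kh) * child_w + (pw * stride + kw)
                        routing_map[ph * parent_w + pw, child_idx] = 1
        return routing_map

    print(spatial_routing_map_2d(5, 5, 3, 2).shape)  # (4, 25), as in Fig. 5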


Fig. 3. Votes are computed by tiling the child capsules feeding to each parent capsule,
and multiplying by the transformation matrix. The spatial routing map is a binary
matrix which stores the spatial connectivity between child capsules and parent capsules
resulting from the convolution operation. Vij denotes the vote from child capsule i to
parent capsule j. The votes are scaled by the corresponding assignment probabilities
Rij and used in the M-step to calculate the mean µj , standard deviation σj , and
activation aj of each parent capsule.


Fig. 4. Feedback of parent capsules to child capsules in the E-step of EM routing (follow
diagram from right to left). pij denotes the probability density of vote Vij under the
Gaussian distribution of the parent capsule j. The spatial routing map, which stores
the spatial connectivity between child capsules and parent capsules, is used to convert
the probability densities to a sparse representation, thereby aligning them by child capsule.
Finally, the probability densities are used to update the routing assignments Rij by
normalising over columns of the sparse representation.

𝐶" 𝐶# 𝐶#$
𝑃" 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
Spatial Routing 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
Map 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1 0 0
𝑃& 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 1 1 1

𝐶" 𝐶# 𝐶' 𝐶& 𝐶$


𝐶( 𝐶) 𝐶* 𝐶+ 𝐶", 𝑇" 𝑇# 𝑇'
𝑃" 𝑃#
𝐶"" 𝐶"# 𝐶"' 𝐶"& 𝐶"$ 𝑇& 𝑇$ 𝑇(
𝑃' 𝐶&
𝐶"( 𝐶") 𝐶"* 𝐶"+ 𝐶#, 𝑇) 𝑇* 𝑇+

𝐶#" 𝐶## 𝐶#' 𝐶#& 𝐶#$

Child Capsules 2D Capsule Convolution Parent


Kernel = 3 x 3 Capsules
Stride = 2

Fig. 5. Example of spatial routing map produced by 2D capsule convolution with a 3x3 kernel and stride of 2.

3 Experiments
We implement “Matrix Capsules with EM Routing” by Hinton et al. [6] in TensorFlow, and test the smaller capsule network configuration (A=64, B=8, C=D=16) on the smallNORB [10] benchmark. We follow the hyperparameter suggestions of the authors [7] and use a weight decay of $2 \times 10^{-7}$, and a learning rate of $3 \times 10^{-3}$ with exponential decay: decay steps = 2000, decay rate = 0.96. Lambda is set as follows [5]:

$$\lambda = 0.01 \cdot \left(1 - 0.95^{i+1}\right)$$

where $i$ is the routing iteration number (e.g. 0–2). The schedule for increasing the margin in the spread loss is set as follows [5]:

$$\mathrm{margin} = 0.2 + 0.79 \cdot \mathrm{sigmoid}\left(\min(10, \mathrm{step}/50000 - 4)\right)$$

where step is the training step. The batch size is set to 64.
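For reference, the two schedules written out as a small Python sketch (the function names are our own):

    import numpy as np

    def lambda_schedule(routing_iteration):
        # inverse temperature for EM routing, per the schedule quoted from [5]
        return 0.01 * (1.0 - 0.95 ** (routing_iteration + 1))

    def margin_schedule(step):
        # margin for the spread loss, annealed from ~0.2 towards ~0.99 over training
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
        return 0.2 + 0.79 * sigmoid(min(10.0, step / 50000.0 - 4.0))

    print(round(lambda_schedule(0), 4))  # 0.0005 at the first routing iteration
    print(round(margin_schedule(0), 3))  # ~0.214 at the start of training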
Fig. 6 shows the test accuracy of the matrix capsule model after each train-
ing epoch for 1–3 iterations of EM routing. Whereas Hinton et al. [6] report
the maximum test accuracy at 3 routing iterations, in our implementation the
maximum test accuracy of 95.4% occurs with 2 routing iterations, and with 3
iterations we record an accuracy of 93.7%.
In Table 2 we compare our implementation to other open source implementations available on GitHub. The accuracy of our implementation at 95.4% is a 3.6 percentage point improvement on the previous best open source implementation by Zhang (www0wwwjs1) [14] at 91.8%, however it is still below the accuracy reported in Hinton et al. [6]. At this time, our implementation is the best open source implementation available.

Table 2. Comparison of test accuracy on the smallNORB dataset for different implementations of “Matrix Capsules with EM Routing” by Hinton et al. [6]. For the open source implementations on GitHub, the test accuracy is reported as at 28/05/2019, and the specific commit is noted in the reference.

    Implementation     Framework      Routing iterations   Test accuracy
    Hinton [6]         Not available  3                    97.8%
    yl-1993 [11]       PyTorch        1                    74.8%
    yl-1993 [11]       PyTorch        2                    89.5%
    yl-1993 [11]       PyTorch        3                    82.5%
    www0wwwjs1 [14]    TensorFlow     2                    91.8%
    Officium [8]       PyTorch        3                    90.9%
    Ours               TensorFlow     1                    86.2%
    Ours               TensorFlow     2                    95.4%
    Ours               TensorFlow     3                    93.7%

[Figure 6: test accuracy vs. training epoch for 1–3 iterations of EM routing; best accuracies 0.8618 at epoch 128 (1 iteration), 0.9540 at epoch 362 (2 iterations), and 0.9368 at epoch 467 (3 iterations).]

Fig. 6. Test accuracy of our implementation with 1–3 iterations of EM routing after
each training epoch. Smoothed with exponentially-weighted moving window α = 0.25.

4 Conclusion

In this paper we discuss three common pitfalls when implementing “Matrix Capsules with EM Routing” by Hinton et al., and how to avoid them. While our implementation performs considerably better than other open source implementations, nevertheless it still falls slightly short of the performance reported by Hinton et al. (2018). The source code for this implementation is available on GitHub at the following URL: [Link]

References
1. Calvano, G.S.: Some clarification on the convolution topology? (Jul 2018), https://[Link]/forum?id=HJWLfGWRb&noteId=BJgX7Iy04m
2. Cruz, L., Buldyrev, S.V., Peng, S., Roe, D.L., Urbanc, B., Stanley, H., Rosene, D.L.: A statistically based density map method for identification and quantification of regional differences in microcolumnarity in the monkey brain. Journal of Neuroscience Methods 141(2), 321–332 (2005)
3. Hinton, G., Krizhevsky, A., Jaitly, N., Tieleman, T., Tang, Y.: Does the brain do inverse graphics? In: Brain and Cognitive Sciences Fall Colloquium. vol. 2 (2012)
4. Hinton, G.E., Krizhevsky, A., Wang, S.D.: Transforming auto-encoders. In: International Conference on Artificial Neural Networks. pp. 44–51. Springer (2011)
5. Hinton, G.E., Sabour, S., Frosst, N.: Lambda and margin (Jul 2018), https://[Link]/forum?id=HJWLfGWRb&noteId=BkelcSxC47
6. Hinton, G.E., Sabour, S., Frosst, N.: Matrix capsules with EM routing. In: International Conference on Learning Representations (2018), [Link]forum?id=HJWLfGWRb
7. Hinton, G.E., Sabour, S., Frosst, N.: Regularization and learning rate? (Oct 2018), [Link]
8. Huang, Y.: Capsules. GitHub (Apr 2019), [Link], commit: e1f02d3
9. LeCun, Y., Cortes, C., Burges, C.J.: The MNIST database of handwritten digits (1998)
10. LeCun, Y., Huang, F.J., Bottou, L., et al.: Learning methods for generic object recognition with invariance to pose and lighting. In: CVPR (2). pp. 97–104. Citeseer (2004)
11. Lei, J.Y.: Matrix-Capsules-EM-PyTorch. GitHub (Mar 2019), [Link]yl-1993/Matrix-Capsules-EM-PyTorch, commit: c4547bf
12. Sabour, S.: Code for capsule model used in “Dynamic Routing between Capsules”. GitHub (Jan 2018), [Link]capsules, commit: cac8804
13. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in Neural Information Processing Systems. pp. 3856–3866 (2017)
14. Zhang, S.: Matrix-Capsules-EM-Tensorflow. GitHub (Feb 2018), [Link]com/www0wwwjs1/Matrix-Capsules-EM-Tensorflow, commit: 0196ead
