Transformer Architecture: The Positional Encoding
Let's use sinusoidal functions to inject the order of words in our model
Posted by Amirhossein Kazemnejad on September 20, 2019 · 17 mins read
Table of Contents
What is positional encoding and why do we need it in the first place?
Proposed method
The intuition
Other details
Relative Positioning
FAQ
Summary
References
Thanks to the several implementations available in common deep learning frameworks, the Transformer became an easy option for many students (including myself) to experiment with. Making the model more accessible is a great thing, but on the downside it may cause its details to be ignored.
In this article, I don't plan to explain its architecture in depth, as there are currently several great tutorials on this topic (here, here, and here). Instead, I want to discuss one specific part of the Transformer's architecture: the positional encoding.
When I read this part of the paper, it raised some questions in my head which, unfortunately, the authors had not provided sufficient information to answer. So in this article, I want to try to break this module apart and look at how it works.
NOTE: To understand the rest of this post, I highly suggest you read one of those tutorials to get familiar with the Transformer architecture.
Figure 1 - The Transformer Architecture
But the Transformer architecture ditched the recurrence mechanism in favor of a multi-head self-attention mechanism. Avoiding the RNNs' recurrence results in a massive speed-up in training time, and, theoretically, it can capture longer dependencies in a sentence.
As each word in a sentence simultaneously flows through the Transformer's encoder/decoder stack, the model itself doesn't have any sense of position/order for each word. Consequently, there's still the need for a way to incorporate the order of the words into our model.
One possible solution to give the model some sense of order is to add a piece of information to each
word about its position in the sentence. We call this “piece of information”, the positional encoding.
The first idea that might come to mind is to assign a number to each time-step within the [0, 1] range in
which 0 means the first word and 1 is the last time-step. Could you figure out what kind of issues it
would cause? One of the problems it will introduce is that you can’t figure out how many words are
present within a specific range. In other words, time-step delta doesn’t have consistent meaning across
different sentences.
Another idea is to assign a number to each time-step linearly. That is, the first word is given "1", the second word is given "2", and so on. The problem with this approach is that not only could the values get quite large, but our model may also face sentences longer than the ones seen in training. In addition, our model may never see a sample of one specific length, which would hurt its generalization.
Ideally, the following criteria should be satisfied:
1. It should output a unique encoding for each time-step (word's position in a sentence).
2. Distance between any two time-steps should be consistent across sentences of different lengths.
3. Our model should generalize to longer sentences without any effort. Its values should be bounded.
4. It must be deterministic.
Proposed method
The encoding proposed by the authors is a simple yet ingenious technique which satisfies all of those criteria. First of all, it isn't a single number. Instead, it's a d-dimensional vector that contains
information about a specific position in a sentence. And secondly, this encoding is not integrated into
the model itself. Instead, this vector is used to equip each word with information about its position in a
sentence. In other words, we enhance the model’s input to inject the order of words.
Let $t$ be the desired position in an input sentence, $\vec{p_t} \in \mathbb{R}^d$ be its corresponding encoding, and $d$ be the encoding dimension (where $d \equiv_2 0$, i.e. $d$ is even). Then $f : \mathbb{N} \to \mathbb{R}^d$ will be the function that produces the output vector $\vec{p_t}$, and it is defined as follows:

$$
\vec{p_t}^{(i)} = f(t)^{(i)} :=
\begin{cases}
\sin(\omega_k \cdot t), & \text{if } i = 2k \\
\cos(\omega_k \cdot t), & \text{if } i = 2k + 1
\end{cases}
$$

where

$$
\omega_k = \frac{1}{10000^{2k/d}}
$$
As can be seen from the function definition, the frequencies decrease along the vector dimension. Thus they form a geometric progression from 2π to 10000 ⋅ 2π in the wavelengths.
You can also imagine the positional embedding $\vec{p_t}$ as a vector containing pairs of sines and cosines for each frequency (note that $d$ is divisible by 2):
$$
\vec{p_t} =
\begin{bmatrix}
\sin(\omega_1 \cdot t) \\
\cos(\omega_1 \cdot t) \\
\sin(\omega_2 \cdot t) \\
\cos(\omega_2 \cdot t) \\
\vdots \\
\sin(\omega_{d/2} \cdot t) \\
\cos(\omega_{d/2} \cdot t)
\end{bmatrix}_{d \times 1}
$$
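To make the definition concrete, here is a minimal NumPy sketch of how such an encoding table could be computed. The function name and arguments are my own, not from the paper, and k runs from 0 to d/2 - 1 to match the indexing i = 2k.

```python
import numpy as np

def positional_encoding(max_len, d):
    """Return a (max_len, d) table whose row t is the encoding vector p_t.

    Assumes d is even: dimension 2k holds sin(w_k * t) and 2k+1 holds cos(w_k * t),
    with w_k = 1 / 10000^(2k / d).
    """
    t = np.arange(max_len)[:, np.newaxis]        # positions, shape (max_len, 1)
    k = np.arange(d // 2)[np.newaxis, :]         # frequency indices, shape (1, d/2)
    omega = 1.0 / np.power(10000.0, 2 * k / d)   # the frequencies w_k
    pe = np.zeros((max_len, d))
    pe[:, 0::2] = np.sin(omega * t)              # even dimensions: sines
    pe[:, 1::2] = np.cos(omega * t)              # odd dimensions: cosines
    return pe

pe = positional_encoding(max_len=50, d=128)
print(pe.shape)  # (50, 128)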
The intuition
You may wonder how this combination of sines and cosines could ever represent a position/order. It is actually quite simple. Suppose you want to represent a number in binary format; how would you do that?
0 : 0 0 0 0 8 : 1 0 0 0
1 : 0 0 0 1 9 : 1 0 0 1
2 : 0 0 1 0 10 : 1 0 1 0
3 : 0 0 1 1 11 : 1 0 1 1
4 : 0 1 0 0 12 : 1 1 0 0
5 : 0 1 0 1 13 : 1 1 0 1
6 : 0 1 1 0 14 : 1 1 1 0
7 : 0 1 1 1 15 : 1 1 1 1
You can spot the rate of change between different bits: the lowest bit alternates on every number, the second-lowest bit flips every two numbers, and so on.
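A tiny sketch (my own illustration, not from the original post) that counts how often each bit flips while counting from 0 to 15 confirms this pattern:

```python
# Each bit flips half as often as the one below it: 15, 7, 3, 1 times for bits 0..3.
for i in range(4):                                   # i = bit index, 0 = least significant
    column = [(n >> i) & 1 for n in range(16)]       # the i-th bit of 0..15
    flips = sum(a != b for a, b in zip(column, column[1:]))
    print(f"bit {i} flips {flips} times")
```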
But using binary values would be a waste of space in the world of floats. So instead, we can use their continuous float counterparts: sinusoidal functions. Indeed, they are the equivalent of alternating bits. Moreover, by decreasing their frequencies, we can go from the rapidly flipping low bits to the slowly flipping high ones.
Figure 2 - The 128-dimensional positional encoding for a sentence with a maximum length of 50. Each row represents the embedding vector $\vec{p_t}$.
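A plot like Figure 2 can be reproduced with a few lines of matplotlib; this sketch assumes the `positional_encoding` helper from the earlier snippet:

```python
import matplotlib.pyplot as plt

pe = positional_encoding(max_len=50, d=128)      # helper sketched above
plt.figure(figsize=(10, 4))
plt.pcolormesh(pe, cmap="RdBu")                  # one row per position t, one column per dimension
plt.xlabel("Encoding dimension i")
plt.ylabel("Position t")
plt.colorbar()
plt.show()
```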
Other details
Earlier in this post, I mentioned that positional embeddings are used to equip the input words with
their positional information. But how is it done? In fact, the original paper added the positional
encoding on top of the actual embeddings. That is, for every word $w_t$ in a sentence $[w_1, \ldots, w_n]$:

$$
\psi'(w_t) = \psi(w_t) + \vec{p_t}
$$

To make this summation possible, we keep the positional embedding's dimension equal to the word embeddings' dimension, i.e. $d_{\text{word embedding}} = d_{\text{positional embedding}}$.
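As a toy illustration of this sum (the shapes and the random embedding table are made up, and `positional_encoding` is the helper sketched earlier, not the paper's code):

```python
import numpy as np

d_model, seq_len, vocab_size = 128, 10, 1000
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, d_model))   # stands in for the learned psi

token_ids = rng.integers(0, vocab_size, size=seq_len)      # a hypothetical 10-token sentence
word_embeddings = embedding_table[token_ids]               # psi(w_t) for each t, shape (10, 128)
pos_embeddings = positional_encoding(seq_len, d_model)     # p_t for each t, shape (10, 128)

encoder_input = word_embeddings + pos_embeddings           # psi'(w_t) = psi(w_t) + p_t
print(encoder_input.shape)                                 # (10, 128)
```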
Relative Positioning
Another characteristic of sinusoidal positional encoding is that it allows the model to attend to relative positions effortlessly. Here is a quote from the original paper:
We chose this function because we hypothesized it would allow the model to easily learn to attend by
relative positions, since for any fixed offset k, PEpos+k can be represented as a linear function of PEpos.
But why does this statement hold? To fully understand why, please refer to this great article for the detailed proof. However, I've prepared a shorter version here.

Proof:

For a given frequency $\omega_k$, we look for a $2 \times 2$ matrix $M$ (independent of $t$) such that

$$
M \cdot
\begin{bmatrix} \sin(\omega_k \cdot t) \\ \cos(\omega_k \cdot t) \end{bmatrix}
=
\begin{bmatrix} \sin(\omega_k \cdot (t + \phi)) \\ \cos(\omega_k \cdot (t + \phi)) \end{bmatrix}
$$

Writing $M = \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix}$ and applying the addition theorems for sine and cosine to expand the right-hand side, we can solve for the entries:

$$
u_1 = \cos(\omega_k \cdot \phi) \qquad v_1 = \sin(\omega_k \cdot \phi)
$$
$$
u_2 = -\sin(\omega_k \cdot \phi) \qquad v_2 = \cos(\omega_k \cdot \phi)
$$

So the final transformation matrix $M$ is:

$$
M_{\phi,k} =
\begin{bmatrix}
\cos(\omega_k \cdot \phi) & \sin(\omega_k \cdot \phi) \\
-\sin(\omega_k \cdot \phi) & \cos(\omega_k \cdot \phi)
\end{bmatrix}
$$
As you can see, the final transformation does not depend on $t$. Note also that $M_{\phi,k}$ is very similar to a rotation matrix.

Similarly, we can find $M$ for the other sine-cosine pairs, which eventually allows us to represent $\vec{p}_{t+\phi}$ as a linear function of $\vec{p_t}$ for any fixed offset $\phi$. This property makes it easy for the model to learn to attend by relative positions.

Another property of sinusoidal position encoding is that the distance between neighboring time-steps is symmetric and decays nicely with time.
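As a quick numerical sanity check of the proof above (my own sketch, not from the post), the matrix built from an arbitrary offset phi really does map the sine-cosine pair at position t to the pair at position t + phi, for any t:

```python
import numpy as np

d, k, phi = 128, 3, 5                                  # arbitrary dimension, pair index, and offset
omega = 1.0 / 10000 ** (2 * k / d)

M = np.array([[ np.cos(omega * phi), np.sin(omega * phi)],
              [-np.sin(omega * phi), np.cos(omega * phi)]])

for t in (0, 7, 42):                                   # the same M works for every position t
    pair_t     = np.array([np.sin(omega * t), np.cos(omega * t)])
    pair_t_phi = np.array([np.sin(omega * (t + phi)), np.cos(omega * (t + phi))])
    assert np.allclose(M @ pair_t, pair_t_phi)
print("M depends only on phi and k, not on t")
```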
FAQ
Why are positional embeddings summed with word embeddings instead of being concatenated?
I couldn't find any theoretical reason for this question. Since summation (in contrast to concatenation) saves the model's parameters, it is reasonable to reframe the initial question as "Does adding the positional embeddings to the words have any disadvantages?". I would say, not necessarily!
First, if we pay attention to Figure 2, we will find that only the first few dimensions of the whole embedding are used to store the information about the positions (note that the embedding dimension reported in the paper is 512, unlike our small toy example). And since the embeddings in the Transformer are trained from scratch, the parameters are probably set in a way that the semantics of words do not get stored in the first few dimensions, to avoid interfering with the positional encoding. For the same reason, I think the final Transformer can separate the semantics of words from their positional information. Moreover, there is no reason to consider separability an advantage. Maybe the summation provides a good source of features for the model to learn from.
For more information, I recommend checking these links: link 1, link 2.
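For concreteness, here is a toy comparison of the two options (the shapes are made up, and `positional_encoding` is the earlier sketch; neither line is "the" Transformer implementation): summing keeps the model dimension at d, while concatenating doubles the width that every downstream weight matrix has to consume.

```python
import numpy as np

seq_len, d_model = 10, 128
word_emb = np.random.default_rng(0).normal(size=(seq_len, d_model))
pos_emb = positional_encoding(seq_len, d_model)             # helper sketched earlier

summed = word_emb + pos_emb                                  # shape (10, 128): downstream layers unchanged
concatenated = np.concatenate([word_emb, pos_emb], axis=-1)  # shape (10, 256): every following projection
                                                             # would need twice as many input weights
print(summed.shape, concatenated.shape)
```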
Doesn't the positional information vanish once it reaches the upper layers?
Fortunately, the Transformer architecture is equipped with residual connections. Therefore, the information from the input of the model (which contains the positional embeddings) can efficiently propagate to the later layers, where the more complex interactions are handled.
Summary
Thank you for staying with me until the end of this article. I hope you’ve found this useful for answering
your question. Please feel free to provide any corrections or feedbacks, the comment section is at your
disposal.
Cited as
@article{kazemnejad2019:pencoding,
  title   = "Transformer Architecture: The Positional Encoding",
  author  = "Kazemnejad, Amirhossein",
  journal = "[Link]",
  year    = "2019",
  url     = "[Link]"
}
References
The Illustrated Transformer
Attention Is All You Need - The Transformer
Linear Relationships in the Transformer’s Positional Encoding
position_encoding.ipynb
Tensor2Tensor Github issue #1591
Reddit thread - Positional Encoding in Transformer
Reddit thread - Positional Encoding in Transformer model
Comments
Jiaxuan Wang (5 years ago):
The information is great! I think a more intuitive explanation of positional embedding is to think about it as a clock (as cos and sin are just concepts from the unit circle). Every two dimensions of the positional embedding just specify one of the clock's hands (the hour hand, the minute hand, the second hand, for example). Then moving from one position to the next position is just rotating those hands at different frequencies. Thus, without formal proof, it immediately tells you why a rotation matrix exists.
Reply: Can you please explain it with the help of a diagram? I am not able to visualise it.
Abhishek Kumar Dubey > Jiaxuan Wang (5 years ago):
What happens when the clock hand rolls over? In that case, will two positions have the same encoding?
Reply: I think you misunderstood Jiaxuan Wang's explanation. Let's use K=4 as the embedding dimension for easy
explanation. What the user is saying is that -
a. with the PE in the form of [s1, s2, s3, s4], s1 and s3 are sines, while s2 and s4 are cosines (stating the obvious).
b. s1 and s2 are hour and minute hands in one clock; s3 and s4 are in another. Let's say we are in a 24 hour
timeframe, s1 and s2 are hours and minutes in the first 12 hours, while s3 and s4 are in the second 12 hours.
c. what the blog author demonstrated is that each point in time within that 24 hours can be expressed with these
two clocks combined. Distance in time between any two positions are independent of where those positions are in
actual time.
I honestly don't think we need to borrow the clock metaphor. If you look at the code implementation of PE
(search for 'Jay Alammar illustrated transformer'), latent dimension K dictates the angle rate of rotation, so it's
fixed as long as we don't change K. It has nothing to do with sequence length. Each position is then represented
as a vector in the same K dimension as the word embedding (as the author pointed out, for summation
purposes). The distance between any two positions is degree of counter-clock wise rotations from the beginning
positions to the ending position. Also in this implementation, the geometric progress is from 0 to 9999 * 2(pi),
instead of 2(pi) to 10000*2(pi). If the latent dimension K exceeds 10,000, we may have to update the angle rate
calculation. But that's rarely necessary in real life application
Blazej Fiderek > Abhishek Kumar Dubey (5 years ago, edited):
This kind of behaviour is preserved by having a large value (namely 10000) in the denominator for the biggest w_k (omega). It ensures that for each possible time-step (word position in the sequence), the representation will differ in at least one position of the embedding vector. Rolling over of the clock could occur if you had 10000 in the denominator of omega but the number of time-steps, that is the number of words in the sequence, were higher, say 12000.
Reply: No it won't. For the smallest w_k (namely 1/10000), the wavelength is 10000×2pi, which is bigger than 12000. And even if the sequence length were bigger than 10000×2pi, say 80000, sin(80000/10000) = sin(8) = sin(8-2pi), and since 8-2pi isn't rational it can't be any of the PE of previous positions.
Reply: I believe it's not about the *exact* equality of embedding vectors but about being *approximately* equal. E.g. in your example, the specific value of `sin(80000/10000)`, indeed, would not have been seen before, but `sin(17681/10000)` is equal to it when rounded to 2 decimal places.
Philip Voinea > Yacine Fitta (a year ago):
@Yacine Fitta The transformer can only be so sensitive to small changes in vectors. The difference between sin(80,000/10,000) and sin(17,168/10,000), for example, while not exactly 0, is imperceptible to the transformer because such a tiny difference is at the scale of noise.
Reply: Yes, this is a really elegant intuition. I've featured your comment so that it can help others.
Reply: Actually, the binary bits change is equal to the clock assumption. In binary bits, the last position changes most frequently, which is equal to the second hand.
Yacine Fitta (4 years ago, edited):
Why do we need the whole d-dimensional PE? Doesn't a single pair suffice? They can be linearly transformed to any other PE, and they provide a unique value for any position i (since pi is a transcendental number). It is mentioned in the end that it's because we need to add the word embeddings to the PEs, but why add and not just concatenate? Or why not just add it to the first 2 dimensions?
Matt Hill (5 years ago):
>we will find out that only the first few dimensions of the whole embedding are used to store the information about the positions
This sort of makes sense, in that the positional encoding varies little in the higher dimensions. However, it does alternate between 0 and 1, so when added to the semantic embedding, it seems like it would disrupt / distort the semantic information in half the positions significantly? Why does this not happen?
disqus_0WriWs1f0L (5 years ago):
I have some concern regarding how effective it would be when we do the positional encoding by adding the positional vector to the word embedding vector. I'm not sure if I got this correctly. I understand the position encoding itself has all these great properties. But when it's added with the word vector, it seems the positional information would be mixed with the word embedding information such that I am not sure whether the model could distinguish them. For example, if a word vector is [0.2, -0.1, 0.9, 0.03], with positional vector [0, 1, 0, 1], and another word vector is [-0.5, 0.7, 0.8, 0.7], with positional vector [0.7, 0.2, 0.1, 0.33], then their sum vectors would be the same. In this case how does the model tell the differences between their words and positions? Thanks a lot for your time!
Riccardo Di Sipio (5 years ago):
Hi,
great article! Having a background in physics, I noted that there is a certain similarity between the position encoding and the Fourier transform. You would get different FTs for inputs that differ only in the order of the elements, e.g.:
>>> [Link](x)
<[Link]: shape=(4,), dtype=complex64, numpy=array([12665.999+6.1035156e-05j, -3108.4998+2.1260920e+03j, -3187.4995-2.1901782e+03j, 532.0007+1.2207031e-04j], dtype=complex64)>
>>> [Link](y)
<[Link]: shape=(4,), dtype=complex64, numpy=array([12666.-3.6621094e-04j, -630.4998+2.1260923e+03j, -3187.4995+2.1018433e+03j, -4423.9995-2.4414062e-04j], dtype=complex64)>
And of course, the inverse FT would give you back the original sequence (modulo some rounding due to int <-> float conversion).
Are you aware of any attempt at using such an encoding? Incidentally, the FT of a sequence of n real numbers is a sequence of n complex numbers, i.e. 2n real numbers that can be seen as the cos and sin projections of a vector in the complex plane.
Anish Tondwalkar > Riccardo Di Sipio (5 years ago):
The input is 2-d: one time dimension (position in the sentence) and one _semantic_ dimension (the word embedding). What we're doing here is explicitly including the Fourier transform of the position (remember, the FT of $\delta(\omega - t)$ is $\sin(\omega t) + i \cos(\omega t)$). I'm not sure what it would mean to take a FT in the semantic dimension.
Reply:
Hi,
Thank you so much for your comment.
Wow, that's a very interesting observation. I have not actually seen someone following this path. But I found this paper some time ago that tries to provide a more general framework for positional encodings. I think it can be helpful ([Link]).
Regarding the use of FT in positional encoding, I think there are some gray areas that need more discussion. For example, I am not sure whether a NN can infer positions only by seeing a single FT sequence. I think (please correct me if I'm wrong) the order information is only available when there are multiple FT sequences in comparison to each other. Moreover, the design of a transformer stack that uses an FT sequence is debatable. That is, the stack needs to know about the position t. How can we extract such information from an FT sequence? (Maybe we can feed an FT input that is calculated from a manipulated original sequence based on t, such as removing the word at t.) Additionally, what kinds of benefits will it bring in terms of relative positioning (maybe some particular mathematical property)?
Jauhar (3 months ago):
This is great. Thank you for the explanation. Just a small query however: since P_t is a vector with sin and cosine values, both dependent on t, and since sin and cosine are periodic functions, wouldn't the vectors be the same for many different ts?
rinku jadhav (2 years ago):
You say that the naïve idea of providing position as 1/n, 2/n, 3/n, ..., k/n, ..., n/n is not feasible because n (sequence length) varies over examples. But, practically, transformers have an upper bound on how many tokens they can handle (typically 512 or 1024), so we could just fix the value of n to be either that limit or that limit times some constant so that the largest value is not 1.
I think a potential problem (maybe) then would be that the transformer would need to learn that the extra value provided is not an extension of the embedding but rather a position token.
Zephyr (5 years ago, edited):
Thanks for the blog. I could not understand the sentence "But using binary values would be a waste of space in the world of floats." Is it because an integer is 4 bytes and binary values take only ones and zeroes? If that's the case, we can change the data type to "bit", right?
Reply:
I'm really sorry for the late reply! Yes, my statement can be inaccurate if we go into exact details. At the intuition level, I meant that we are not forced to use the square wave (binary values). Moreover, word embeddings are represented using float32. Therefore, we would always have to cast the hypothetical bit-represented positional encoding to float32 before calculating the sum. From another point of view, if we represented the positional encoding in integers, then we would have to use discrete steps for incrementing the wavelength, which is not the case for floats, since a finer change in values can be detected by neural networks. Thus, it would be an under-usage of floats' potential.
I hope I could clarify that. Please feel free to drop a comment if you have further questions.
soarer (9 months ago):
I don't understand why your first idea (assigning 0/n, 1/n, ...) doesn't work. Transformer models need to have a fixed context length, and shorter input has to be padded anyway, so as long as the time-step size is 1/n, where n is the context length, the problem of inconsistency across sentences isn't a real issue. This scheme clearly preserves relative distance as well. Are there other issues that you can see? Thanks!
Mike Gates (a year ago):
Here I have a supplement that could add to the article, because the author is very modest and hides some explanations.
The positional information is added to its corresponding token embedding, quite naturally. The whole network is a neural network, each layer of which accepts its input linearly, multiplying input by weights. So the network can easily "decompose" and "understand" the (linear) sum of positional vector and embedding vector, through millions of repeated inputs. If the two were concatenated, the NN would have had to spend multiple layers to understand the positions. The linear relationship between positions, the linear addition of the positional vector into the embedding vector, (inherently linear) neural networks, and the residual connections are a good combination that allows us to get rid of RNNs in sequence-to-sequence learning. That's my intuition.
Aryan Pandey (a year ago):
Thank you, got here from a reddit thread from 4-5 years back. I've been trying to understand attention for over 4 months, but have been pushing myself over the past few weeks to finally understand this in depth. This article cleared up a lot of stuff, just like reading about Xavier initialization feels after understanding the basics of neural networks!
David Ireland (a year ago):
Looks like there are some rendering errors with the equations as of 2024?
sirk390 (2 years ago):
Thanks for the great article. Just one comment: you say "But using binary values would be a waste of space in the world of floats." Are you sure this is correct? To me it doesn't seem to waste more space. Is the real reason not the linearity explained later? "we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset k, PEpos+k can be represented as a linear function of PEpos"
MichaelSB (2 years ago):
Very nice explanation, thank you! One question: why not represent positional encoding using binary vectors? For example, concatenate 16 elements to an input vector, where each of those elements is 0 or 1. Such a positional encoding represents a 16-bit integer (0-65,535), where each vector element is 0 or 1. We avoid the problem of having large numerical values in our input embedding, and the number of positions that can be represented is bounded.
One potential problem I see is that the weights corresponding to these 16 vector positions might end up increasing proportionally (e.g. the weight for the MSB position could be 65k times larger than the weight corresponding to the LSB position), so instead of large input values we could end up with large weight values (and require a large dynamic range for the weights, which means more precision). But the same problem should affect the sin/cosine representation, right?
Boris Burkov (3 years ago):
The Fourier transform is ubiquitous, but I have a "theory" that angle encoding in quantum machine learning could've been the source of inspiration for positional encoding: [Link]
Mahdi Amrollahi (3 years ago):
That was so helpful to me; I used to have several questions about why this kind of encoding works. Thanks Amirhossein...
Reply: It means "modulo 2". So "d ≡2 0" means d = 0 modulo 2, i.e. that d is even. :)
Diego Quintana (4 years ago, edited):
Hey, great article, thanks! How did you produce the last plot for the distances? Can you show the code used? Thank you!
sober reflection (4 years ago):
If we want to encode position information that is not a sequence, how can we do it? For example, if we want to use sin/cos to represent the relative position of image pixels, the position embedding may not be friendly to spatial position, because left-bottom, right-bottom, left-up, right-up, and center have strong correlations.
Ben (4 years ago):
Thanks for your insight on why they use summation versus concatenation! Had also been wondering about that myself.
s.i. (4 years ago):
I don't understand how the vector p(t) starts with the first term being sine and then cosine. Making k=1 means i=2, so what about indices i=0 and i=1? Shouldn't things start from k=0? And if they do, then they won't neatly go to k=d/2 but rather to k=d/2 - 1, unless... we start not with i=0 but i=1 and take the first term to be the cosine.
Melvin > s.i. (4 years ago):
If you start with i=2, your last term in the p(t) vector will be with i=d+1 since you need a d-dimensional vector, and you'll get the same vector as in the article. Whether you start with i=0 or i=1 or whatever the value is, you get a unique encoding (p(t)) for each word (t), and that's the whole point of positional embedding.
Leap of Faith (5 years ago):
Why could this linear property make it easy for the model to learn to attend?

Reply: Please refer to my answer to Rajesh R's comment ([Link]) and let me know if you have any further questions.
mb19029 (5 years ago):
Can you please fix the table mapping numbers 0..15 to their binary representation by replacing the second 2 with the intended 10?
kamal fara (5 years ago):
Dear Amirhossein, thanks for the blog. Can you elaborate on "This property makes it easy for the model to learn to attend by relative positions"?

Amirhossein Kazemnejad Mod > kamal fara (5 years ago):
This is an interesting question. Please refer to my answer to Rajesh R's comment. [Link]
kamatikos (5 years ago):
Good explanation that addresses a lot of the questions that were left out of basically all other breakdowns. Thank you.