This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Conversation

@cjolivier01
Member

@cjolivier01 cjolivier01 commented Oct 18, 2017

Description

Optimizations to set_to, copy, fill, full ops

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • For user-facing API changes, API doc string has been updated.
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@szha
Member

szha commented Oct 18, 2017

Is it intended to only support fill for zero and one going forward?

@cjolivier01
Member Author

I don't understand your question. Which item are you referring to and what gives you that assumption?

@szha
Member

szha commented Oct 18, 2017

Having the value to fill as the template parameter.

@cjolivier01
Member Author

There's more than one thing you could be referring to in this PR. There's a FillCompute() and a set_to template, which are somewhat independent. Currently, the two use-cases are zero and one for the fill value for both of these. This does not mean that "going forward" another value can't be used if the need arises.
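For context, here is a minimal standalone sketch of the compile-time-value pattern being discussed (illustrative only, not the PR's actual FillCompute()/set_to code; the names set_to_int and Launch are made up for this example):

#include <cstddef>
#include <iostream>
#include <vector>

// Illustrative only: the fill value is a compile-time template parameter,
// so the compiler can emit it as an immediate store.
template <int value>
struct set_to_int {
  template <typename DType>
  static void Map(std::size_t i, DType* out) {
    out[i] = DType(value);  // 'value' is baked into the generated code
  }
};

// Toy stand-in for a Kernel<OP, cpu>::Launch-style loop.
template <typename OP, typename DType>
void Launch(std::size_t n, DType* out) {
  for (std::size_t i = 0; i < n; ++i) {
    OP::Map(i, out);
  }
}

int main() {
  std::vector<float> data(8);
  Launch<set_to_int<1>>(data.size(), data.data());  // fill with ones
  std::cout << data[0] << ' ' << data[7] << '\n';   // prints: 1 1
}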

@szha
Member

szha commented Oct 18, 2017

There's an op in NDArray which can benefit from fill https://mxnet.incubator.apache.org/versions/master/api/python/ndarray.html?highlight=full#mxnet.ndarray.full. Currently it is done in the frontend through empty and in-place assignment. And this op is not included in symbol (though it's doable through ones and multiply). It's worth considering the full use case in this PR since doing it with fill will be faster, though it wouldn't be compatible with the current template implementation of set_to.

@cjolivier01
Member Author

Making a fill with some runtime-determined value such as OpBase::SetToScalar is a trivial addition. For filling with a predefined constant scalar such as zeroes and ones, using an immediate such as with set_to is faster.
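For contrast, a hedged sketch of the runtime-value style described above (in the spirit of OpBase::SetToScalar, though the function name and signature here are invented for illustration): the value is an ordinary argument rather than a template parameter, so one kernel covers any value at the cost of loading it at run time.

#include <cstddef>
#include <iostream>
#include <vector>

// Illustrative only: the fill value is a normal run-time argument.
template <typename DType>
void SetToScalarLike(DType* out, std::size_t n, DType value) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = value;  // value is read at run time, not baked into the code
  }
}

int main() {
  std::vector<double> data(4);
  SetToScalarLike(data.data(), data.size(), 2.5);  // any value works here
  std::cout << data[0] << '\n';                    // prints: 2.5
}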

@szha
Member

szha commented Oct 18, 2017

Agreed. I'm basically asking to keep both fill and set_to.

@cjolivier01
Member Author

You can re-add it if you need it sometime, although it's better to use OpBase::SetToScalar or op_with_req<mshadow_op::identity, Req> (the op_with_req override of Map() for setting a scalar is in a separate PR) because those properly handle Req.
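To illustrate what "properly handle Req" means, here is a standalone sketch (not MXNet's actual op_with_req implementation; only the OpReqType enum values mirror the real ones): the behavior has to branch on the request type, since kWriteTo overwrites, kAddTo accumulates, and kNullOp skips the output.

#include <cstddef>
#include <iostream>
#include <vector>

// The enum values mirror MXNet's OpReqType; everything else is illustrative.
enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };

template <typename DType>
void FillWithReq(DType* out, std::size_t n, DType value, OpReqType req) {
  switch (req) {
    case kNullOp:
      return;                                               // nothing requested
    case kWriteTo:
    case kWriteInplace:
      for (std::size_t i = 0; i < n; ++i) out[i] = value;   // overwrite output
      break;
    case kAddTo:
      for (std::size_t i = 0; i < n; ++i) out[i] += value;  // accumulate into output
      break;
  }
}

int main() {
  std::vector<float> data(4, 1.0f);
  FillWithReq(data.data(), data.size(), 2.0f, kAddTo);
  std::cout << data[0] << '\n';  // prints: 3 (accumulated, not overwritten)
}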

@szha
Member

szha commented Oct 18, 2017

As long as full is still on the radar it's fine. Would you make the change to properly support full in that PR then?

if (req[0] != kNullOp) {
mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
MSHADOW_TYPE_SWITCH(outputs[0].type_flag_, DType, {
mxnet_op::Kernel<mxnet_op::set_to<value>, xpu>::Launch(s,
Contributor

What if req[0] is kAddTo?

Member Author

The op_with_req<> wrapper will handle this along with the other changes.

const size_t size = outputs[0].Size();
if (size) {
MSHADOW_TYPE_SWITCH(outputs[0].type_flag_, DType, {
memset(outputs[0].dptr<DType>(), 0, size * sizeof(DType));
Contributor

outputs[0].dptr_ is more efficient here than outputs[0].dptr<DType>().
Question: How much faster is this compared to the original implementation of filling up a TBlob?

Member Author

@cjolivier01 cjolivier01 Oct 19, 2017

A lot faster. The assembly for memset looks something like:
; for 32 bit, load each byte of the eax register with the destination value
mov eax,
shl eax, 8
or al, ah
shl eax, 8
or al, ah
shl eax, 8
mov edi, [pointer]
mov ecx, [size]
shr ecx, 2
rep stosd ; clock cycle count, size/4 * https://web.itu.edu.tr/kesgin/mul06/intel/instr/stos.html
mov ecx, [size]
and ecx, 3
rep stosb ; clock cycle count, (size & 3) * https://web.itu.edu.tr/kesgin/mul06/intel/instr/stos.html

This is done without any change in code execution path (jmp, jle, jge, jz, etc.) or compares.
It is also very cache-friendly since it is perfectly sequential.
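For readers who want to reproduce the comparison outside the operator code, a minimal standalone sketch of the two zero-fill strategies being contrasted (illustrative only; the real operator writes into TBlob data under MSHADOW_TYPE_SWITCH, and the function names here are made up):

#include <cstddef>
#include <cstring>
#include <iostream>
#include <vector>

// Element-wise fill: one store per element, value known only at run time.
template <typename DType>
void FillLoop(DType* out, std::size_t n, DType value) {
  for (std::size_t i = 0; i < n; ++i) out[i] = value;
}

// memset-based zero fill: valid for zero because the all-zero bit pattern is
// 0.0 for IEEE floating-point types as well as 0 for integer types.
template <typename DType>
void FillZeroMemset(DType* out, std::size_t n) {
  std::memset(out, 0, n * sizeof(DType));
}

int main() {
  std::vector<float> a(1024, 1.0f), b(1024, 1.0f);
  FillLoop(a.data(), a.size(), 0.0f);
  FillZeroMemset(b.data(), b.size());
  std::cout << a[0] << ' ' << b[0] << '\n';  // prints: 0 0
}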

@cjolivier01
Member Author

cjolivier01 commented Oct 19, 2017

'full' on the radar? That struct isn't used. If someone wants to use it, they can add it. I am not going to add an unused kernel. OpBase::set_to_scalar is available.

@cjolivier01
Member Author

OK, @szha and I spoke offline. I added the new operator _full to be used for the ndarray.full() function.
It is tested in test_ndarray.test_outputs().

@szha
Member

szha commented Oct 22, 2017

@cjolivier01
Member Author

done

@cjolivier01 cjolivier01 changed the title from "Fill optimizations" to "Optimizations to set_to, copy, fill ops" on Oct 23, 2017
cjolivier01 and others added 6 commits October 23, 2017 10:33
Timing output for test_factorization_module when Verbose enabled (apache#8363)

* Timing output for test_factorization_module when Verbose enabled

* Trigger build

* Trigger build

* Trigger build

* Misc fixes for sparse distributed training (apache#8345)

* remove mshadow::range in init_op.h

* add unit test

* remove pass by ptr, add unit test for pull empty weights

* fix range in key partition

* remove wrong comment

* remove change for partition

* remove unused var

* add int64 to arange. add checkpointing example

* Fix the Readme (apache#8369)

* Allow test to converge (apache#8351)

* Allow test to converge

* Trigger build

* Trigger build

* Trigger build

* Update cudnn_algoreg-inl.h (apache#7988)

* [Perl] emulate Python zip() for Perl (apache#8192)

* [Perl] emulate Python zip() for Perl

* [Perl] retool zip() uses away from the callback form

* add profile option for frontend profiling to image script (apache#8171)

* add profile option for frontend profiling to image script

* Update image_classification.py

* Update image_classification.py

* Fix Typo (classification) (apache#8376)

Fix a typo in the example readme.
* CPU optimization for ActivationOp

Significant improvement on CPU (several orders of magnitude in some cases, especially on the backward pass).
Very slight improvement on GPU.

OLD MSHADOW APPROACH
--------------------

CPU
===

Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator CPU:  Timing [Forward] 18.948 ms, avg: 0.037896 ms X 500 passes
Activation Operator CPU:  Timing [Backward] 1.658 ms, avg: 0.003316 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator CPU:  Timing [Forward] 57.973 ms, avg: 0.115946 ms X 500 passes
Activation Operator CPU:  Timing [Backward] 4.748 ms, avg: 0.009496 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator CPU:  Timing [Forward] 703.446 ms, avg: 1.40689 ms X 500 passes
Activation Operator CPU:  Timing [Backward] 56.255 ms, avg: 0.11251 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator CPU:  Timing [Forward] 2107.77 ms, avg: 4.21554 ms X 500 passes
Activation Operator CPU:  Timing [Backward] 168.483 ms, avg: 0.336966 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator CPU:  Timing [Forward] 24122.2 ms, avg: 48.2443 ms X 500 passes
Activation Operator CPU:  Timing [Backward] 1908.7 ms, avg: 3.8174 ms X 500 passes

GPU
===

Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator GPU:  Timing [Forward] 1.637 ms, avg: 0.003274 ms X 500 passes
Activation Operator GPU:  Timing [Backward] 1.665 ms, avg: 0.00333 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator GPU:  Timing [Forward] 1.562 ms, avg: 0.003124 ms X 500 passes
Activation Operator GPU:  Timing [Backward] 1.661 ms, avg: 0.003322 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator GPU:  Timing [Forward] 1.635 ms, avg: 0.00327 ms X 500 passes
Activation Operator GPU:  Timing [Backward] 1.702 ms, avg: 0.003404 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator GPU:  Timing [Forward] 1.83 ms, avg: 0.00366 ms X 500 passes
Activation Operator GPU:  Timing [Backward] 2.041 ms, avg: 0.004082 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator GPU:  Timing [Forward] 2.08 ms, avg: 0.00416 ms X 500 passes
Activation Operator GPU:  Timing [Backward] 2.688 ms, avg: 0.005376 ms X 500 passes

NEW MXNET_OP APPROACH
---------------------

CPU
===

Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator CPU:  Timing [Forward] 80.748 ms, avg: 0.161496 ms X 500 passes
Activation Operator CPU:  Timing [Backward] 1.176 ms, avg: 0.002352 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator CPU:  Timing [Forward] 7.881 ms, avg: 0.015762 ms X 500 passes
Activation Operator CPU:  Timing [Backward] 2.181 ms, avg: 0.004362 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator CPU:  Timing [Forward] 111.48 ms, avg: 0.22296 ms X 500 passes
Activation Operator CPU:  Timing [Backward] 5.408 ms, avg: 0.010816 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator CPU:  Timing [Forward] 333.439 ms, avg: 0.666878 ms X 500 passes
Activation Operator CPU:  Timing [Backward] 21.331 ms, avg: 0.042662 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator CPU:  Timing [Forward] 3429.19 ms, avg: 6.85837 ms X 500 passes
Activation Operator CPU:  Timing [Backward] 286.324 ms, avg: 0.572648 ms X 500 passes

GPU
===

Timing: 50 iterations of 10 calls, shape = [1,1,28,28]
Activation Operator GPU:  Timing [Forward] 1.618 ms, avg: 0.003236 ms X 500 passes
Activation Operator GPU:  Timing [Backward] 1.671 ms, avg: 0.003342 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [1,3,28,28]
Activation Operator GPU:  Timing [Forward] 1.629 ms, avg: 0.003258 ms X 500 passes
Activation Operator GPU:  Timing [Backward] 1.728 ms, avg: 0.003456 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [50,1,18,32]
Activation Operator GPU:  Timing [Forward] 1.753 ms, avg: 0.003506 ms X 500 passes
Activation Operator GPU:  Timing [Backward] 1.756 ms, avg: 0.003512 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [50,3,18,32]
Activation Operator GPU:  Timing [Forward] 1.704 ms, avg: 0.003408 ms X 500 passes
Activation Operator GPU:  Timing [Backward] 1.791 ms, avg: 0.003582 ms X 500 passes

Timing: 50 iterations of 10 calls, shape = [20,3,128,128]
Activation Operator GPU:  Timing [Forward] 2.032 ms, avg: 0.004064 ms X 500 passes
Activation Operator GPU:  Timing [Backward] 2.143 ms, avg: 0.004286 ms X 500 passes

* lint

* Trigger build

* Trigger build

* Negative begin and end support for csr slice (apache#8241)

* negative index support for sparse slice

* fix lint

* getitem(int) for csr ndarray, support a[-1]

* remove unnecessary argument

* unittest and doc update

* Preparing for 0.12.0.rc0: Final changes before RC (apache#8301)

* Final changes before RC

* Updates to NEWS.md

* Updates

* Enable smoothing in softmax operator (apache#8125)

* v0.12 regression: Fix registration of children for Block (apache#8277)

* Fix Block not registering children

If the attribute was already set to something different than Block (e.g. None),
it was not being registered.

* fix if / elif for block children registration

* trigger test

* Add fix from apache#8152

* Add tests from apache#8152

* Revert "[CMAKE] Fix windows cmake build" (apache#8311)

* Revert "Added my code signing key (apache#8293)"

This reverts commit 22ab185.

* Revert "[CMAKE] Fix windows cmake build (apache#8227)"

This reverts commit 1c1c788.

* fixed broken links. https was pointing to http for mxnet.io (apache#8300)

* Update rnn.md (apache#8320)

* fluent methods for missed ops (apache#8329)

* update ps lite (apache#8327)

* Fix unused type warning (apache#8316)

* Trigger build

* Trigger build

@cjolivier01
Member Author

Any further comments on this, or is everyone OK with me merging once it passes CI?

if ctx is None:
ctx = Context.default_ctx
dtype = mx_real_t if dtype is None else dtype
out = _internal._full(shape=shape, ctx=ctx, dtype=dtype, value=val, out=out)
Contributor

How about changing setitem to use full instead?

Member Author

What is setitem? Is that Python code? I’m not especially familiar with the Python parts.

Member Author

@cjolivier01 cjolivier01 Oct 25, 2017

I looked at __setitem__ in ndarray.py. It is not clear how full() would work there. Can you elaborate on what you have in mind?

Contributor

I guess that would be replacing some value assignment functions used in __setitem__ with _full.

In order to avoid code conflict, I can make that change since I am working on __setitem__ right now for advanced indexing.

Member Author

ok

Contributor

@reminisce remember to change this back to out[:] = val after you've made the change

@cjolivier01
Member Author

Going to close and reopen to see if Jenkins starts to see this PR

@cjolivier01 cjolivier01 reopened this Oct 27, 2017
@cjolivier01
Member Author

That worked :)

@cjolivier01 cjolivier01 merged commit 3552b95 into apache:master Oct 28, 2017
@cjolivier01 cjolivier01 deleted the eric_perf branch October 28, 2017 01:50
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
* Fill optimizations

* Optimize IdentityCompute for CPU

* lint

* Fix unused type warning (apache#8316)

* remove unused variable

* CR comments

* CR comments

* Added _full operator

* Trigger build

* Trigger build

* Add _full to symbolic

* Merge conflict resolution fix

* lint

* Use omp_get_max_threads() when OMP_NUM_THREADS environment variable is set (apache#8379)

* CPU optimization for ActivationOp (apache#8296)

* Fix GPU copy

* Remove duplicate

* Trigger build