
Conversation

@mysablehats
Owner

No description provided.

zou3519 and others added 30 commits September 25, 2019 07:04
…26563)

Summary:
Pull Request resolved: #26563

This adds name inference rules for pre-existing logsumexp, mode,
kthvalue, and median ops. Also adds overloads so that they can take
`Dimname` dimensions.

There are a lot of min/max overloads. This PR adds name inference to
the following overloads, for both min and max (a usage sketch follows the list):
- min(Tensor, int dim)
- min(Tensor, Dimname dim)
- min(Tensor)  (full reduction)
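
A minimal usage sketch of these overloads, assuming PyTorch 1.3+ with named tensor support (the shapes and dimension names here are illustrative, not taken from the PR):

```
import torch

t = torch.randn(2, 3, names=('N', 'C'))

values, indices = t.min(dim='C')   # Dimname overload; output names are inferred
print(values.names)                # ('N',)

print(t.min())                     # full reduction: 0-dim result carries no names
```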

Test Plan: - new tests and [namedtensor ci]

Differential Revision: D17557050

Pulled By: zou3519

fbshipit-source-id: a099a0ef04ad90d021a38a0668fc44902e1c7171
Summary:
Pull Request resolved: #25914

Signed-off-by: Edward Z. Yang <[email protected]>

Test Plan: Imported from OSS

Differential Revision: D17284083

Pulled By: ezyang

fbshipit-source-id: 430ac7ea2bd042b1f4bb874e53679d0fde326dec
Summary:
Pull Request resolved: #26118

Signed-off-by: Edward Z. Yang <[email protected]>

Test Plan: Imported from OSS

Differential Revision: D17404367

Pulled By: ezyang

fbshipit-source-id: 14a16baa4b59f97182725092531a54603f3d92b8
Summary:
Pull Request resolved: #26360

This is not just for aesthetics: this include blocks the inclusion
of headers like ivalue.h from ATenDispatch.h (as it causes an
include cycle.)

Signed-off-by: Edward Z. Yang <[email protected]>

Test Plan: Imported from OSS

Differential Revision: D17429163

Pulled By: ezyang

fbshipit-source-id: 03feb210c12bc891d95bbb5a11ffd694ec05005c
Summary:
Pull Request resolved: #26718

Signed-off-by: Edward Z. Yang <[email protected]>

Test Plan: Imported from OSS

Differential Revision: D17549623

Pulled By: ezyang

fbshipit-source-id: 8880c09d85a15b2a63dcf0c242ba6a2dd941decb
Summary:
GitHub commits:

facebook/litho@6668c21
pytorch/FBGEMM@189aebb

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: f2037290b58ac295eeb94626e172491a8526875d
Test Plan: revert-hammer

Differential Revision:
D17549623

Original commit changeset: 8880c09d85a1

fbshipit-source-id: 002bb1173dbcf6a1d18e1c4b84b4365f145c38dd
Summary:
Resubmit of #25980.
Our old serialization format was tar (e.g. `resnet18-5c106cde.pth` is in this format), so let's only support automatic unzipping when checkpoints are zipfiles.
We can still make it work with tarfile, but let's defer that until there's an ask.
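
A hedged example of the resulting behavior; the URL below is hypothetical, and `load_state_dict_from_url` only auto-unpacks zip-serialized checkpoints, while legacy tar checkpoints (like `resnet18-5c106cde.pth`) are loaded as-is:

```
import torch

# Hypothetical checkpoint URL, shown only to illustrate the call.
url = "https://example.com/checkpoints/model.pth"
state_dict = torch.hub.load_state_dict_from_url(url, progress=True)
```
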
Pull Request resolved: #26723

Differential Revision: D17551795

Pulled By: ailzhang

fbshipit-source-id: 00b4e7621f1e753ca9aa07b1fe356278c6693a1e
Summary:
This resets the sleef submodule to upstream, since everything except
a small build sanity fix
<zdevito/sleef@191f655>
has already been merged upstream. The new release includes an important fix
for trigonometric functions on macOS, which unblocks #26431.

This should supersede #20536.

Closes #20536.

cc colesbury resistor
Pull Request resolved: #26749

Differential Revision: D17572783

Pulled By: ezyang

fbshipit-source-id: dd7827e8c8500a0050e3e318d184134c792d3ecc
Summary:
GitHub commits:

facebook/litho@5096b0a
facebook/proxygen@ecd6c10
facebook/mvfst@67abe5d
facebookarchive/profilo@90580f7
facebookresearch/PyTorch-BigGraph@7f98961
pytorch/FBGEMM@f8da6e6

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 60ce61531cf6d4ac8616b3986b40b423abc7de15
Summary:
Pull Request resolved: #26773

att

Test Plan:
ci

Imported from OSS

Differential Revision: D17563673

fbshipit-source-id: 5a6fb4238b6886695c2d25db11fec22ebe5d0c08
Summary: Pull Request resolved: #25397

Differential Revision: D17565747

Pulled By: Krovatkin

fbshipit-source-id: b772437d9e02df99db6e662cb7d1227359959bed
Summary:
- Separates device type from default (test) device
- Adds multidevice decorator
- Updates generic tests to use multidevice decorator where applicable

TorchXLA wants to change the default test device based on the test environment. Separating the device type and the default (test) device enables that functionality.

Additionally, many existing tests only run on multiple devices and are required, as a consequence, to make CUDA-specific API calls. The multidevice decorator simplifies the existing code and limits the CUDA dependency. Eventually this should let us run multidevice tests on multiple device types.
Pull Request resolved: #26594

Test Plan: tests were manually run with the CUDA test device set to 'cuda:1'.

Differential Revision: D17568910

Pulled By: mruberry

fbshipit-source-id: c442f748a31a970be8c21deb12a67c3b315c1128
Summary:
Pull Request resolved: #26784

Previously we were using `empty` to generate test tensors; this PR changes the test tensors to use
`randint` so that we can test things properly.
Also added a `set_sizes_and_strides` call and removed `.contiguous()` in the `int_repr` function to preserve the
original sizes and strides.
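
A minimal sketch of the behavior the updated tests exercise (shapes and quantization parameters are illustrative):

```
import torch

x = torch.randint(0, 100, (2, 3)).float()                     # non-trivial values, as in the new tests
qx = torch.quantize_per_tensor(x, 0.5, 1, torch.quint8)
ir = qx.int_repr()                                            # underlying uint8 representation
assert ir.size() == qx.size() and ir.stride() == qx.stride()  # sizes and strides are preserved
```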

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D17566575

fbshipit-source-id: 89379fb09b500dd156118e6ee0709df59f169990
…26290)

Summary:
Pull Request resolved: #26290

Fixes #26206

Happily, I also can delete the dead Dense***Tensor cases, since they
are for the defunct THS backend.

Signed-off-by: Edward Z. Yang <[email protected]>

Test Plan: Imported from OSS

Differential Revision: D17404368

Pulled By: ezyang

fbshipit-source-id: 79d71ad40c4325c9f52d2825aceb65074d2e20e8
…6556)

Summary:
Use Caffe2's implementation of grouped depthwise 3x3 convolutions instead of NNPACK.
Pull Request resolved: #26556

Test Plan:
_Correctness_ - Manually check the results using the --print-output flag on speed_benchmark_torch.

_Performance_ - All measurements below on Pixel 2

**Before**:

Multi-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25"
>
> Main run finished. Milliseconds per iter: **876.002**. Iters per second: 1.14155

Single-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25 \
>  --caffe2_threadpool_force_inline=true"
>
> Main run finished. Milliseconds per iter: **459.409**. Iters per second: 2.17671

**After**:

Multi-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25"
>
> Main run finished. Milliseconds per iter: **285.68**. Iters per second: 3.50042

Single-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25 \
>  --caffe2_threadpool_force_inline=true"
>
> Main run finished. Milliseconds per iter: **278.999**. Iters per second: 3.58425

Differential Revision: D17533311

Pulled By: AshkanAliabadi

fbshipit-source-id: 9ee8acf02b8e3e8da1922b188ed0a6459a90b67d
Summary:
Closes #24562
Pull Request resolved: #26598

Differential Revision: D17531503

Pulled By: VitalyFedyunin

fbshipit-source-id: 8119c796e142f073ad4e274dda1ad99344215c48
Summary:
Pull Request resolved: #26583

Adds a function that uses the NCCL API to get the version code and converts it to a readable version. It will be
used for logging the NCCL version in exception messages.
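
The helper added here lives on the C++ side; as a related (but separate) sanity check from Python, one can already query the NCCL version the build links against:

```
import torch

if torch.cuda.is_available():
    # Reports the version of the NCCL library PyTorch was built against.
    print(torch.cuda.nccl.version())
```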

Test Plan: See above

Differential Revision: D17473200

fbshipit-source-id: 4881ed5221b397f2f967262668c2b376b6bf3c64
…26816)

Summary:
Output tensors don't need to be copied during type promotion, as we are not using any data from them. Simply allocating them gives a steady 10% performance gain.

BEFORE

```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
77.3 ms ± 257 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

AFTER

```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
68.2 ms ± 713 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
Pull Request resolved: #26816

Differential Revision: D17573455

Pulled By: VitalyFedyunin

fbshipit-source-id: 47286abce5e7e665eb61e46ae358c896e945bef2
Summary:
Pull Request resolved: #26751

### Summary

We're going to use the AWS S3 bucket `s3://ossci-ios` to store the release binaries. To release the cocoapods, we can follow the steps below:

1.  Open a fake PR to trigger the CI job that pulls the code from the 1.3.0 tag branch and does the building and uploading.
2. Verify the binary locally  - Run tests on both arm64 and simulator
3. Publish the cocoapods officially

### Test plan

- podspec lint command succeeds
    - `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation`

Test Plan: Imported from OSS

Differential Revision: D17577131

Pulled By: xta0

fbshipit-source-id: 55fee918ecc5c4e0b6d714488a12351b4370afac
Summary:
Pull Request resolved: #26496

It is a BAD BAD idea to depend on Docker versions which are not deployed
(per ossci-job-dsl), because those versions will get GC'ed after two
weeks.  At the moment, there is no verification that your Docker version
is deployed.  This adds an Azure job to check this.

Signed-off-by: Edward Z. Yang <[email protected]>

Test Plan: Imported from OSS

Differential Revision: D17575100

Pulled By: ezyang

fbshipit-source-id: 8df2331c6e6899c585bc2917b55e8955908b0e4a
Summary:
Pull Request resolved: #26704

NCCL 2.1.15 isn't available for CUDA 10.1 and 2.4.8 isn't available for CUDA 9.1 :(

ghstack-source-id: 90714191

Test Plan: build docker images on Jenkins

Differential Revision: D17543120

fbshipit-source-id: 882c5a005a9a3ef78f9209dea9dcec1782060b25
Summary:
Added ONNX export for baddbmm in opset 9
Pull Request resolved: #25738
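
A hedged export sketch; the wrapper module and output file name are illustrative, and opset 9 is the opset this change targets:

```
import torch

class BaddBmm(torch.nn.Module):
    def forward(self, inp, b1, b2):
        return torch.baddbmm(inp, b1, b2)

# baddbmm(input, batch1, batch2): (b, n, p) += (b, n, m) @ (b, m, p)
inp, b1, b2 = torch.randn(2, 3, 5), torch.randn(2, 3, 4), torch.randn(2, 4, 5)
torch.onnx.export(BaddBmm(), (inp, b1, b2), "baddbmm.onnx", opset_version=9)
```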

Reviewed By: hl475

Differential Revision: D17565828

Pulled By: houseroad

fbshipit-source-id: 85f605a7b3fa4783ef4f6ced86223133c85062d5
Summary: Pull Request resolved: #26739

Test Plan: Imported from OSS

Differential Revision: D17577908

Pulled By: bwasti

fbshipit-source-id: a09cdbd8619a926e93418a692ce859d4157f2da8
Summary:
We implement the quantized upsample_bilinear2d case of the interpolate kernel in this PR.

The benchmark below measures the NHWC performance improvement:

```
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.interpolate(x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.interpolate(q_x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
    #  torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

Results:

```
===========without nhwc handling===========
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
1.999044418334961       2.5860953330993652      1.2936657681940702
GB/s float      GB/s quant
1.6192056416115257      0.3129103516188541
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.02730655670166        2.6061582565307617      1.2855274639721328
GB/s float      GB/s quant
1.596632728927902       0.3105014816242217
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0180463790893555      2.4047350883483887      1.1916153728010588
GB/s float      GB/s quant
1.603959172365819       1.3460376636426636

===========with nhwc handling===========

**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0913314819335938      0.09696483612060547     0.04636512047863123
GB/s float      GB/s quant
1.5477527249803915      8.345458337015
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.1065664291381836      0.09959936141967773     0.04728042754408879
GB/s float      GB/s quant
1.5365591871338384      8.124710725706763
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.044203281402588       0.6003522872924805      0.29368521846837126
GB/s float      GB/s quant
1.5834354779917448      5.391607675216635
```
Pull Request resolved: #26631

Differential Revision: D17521498

Pulled By: llyfacebook

fbshipit-source-id: 385ae0f77777cd8bee385cafb80e492127b7d103
…26453)

Summary:
Pull Request resolved: #26453

Previously, schema matching would incorrectly widen typevar bindings
when later occurrences were supertypes of earlier ones. This allowed
callsites like `floatlist.append(tensor.item())` to pass the typechecker,
causing a runtime assert (issue #24856).

An earlier, reverted fix (#25136) insisted on strict equality across all
occurrences of a typevar, necessitating explicit casts around Scalar-typed
arguments to int- or float-typed parameters, like `tensor.item()` above.
This was per the original type system design, but turned out to break
existing user code that relied on the de facto dynamic downcast. (The
error required a specialized list representation.)

The current fix includes the prevention of typevar widening, but
adds logic to insert implicit conversions from Scalar to float or int
as needed to satisfy a matched schema.
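
A minimal sketch of the callsite pattern from issue #24856 that this fix makes valid (the function name and values are illustrative):

```
from typing import List

import torch

@torch.jit.script
def append_item(xs: List[float], t: torch.Tensor) -> List[float]:
    # t.item() yields a Scalar; the matched schema expects float, so an
    # implicit Scalar -> float conversion is inserted instead of widening
    # the list's typevar.
    xs.append(t.item())
    return xs

print(append_item([1.0], torch.tensor(2.5)))  # [1.0, 2.5]
```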

Test Plan: Imported from OSS

Differential Revision: D17470598

Pulled By: bhosmer

fbshipit-source-id: d260dbf3cd78b9c2f2229bc61afc84e1910b5659
Summary:
This PR makes the following improvements:
1. Add `forward_with_indices` method to all C++ MaxPool modules, to return the max indices along with the outputs. (We can't make two `forward` methods that return different types based on input, because that will break the type deduction of `torch::detail::return_type_of_forward_t`)
2. Add `max_poolNd_with_indices` to `torch::nn::functional`, to be used when indices of the max values are needed. (We can't merge this with `torch::nn::functional::max_poolNd` because the return type of `max_poolNd` has to be defined statically).
3. Improve `pretty_print` of C++ MaxPoolNd and AvgPoolNd modules to match the Python `extra_repr`.
Pull Request resolved: #26521

Differential Revision: D17507358

Pulled By: yf225

fbshipit-source-id: b6c0e2b27b38378cdc0c75f4bfc797b3c6b17cd9
Test Plan: revert-hammer

Differential Revision:
D17565828

Original commit changeset: 85f605a7b3fa

fbshipit-source-id: 7705325087d83362f71a717be880a13e9f575b37
Summary:
test run: #26732
Pull Request resolved: #26823

Reviewed By: soumith

Differential Revision: D17576095

Pulled By: mingbowan

fbshipit-source-id: 269cf443aea18b47bbee63996d035bc5bcd2726b
… to std::function. (#26592)

Summary:
function_ref is pulled over from LLVM.  It is to callables what StringRef is to strings.
This allows it to be substantially lighter weight, particularly in code size.  That comes
at the cost of not being usable in situations where the callable's lifetime is shorter
than the function_ref.  This means it is suitable for callback-like scenarios, but not
for situations where the callable needs to be stored.  In converting TensorIterator,
I only encountered one situation that required refactoring to comply with function_ref's
constraints.

In my local Release build, this reduces the size of libtorch by 4MB, from 70MB->66MB.
Pull Request resolved: #26592

Differential Revision: D17516202

fbshipit-source-id: 267476891f767f4827a4d38149f70e5035c56c48
ailzhang and others added 29 commits October 8, 2019 16:55
…27453)

Summary:
All of the test cases move into a base class that is extended by the
instrumentation test and a new "HostTests" class that can be run in
normal Java.  (Some changes to the build script and dependencies are
required before the host test can actually run.)

ghstack-source-id: fe1165b
Pull Request resolved: #27453

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D17800410

fbshipit-source-id: 1184f0caebdfa219f4ccd1464c67826ac0220181
Summary:
Pull Request resolved: #27454

See detailed discussion at
#27350

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D17800480

Pulled By: dreiss

fbshipit-source-id: bf174e8b16231b89be771de0fa54c41e864a3eb0
Summary: Pull Request resolved: #27455

Test Plan: Imported from OSS

Differential Revision: D17800658

Pulled By: dreiss

fbshipit-source-id: dbd01d9fa5ac82c50daf54c2869dc18be233d8dd
Summary: Pull Request resolved: #27363

Test Plan: Imported from OSS

Differential Revision: D17758907

Pulled By: zafartahirov

fbshipit-source-id: f560f2726cf51ceebdbf22ebef2d067422340cf2
`docs/source/named_tensor.rst` is the entry point; most users will land
either here or on the named tensor tutorial when looking to use named
tensors. We should strive to make this as readable, concise, and understandable
as possible.

`docs/source/name_inference.rst` lists all of the name inference rules.
It should be clear but it's hard to make it concise.

Please let me know if anything doesn't make sense and please propose
alternative wordings and/or restructuring to improve the documentation.
This should ultimately get cherry-picked into the 1.3 branch as one
monolithic commit so it would be good to get all necessary changes made
in this PR and not have any follow ups.

Test Plan:
- built and reviewed locally with `cd docs/ && make html`.

ghstack-source-id: dc2ca7a
Pull Request resolved: #27173
This PR updates the docs CI. After this is merged, we open a PR from
1.3.0 -> master. That open PR will build docs on this branch and push
them to pytorch.github.io:site-v1.3.0. This is done in dry_run mode
so the pushing won't actually happen; I will follow up with a
subsequent change to drop dry_run mode after verifying that everything
builds correctly.
* Add javadocs for v1.3.0

* Delete Tensor-Tensor_float32 because it is not public

* Delete Tensor-Tensor_float64 because it is not public

* Delete Tensor-Tensor_int32 because it is not public

* Delete  Tensor-Tensor_int64 because it is not public

* Delete Tensor-Tensor_int8 because it is not public

* Delete Tensor-Tensor_uint8 because it is not public

* Add reference to DType and TensorImageUtils
* Cherry picked in changes from Jessica's branch.

Consolidate all quantization docs in quantization.rst. Add a link to quantization docs from torch.rst. Order quantization.rst alphabetically in index.rst

* Fix Quantized reference

* Add prose for Quantized Functions in the torch.nn docs

* Remove Quantization section

* Updates to index for v1.3.0

* Update "Package Reference" to "Python API"
* Add in torchaudio and torchtext reference links so they show up across all docs not just the main page
* Add "Other Languages" section, add in C++ docs, add in Javadocs
* Add link to XLA docs under Notes: http://pytorch.org/xla/

* Doc tests caught that we'd somehow dropped documenting a few functions like
result_type, can_cast, promote_types

* Add javasphinx extension
…erver (#27574)

* docstring-only formatting changes in the quantize.py and fake_quantization.py files so they render better in HTML.

* docstring change on observer.py as well

* just kind of tweaking the docstrings a bit more.

* switching to r""" for the multi-line string, per Zafar's suggestion.

* trying to resolve the merge conflict soumith saw

* trying to avoid a conflict when this gets merged back to master
This was written by Raghu, Jessica, Dmytro and myself.
Organize APIs logically in subsections. Fix typos.

This is the v1.3.0 version of a 3 Part PR originally made to master PR: #27677
originally by @dzhulgakov
This is the v1.3.0 version of a 3 Part PR originally made to master PR: #27677
originally by @dzhulgakov
This is the v1.3.0 version of a 3 Part PR originally made to master PR: #27677
Originally by @dzhulgakov
Summary:
People get confused by the partial support otherwise: #27811 #27729

Suggestions on where else to put warnings are welcome (probably in tutorials; cc SethHWeidman)
Pull Request resolved: #27829

Differential Revision: D17910931

Pulled By: dzhulgakov

fbshipit-source-id: 37a169a4bef01b94be59fe62a8f641c3ec5e9b7c
@mysablehats mysablehats merged commit 886f57f into mysablehats:master Dec 1, 2020