forked from pytorch/pytorch

V1.3.0 #1

Merged

Conversation
…26563)

Summary: Pull Request resolved: #26563

This adds name inference rules for the pre-existing logsumexp, mode, kthvalue, and median ops. Also adds overloads so that they can take `Dimname` dimensions.

There are a lot of min/max overloads. This PR adds name inference to the following overloads for (both) min and max:
- min(Tensor, int dim)
- min(Tensor, Dimname dim)
- min(Tensor) (full reduction)

Test Plan:
- new tests and [namedtensor ci]

Differential Revision: D17557050
Pulled By: zou3519
fbshipit-source-id: a099a0ef04ad90d021a38a0668fc44902e1c7171
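As a rough illustration of the `Dimname` overloads this commit describes, a reduction can name the dimension instead of indexing it. A minimal sketch against the 1.3 named tensor API (the tensor shape and dimension names here are made up for illustration):

```python
import torch

# Named tensor with illustrative dimension names 'N' and 'C'.
t = torch.randn(2, 3, names=('N', 'C'))

# Reductions can take a Dimname instead of an integer dim;
# name inference propagates the remaining names to the outputs.
values, indices = t.min(dim='C')
print(values.names)                # ('N',)
print(t.logsumexp(dim='C').names)  # ('N',)
```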
Summary: Pull Request resolved: #25914 Signed-off-by: Edward Z. Yang <[email protected]> Test Plan: Imported from OSS Differential Revision: D17284083 Pulled By: ezyang fbshipit-source-id: 430ac7ea2bd042b1f4bb874e53679d0fde326dec
Summary: Pull Request resolved: #26118 Signed-off-by: Edward Z. Yang <[email protected]> Test Plan: Imported from OSS Differential Revision: D17404367 Pulled By: ezyang fbshipit-source-id: 14a16baa4b59f97182725092531a54603f3d92b8
Summary: Pull Request resolved: #26360 This is not just for aesthetics: this include blocks the inclusion of headers like ivalue.h from ATenDispatch.h (as it causes an include cycle.) Signed-off-by: Edward Z. Yang <[email protected]> Test Plan: Imported from OSS Differential Revision: D17429163 Pulled By: ezyang fbshipit-source-id: 03feb210c12bc891d95bbb5a11ffd694ec05005c
Summary: Pull Request resolved: #26718 Signed-off-by: Edward Z. Yang <[email protected]> Test Plan: Imported from OSS Differential Revision: D17549623 Pulled By: ezyang fbshipit-source-id: 8880c09d85a15b2a63dcf0c242ba6a2dd941decb
Summary: GitHub commits: facebook/litho@6668c21 pytorch/FBGEMM@189aebb Test Plan: n/a Reviewed By: yns88 fbshipit-source-id: f2037290b58ac295eeb94626e172491a8526875d
Test Plan: revert-hammer Differential Revision: D17549623 Original commit changeset: 8880c09d85a1 fbshipit-source-id: 002bb1173dbcf6a1d18e1c4b84b4365f145c38dd
Summary: Resubmit of #25980. Our old serialization was in tar (e.g. `resnet18-5c106cde.pth` was in this format), so let's only support automatic unzipping when checkpoints are zipfiles. We can still make it work with tarfiles, but let's delay that until there's an ask.

Pull Request resolved: #26723
Differential Revision: D17551795
Pulled By: ailzhang
fbshipit-source-id: 00b4e7621f1e753ca9aa07b1fe356278c6693a1e
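The zip-vs-tar distinction can be made with the standard library. A minimal sketch of the kind of check described above (the helper name and fallback behavior are assumptions for illustration, not the actual torch.hub code):

```python
import zipfile
import torch

def load_checkpoint(path):
    # Hypothetical helper: only zip archives are handled automatically;
    # legacy tar checkpoints would need manual unpacking.
    if zipfile.is_zipfile(path):
        return torch.load(path)
    raise RuntimeError("legacy tar checkpoint; unpack it manually")
```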
Summary: This resets the sleef submodule to upstream, since everything except a small build sanity fix <zdevito/sleef@191f655> has been merged upstream. The new release includes an important fix for trigonometric functions on macOS, which would unblock #26431. This should supersede #20536. Close #20536. cc colesbury resistor

Pull Request resolved: #26749
Differential Revision: D17572783
Pulled By: ezyang
fbshipit-source-id: dd7827e8c8500a0050e3e318d184134c792d3ecc
Summary: GitHub commits: facebook/litho@5096b0a facebook/proxygen@ecd6c10 facebook/mvfst@67abe5d facebookarchive/profilo@90580f7 facebookresearch/PyTorch-BigGraph@7f98961 pytorch/FBGEMM@f8da6e6 Test Plan: n/a Reviewed By: yns88 fbshipit-source-id: 60ce61531cf6d4ac8616b3986b40b423abc7de15
Summary: Pull Request resolved: #26773 att Test Plan: ci Imported from OSS Differential Revision: D17563673 fbshipit-source-id: 5a6fb4238b6886695c2d25db11fec22ebe5d0c08
Summary: Pull Request resolved: #25397 Differential Revision: D17565747 Pulled By: Krovatkin fbshipit-source-id: b772437d9e02df99db6e662cb7d1227359959bed
Summary:
- Separates device type from default (test) device
- Adds multidevice decorator
- Updates generic tests to use multidevice decorator where applicable

TorchXLA wants to change the default test device based on the test environment. Separating the device type and the default (test) device enables that functionality. Additionally, many existing tests run only on multiple devices and are required, as a consequence, to make CUDA-specific API calls. The multidevice decorator simplifies the existing code and limits the CUDA dependency. Eventually this should let us run multidevice tests on multiple device types.

Pull Request resolved: #26594
Test Plan: tests were manually run with the CUDA test device set to 'cuda:1'.
Differential Revision: D17568910
Pulled By: mruberry
fbshipit-source-id: c442f748a31a970be8c21deb12a67c3b315c1128
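A minimal sketch of what such a multidevice decorator can look like (the decorator name, signature, and CUDA-only device counting here are assumptions for illustration; the actual decorator lives in PyTorch's generic device-type test framework):

```python
import functools
import unittest
import torch

def multidevice(min_devices=2):
    """Hypothetical sketch: skip a test unless enough devices are present."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            # Assumption: CUDA device counting stands in for the generic
            # per-device-type count used by the real framework.
            if torch.cuda.device_count() < min_devices:
                raise unittest.SkipTest(
                    f"requires {min_devices}+ devices, "
                    f"found {torch.cuda.device_count()}")
            return fn(self, *args, **kwargs)
        return wrapper
    return decorator
```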
Summary: Pull Request resolved: #26784

Previously we were using `empty` to generate test tensors; this PR changes the test tensors to use `randint` so that we can test things properly. Also added a `set_sizes_and_strides` and removed `.contiguous()` in the `int_repr` function to preserve the original sizes and strides.

Test Plan: python test/test_quantized_tensor.py

Imported from OSS
Differential Revision: D17566575
fbshipit-source-id: 89379fb09b500dd156118e6ee0709df59f169990
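A rough sketch of why `randint` is preferable to `empty` for these tests (the shapes and quantization parameters are made up for illustration):

```python
import torch

# `empty` returns uninitialized memory, which can hide bugs when the
# garbage happens to round-trip; `randint` exercises real values in the
# quantized range deterministically.
x = torch.randint(0, 100, (3, 4), dtype=torch.float)
q = torch.quantize_per_tensor(x, scale=1.0, zero_point=0, dtype=torch.quint8)

# int_repr() should preserve the original sizes and strides after this PR.
assert q.int_repr().size() == q.size()
```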
…26290)

Summary: Pull Request resolved: #26290. Fixes #26206. Happily, I also can delete the dead Dense***Tensor cases, since they are for the defunct THS backend.

Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17404368
Pulled By: ezyang
fbshipit-source-id: 79d71ad40c4325c9f52d2825aceb65074d2e20e8
…6556)

Summary: Use Caffe2's implementation of grouped depthwise 3x3 convolutions instead of NNPACK.

Pull Request resolved: #26556

Test Plan:

_Correctness_ - Manually check the results using the --print-output flag on speed_benchmark_torch.

_Performance_ - All measurements below on Pixel 2.

**Before**:

Multi-threaded:

```
adb shell "./speed_benchmark_torch \
  --model=./xraymobilev3.pt \
  --input_dims="1,3,224,224" \
  --input_type=float --warmup=5 \
  --iter=25"
```

Main run finished. Milliseconds per iter: **876.002**. Iters per second: 1.14155

Single-threaded:

```
adb shell "./speed_benchmark_torch \
  --model=./xraymobilev3.pt \
  --input_dims="1,3,224,224" \
  --input_type=float --warmup=5 \
  --iter=25 \
  --caffe2_threadpool_force_inline=true"
```

Main run finished. Milliseconds per iter: **459.409**. Iters per second: 2.17671

**After**:

Multi-threaded:

```
adb shell "./speed_benchmark_torch \
  --model=./xraymobilev3.pt \
  --input_dims="1,3,224,224" \
  --input_type=float --warmup=5 \
  --iter=25"
```

Main run finished. Milliseconds per iter: **285.68**. Iters per second: 3.50042

Single-threaded:

```
adb shell "./speed_benchmark_torch \
  --model=./xraymobilev3.pt \
  --input_dims="1,3,224,224" \
  --input_type=float --warmup=5 \
  --iter=25 \
  --caffe2_threadpool_force_inline=true"
```

Main run finished. Milliseconds per iter: **278.999**. Iters per second: 3.58425

Differential Revision: D17533311
Pulled By: AshkanAliabadi
fbshipit-source-id: 9ee8acf02b8e3e8da1922b188ed0a6459a90b67d
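For context, a grouped depthwise 3x3 convolution is one where `groups` equals the channel count, so each channel is filtered independently. A minimal sketch (shapes are made up for illustration):

```python
import torch

# Depthwise 3x3 convolution: groups == in_channels, so each input channel
# gets its own 3x3 filter. This is the shape of convolution the commit
# routes to Caffe2's implementation instead of NNPACK.
conv = torch.nn.Conv2d(in_channels=32, out_channels=32,
                       kernel_size=3, padding=1, groups=32)
y = conv(torch.randn(1, 32, 56, 56))
print(y.shape)  # torch.Size([1, 32, 56, 56])
```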
Summary: Pull Request resolved: #26583

Adds a function that uses the NCCL API to get the version code and converts it to a readable version. Will be used for logging the NCCL version in exception messages.

Test Plan: See above
Differential Revision: D17473200
fbshipit-source-id: 4881ed5221b397f2f967262668c2b376b6bf3c64
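As a sketch of the version-code decoding this describes (assuming the usual NCCL encoding `major*1000 + minor*100 + patch`; on builds of this era `torch.cuda.nccl.version()` returns that raw integer):

```python
import torch

raw = torch.cuda.nccl.version()   # e.g. 2408 for NCCL 2.4.8
major, rest = divmod(raw, 1000)
minor, patch = divmod(rest, 100)
print(f"NCCL {major}.{minor}.{patch}")
```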
…26816)

Summary: Output tensors don't need to be copied during type promotion, as we are not using any data from them. Simple allocation gives a steady 10% performance gain.

BEFORE
```
In [1]: x = torch.randn(64, 2048, 7, 7)
In [2]: y = torch.randn(64, 2048, 7, 7, dtype=torch.float64)
In [3]: timeit x.add_(y)
77.3 ms ± 257 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

AFTER
```
In [1]: x = torch.randn(64, 2048, 7, 7)
In [2]: y = torch.randn(64, 2048, 7, 7, dtype=torch.float64)
In [3]: timeit x.add_(y)
68.2 ms ± 713 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Pull Request resolved: #26816
Differential Revision: D17573455
Pulled By: VitalyFedyunin
fbshipit-source-id: 47286abce5e7e665eb61e46ae358c896e945bef2
Summary: Pull Request resolved: #26751

### Summary

We're going to use the AWS S3 bucket `s3://ossci-ios` to store the release binary. To release the cocoapods, we can follow the steps below:

1. Open a fake PR to trigger the CI job that pulls the code from the 1.3.0 tag branch and does the building and uploading.
2. Verify the binary locally - run tests on both arm64 and simulator.
3. Publish the cocoapods officially.

### Test plan

- podspec lint command succeeds - `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation`

Test Plan: Imported from OSS
Differential Revision: D17577131
Pulled By: xta0
fbshipit-source-id: 55fee918ecc5c4e0b6d714488a12351b4370afac
Summary: Pull Request resolved: #26496

It is a BAD BAD idea to reference Docker versions which are not deployed (per ossci-job-dsl), because those versions will get GC'ed after two weeks. At the moment, there is no verification that your Docker version is deployed. This adds an Azure job to check this.

Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17575100
Pulled By: ezyang
fbshipit-source-id: 8df2331c6e6899c585bc2917b55e8955908b0e4a
Summary: Pull Request resolved: #26704

NCCL 2.1.15 isn't available for CUDA 10.1 and 2.4.8 isn't available for CUDA 9.1 :(

ghstack-source-id: 90714191
Test Plan: build docker images on Jenkins
Differential Revision: D17543120
fbshipit-source-id: 882c5a005a9a3ef78f9209dea9dcec1782060b25
Summary: Added ONNX export for baddbmm in opset9 Pull Request resolved: #25738 Reviewed By: hl475 Differential Revision: D17565828 Pulled By: houseroad fbshipit-source-id: 85f605a7b3fa4783ef4f6ced86223133c85062d5
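A minimal sketch of exercising this export path (the module, shapes, and filename are made up for illustration):

```python
import torch

class BAddBmm(torch.nn.Module):
    def forward(self, inp, batch1, batch2):
        # baddbmm: inp + batch1 @ batch2, batched over the first dimension.
        return torch.baddbmm(inp, batch1, batch2)

args = (torch.randn(10, 3, 5), torch.randn(10, 3, 4), torch.randn(10, 4, 5))
torch.onnx.export(BAddBmm(), args, "baddbmm.onnx", opset_version=9)
```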
Summary: Pull Request resolved: #26739 Test Plan: Imported from OSS Differential Revision: D17577908 Pulled By: bwasti fbshipit-source-id: a09cdbd8619a926e93418a692ce859d4157f2da8
Summary:
We implement the quantized upsample_bilinear2d case for the interpolate kernel in this PR.
The benchmark below measures the NHWC performance improvement:
```python
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)
    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])
    x = x.permute([0, 3, 1, 2])
    NITER = 100

    # Time the float reference path.
    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.interpolate(
            x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_float = (time.time() - s) / NITER

    # Time the quantized path.
    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.interpolate(
            q_x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
    # torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000,
          time_per_iter_quant / time_per_iter_float, sep='\t')

    # Effective memory bandwidth: bytes moved per second for each path.
    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()
    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9
    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```
**Without NHWC handling**

| dtype | time/iter ms (float) | time/iter ms (quant) | quant/float | GB/s float | GB/s quant |
|---|---|---|---|---|---|
| torch.qint8 | 1.999044418334961 | 2.5860953330993652 | 1.2936657681940702 | 1.6192056416115257 | 0.3129103516188541 |
| torch.quint8 | 2.02730655670166 | 2.6061582565307617 | 1.2855274639721328 | 1.596632728927902 | 0.3105014816242217 |
| torch.qint32 | 2.0180463790893555 | 2.4047350883483887 | 1.1916153728010588 | 1.603959172365819 | 1.3460376636426636 |

**With NHWC handling**

| dtype | time/iter ms (float) | time/iter ms (quant) | quant/float | GB/s float | GB/s quant |
|---|---|---|---|---|---|
| torch.qint8 | 2.0913314819335938 | 0.09696483612060547 | 0.04636512047863123 | 1.5477527249803915 | 8.345458337015 |
| torch.quint8 | 2.1065664291381836 | 0.09959936141967773 | 0.04728042754408879 | 1.5365591871338384 | 8.124710725706763 |
| torch.qint32 | 2.044203281402588 | 0.6003522872924805 | 0.29368521846837126 | 1.5834354779917448 | 5.391607675216635 |
Pull Request resolved: #26631
Differential Revision: D17521498
Pulled By: llyfacebook
fbshipit-source-id: 385ae0f77777cd8bee385cafb80e492127b7d103
…26453)

Summary: Pull Request resolved: #26453

Previously, schema matching would incorrectly widen typevar bindings when later occurrences were supertypes of earlier ones. This allowed callsites like `floatlist.append(tensor.item())` to pass the typechecker, causing a runtime assert (issue #24856).

An earlier, reverted fix (#25136) insisted on strict equality across all occurrences of a typevar, necessitating explicit casts around Scalar-typed arguments to int- or float-typed parameters, like `tensor.item()` above. This was per the original type system design, but turned out to break existing user code that relied on the de facto dynamic downcast. (The error required a specialized list representation.)

The current fix includes the prevention of typevar widening, but adds logic to insert implicit conversions from Scalar to float or int as needed to satisfy a matched schema.

Test Plan: Imported from OSS
Differential Revision: D17470598
Pulled By: bhosmer
fbshipit-source-id: d260dbf3cd78b9c2f2229bc61afc84e1910b5659
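A minimal sketch of the callsite pattern this fix makes work (the values are made up; with the fix, `item()`'s Scalar result is implicitly converted to float rather than widening the list's typevar):

```python
import torch
from typing import List

@torch.jit.script
def collect(t: torch.Tensor) -> List[float]:
    xs: List[float] = []
    # t.item() returns a Scalar; schema matching now inserts an implicit
    # Scalar -> float conversion so this append typechecks and runs.
    xs.append(t.item())
    return xs

print(collect(torch.tensor(1.5)))  # [1.5]
```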
Summary: This PR makes the following improvements:

1. Add `forward_with_indices` method to all C++ MaxPool modules, to return the max indices along with the outputs. (We can't make two `forward` methods that return different types based on input, because that will break the type deduction of `torch::detail::return_type_of_forward_t`.)
2. Add `max_poolNd_with_indices` to `torch::nn::functional`, to be used when indices of the max values are needed. (We can't merge this with `torch::nn::functional::max_poolNd` because the return type of `max_poolNd` has to be defined statically.)
3. Improve `pretty_print` of C++ MaxPoolNd and AvgPoolNd modules to match the Python `extra_repr`.

Pull Request resolved: #26521
Differential Revision: D17507358
Pulled By: yf225
fbshipit-source-id: b6c0e2b27b38378cdc0c75f4bfc797b3c6b17cd9
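These C++ additions mirror the existing Python behavior, sketched here for reference (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
# return_indices=True is the Python analogue of max_poolNd_with_indices.
out, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)
# The indices can be fed back to invert the pooling.
recon = F.max_unpool2d(out, indices, kernel_size=2)
```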
Test Plan: revert-hammer Differential Revision: D17565828 Original commit changeset: 85f605a7b3fa fbshipit-source-id: 7705325087d83362f71a717be880a13e9f575b37
… to std::function. (#26592)

Summary: function_ref is pulled over from LLVM. It is to callables what StringRef is to strings. This allows it to be substantially lighter weight, particularly in code size. That comes at the cost of not being usable in situations where the callable's lifetime is shorter than the function_ref. This means it is suitable for callback-like scenarios, but not for situations where the callable needs to be stored. In converting TensorIterator, I only encountered one situation that required refactoring to comply with function_ref's constraints.

In my local Release build, this reduces the size of libtorch by 4 MB, from 70 MB to 66 MB.

Pull Request resolved: #26592
Differential Revision: D17516202
fbshipit-source-id: 267476891f767f4827a4d38149f70e5035c56c48
…read pool implementation. (#27547)
…27453)

Summary: All of the test cases move into a base class that is extended by the instrumentation test and a new "HostTests" class that can be run in normal Java. (Some changes to the build script and dependencies are required before the host test can actually run.)

ghstack-source-id: fe1165b
Pull Request resolved: #27453
Test Plan: Imported from OSS
Reviewed By: IvanKobzarev
Differential Revision: D17800410
fbshipit-source-id: 1184f0caebdfa219f4ccd1464c67826ac0220181
Summary: Pull Request resolved: #27455 Test Plan: Imported from OSS Differential Revision: D17800658 Pulled By: dreiss fbshipit-source-id: dbd01d9fa5ac82c50daf54c2869dc18be233d8dd
This reverts commit a2b3403.
Summary: Pull Request resolved: #27363 Test Plan: Imported from OSS Differential Revision: D17758907 Pulled By: zafartahirov fbshipit-source-id: f560f2726cf51ceebdbf22ebef2d067422340cf2
Landing in master in #27514
`docs/source/named_tensor.rst` is the entry point; most users will land either here or on the named tensor tutorial when looking to use named tensors. We should strive to make this as readable, concise, and understandable as possible. `docs/source/name_inference.rst` lists all of the name inference rules. It should be clear, but it's hard to make it concise. Please let me know if anything doesn't make sense, and please propose alternative wordings and/or restructuring to improve the documentation. This should ultimately get cherry-picked into the 1.3 branch as one monolithic commit, so it would be good to get all necessary changes made in this PR and not have any follow-ups.

Test Plan:
- built and reviewed locally with `cd docs/ && make html`.

ghstack-source-id: dc2ca7a
Pull Request resolved: #27173
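For readers landing on those docs, the core idea is that dimensions carry names that propagate through operations. A minimal sketch against the 1.3 API (the tensor and names are illustrative):

```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
print(t.names)               # ('N', 'C')
print(t.abs().names)         # name inference propagates ('N', 'C')
print(t.sum(dim='C').names)  # reducing over 'C' leaves ('N',)
```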
This PR updates the docs CI. After this is merged, we open a PR from 1.3.0 -> master. That open PR will build docs on this branch and push them to pytorch.github.io:site-v1.3.0. This is done in dry_run mode so the pushing won't actually happen; I will follow up with a subsequent change to drop dry_run mode after verifying that everything builds correctly.
* Add javadocs for v1.3.0
* Delete Tensor-Tensor_float32 because it is not public
* Delete Tensor-Tensor_float64 because it is not public
* Delete Tensor-Tensor_int32 because it is not public
* Delete Tensor-Tensor_int64 because it is not public
* Delete Tensor-Tensor_int8 because it is not public
* Delete Tensor-Tensor_uint8 because it is not public
* Add reference to DType and TensorImageUtils
* Cherry-picked in changes from Jessica's branch. Consolidate all quantization docs in quantization.rst. Add a link to quantization docs from torch.rst. Order quantization.rst alphabetically in index.rst
* Fix Quantized reference
* Add prose for Quantized Functions in the torch.nn docs
* Remove Quantization section
* Updates to index for v1.3.0
* Update "Package Reference" to "Python API"
* Add in torchaudio and torchtext reference links so they show up across all docs, not just the main page
* Add "Other Languages" section, add in C++ docs, add in Javadocs
* Add link to XLA docs under Notes: http://pytorch.org/xla/
* Doc tests caught that we'd somehow dropped documenting a few functions like result_type, can_cast, promote_types
* Add javasphinx extension
…erver (#27574)

* Docstring-only formatting changes in the quantize.py and fake_quantization.py files to render better in HTML
* Docstring change on observer.py as well
* Just tweaking the docstrings a bit more
* Switching to r""" for the multi-line string, per Zafar's suggestion
* Trying to resolve the merge conflict soumith saw
* Trying to avoid a conflict when this gets merged back to master
This was written by Raghu, Jessica, Dmytro, and me.
Organize APIs logically in subsections. Fix typos. This is the v1.3.0 version of a 3 Part PR originally made to master PR: #27677 originally by @dzhulgakov
This is the v1.3.0 version of a 3 Part PR originally made to master PR: #27677 originally by @dzhulgakov
This is the v1.3.0 version of a 3 Part PR originally made to master PR: #27677 Originally by @dzhulgakov
Summary: People get confused with partial support otherwise: #27811 #27729

Suggestions on where else to put warnings are welcome (probably in tutorials - cc SethHWeidman)

Pull Request resolved: #27829
Differential Revision: D17910931
Pulled By: dzhulgakov
fbshipit-source-id: 37a169a4bef01b94be59fe62a8f641c3ec5e9b7c