Skip to content

Conversation

@ejguan
Copy link
Contributor

@ejguan ejguan commented Sep 8, 2021

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Sep 8, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit af2fa1e (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-bionic-py3.8-gcc9-coverage / test (default, 1, 2, linux.2xlarge) (1/1)

Step: "Unknown" (full log | diagnosis details | 🔁 rerun)

2021-09-09T15:26:12.9653999Z CONTINUE_THROUGH_ERROR: false
2021-09-09T15:26:12.9649245Z   ALPINE_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine
2021-09-09T15:26:12.9649811Z   PR_LABELS: [
  "cla signed"
]
2021-09-09T15:26:12.9650774Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.8-gcc9:3fb4365799993abcfc83e51d42c137e89cb2459a
2021-09-09T15:26:12.9652075Z   JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-test
2021-09-09T15:26:12.9652629Z   TEST_CONFIG: default
2021-09-09T15:26:12.9652947Z   SHARD_NUMBER: 1
2021-09-09T15:26:12.9653250Z   NUM_TEST_SHARDS: 2
2021-09-09T15:26:12.9653600Z   PYTORCH_IGNORE_DISABLED_ISSUES: 
2021-09-09T15:26:12.9653999Z   CONTINUE_THROUGH_ERROR: false
2021-09-09T15:26:12.9654331Z   SHM_SIZE: 1g
2021-09-09T15:26:12.9654636Z   PR_NUMBER: 64697
2021-09-09T15:26:12.9654918Z ##[endgroup]
2021-09-09T15:26:26.5953242Z Processing ./dist/torch-1.10.0a0+gitfaccda9-cp38-cp38-linux_x86_64.whl
2021-09-09T15:26:26.6208646Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.8/site-packages (from torch==1.10.0a0+gitfaccda9) (3.10.0.0)
2021-09-09T15:26:26.8630093Z Installing collected packages: torch
2021-09-09T15:26:33.0431828Z Successfully installed torch-1.10.0a0+gitfaccda9
2021-09-09T15:26:33.3107384Z ++++ dirname .jenkins/pytorch/common.sh
2021-09-09T15:26:33.3112960Z +++ cd .jenkins/pytorch
2021-09-09T15:26:33.3113980Z +++ pwd -P

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

ejguan added a commit that referenced this pull request Sep 8, 2021
@codecov
Copy link

codecov bot commented Sep 8, 2021

Codecov Report

Merging #64697 (af2fa1e) into gh/ejguan/88/base (7205ca0) will decrease coverage by 1.21%.
The diff coverage is 87.23%.

@@                  Coverage Diff                  @@
##           gh/ejguan/88/base   #64697      +/-   ##
=====================================================
- Coverage              66.65%   65.44%   -1.22%     
=====================================================
  Files                    710      710              
  Lines                  92406    92436      +30     
=====================================================
- Hits                   61594    60494    -1100     
- Misses                 30812    31942    +1130     

@ejguan
Copy link
Contributor Author

ejguan commented Sep 9, 2021

Summary of proposed API

dp.map(fn, input_col, output_col)

For the case of list or tuple as input

Single/Multiple Input Column (input_col=0/input_col=[1, 0])
Default output column (output_col=None) The original/left-most column (0/1) is replaced
Single specified output column (output_col=2) The specified column (2) is replaced
New output column (output_col=-1) One new column is appended
Multiple output and multiple specified output column (output_col=[0, 1]) The specified columns (0 and 1) are replaced by expanded result (size should match)
Multiple output and new output columns (output_col=[-1, -1]) Two new columns are appended by expanded result (size should match)
Multiple output and mixing with replacement and appending (output_col=[-1, 1]) The first element in result is appended and the second element in result replaces the original column (1)

For the case of dict as input, the difference is using key rather than index to specify the input_col and output_col.

ejguan added a commit that referenced this pull request Sep 9, 2021
Comment on lines +99 to +100
# Deepcopy data to prevent the original data modified. E.g. list, dict
data = copy.deepcopy(data)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is the most elegant way to protect original data

@ejguan
Copy link
Contributor Author

ejguan commented Sep 9, 2021

@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ejguan
Copy link
Contributor Author

ejguan commented Sep 9, 2021

map without index map with index
Replace the original column dp.map(lambda d: (-d[0], *d[1:])) dp.map(lambda d: -d, 0)
Replace the left-most column (1) with multiple outputs dp.map(lambda d: (d[0], (d[1] - d[0], d[1] + d[0]), *d[2:])) dp.map(lambda d0, d1: (d0 - d1, d0 + d1), [1, 0])
Replace a different column dp.map(lambda d: (d[0], -d[0], *d[2:])) dp.map(lambda d: -d, 0, 1)
Replace multiple columns dp.map(lambda d: (d[0], d[0] - d[1], d[0] + d[1], *d[3:])) dp.map(lambda d0, d1: (d0 - d1, d0 + d1), [0, 1], [1, 2])
Append a new column dp.map(lambda d: (*d, -d[0])) dp.map(lambda d: -d, 0, -1)
Append multiple columns dp.map(lambda d: (*d, d[0] - d[1], d[0] + d[1])) dp.map(lambda d0, d1: (d0 - d1, d0 + d1), [0, 1], [-1, -1])
Replace and append columns dp.map(lambda d: (d[0] + d[1], *d[1:], d[0] - d[1])) dp.map(lambda d0, d1: (d0 - d1, d0 + d1), [0, 1], [-1, 0])

@ejguan
Copy link
Contributor Author

ejguan commented Sep 9, 2021

Discussed with @wenleix about multi-return function. Currently, TA only supports to embedded multi-element result as a DataFrame into a column. (Nested DataFrame, df["c"] = df.map(fn, columns=["a", "b"], dtype=...))

Even though both of us think it would be potentially useful for users to expand the multi-element result and assign them to different columns, we decide not to enable this feature for now, for the sake of BC. As it's not hard to implement, we can easily implement it in the future after we gather more feedback from community about the API design.

cc: @VitalyFedyunin

ejguan added a commit that referenced this pull request Sep 13, 2021
…n apply fn"

Fixes https://github.com/facebookexternal/torchdata/issues/135


Updated PR for #64697 . ghstack created this new PR accidentally.

### API for list/tuple
| input_col (fn) | None (lambda d: -d[0] - d[1]) |   0 (lambda d: -d)   | [1, 2] (lambda d0, d1: - d0 - d1) |
|:--------------:|:-----------------------------:|:--------------------:|:---------------------------------:|
| None           | (1, 2) → -3                   | (1, 2) -> (-1, 2)    | (1, 2, 3) -> (1, -5)              |
| 0              | Not applicable                | (1, 2) -> (-1, 2)    | (1, 2, 3) -> (-5, 2, 3)           |
| 1              | Not applicable                | (1, 2) -> (1, -1)    | (1, 2, 3) -> (1, -5, 3)           |
| -1             | Not applicable                | (1, 2) -> (1, 2, -1) | (1, 2, 3) -> (1, 2, 3, -5)        |
### API for dict
| input_col (fn) | None (lambda d: {'z': -d['x'] - d['y']}) |                'x' (lambda d: -d)               |      ['x', 'y'] (lambda d0, d1: - d0 - d1)      |
|:--------------:|:----------------------------------------:|:-----------------------------------------------:|:-----------------------------------------------:|
| None           | {'x': 1, 'y' : 2} → {'z': 3}             | {'x': 1, 'y' : 2} -> {'x': -1, 'y' : 2}         | {'x': 1, 'y' : 2} -> {'x': -3}                  |
| 'x'            | Not applicable                           | {'x': 1, 'y' : 2} -> {'x': -1, 'y' : 2}         | {'x': 1, 'y' : 2} -> {'x': -3, 'y' : 2}         |
| 'y'            | Not applicable                           | {'x': 1, 'y' : 2} -> {'x': 1, 'y' : -1}         | {'x': 1, 'y' : 2} -> {'x': 1, 'y' : -3}         |
| 'z'            | Not applicable                           | {'x': 1, 'y' : 2} -> {'x': 1, 'y' : 2, 'z': -1} | {'x': 1, 'y' : 2} -> {'x': 1, 'y' : 2, 'z': -3} |

Differential Revision: [D30910035](https://our.internmc.facebook.com/intern/diff/D30910035)

[ghstack-poisoned]
@ejguan
Copy link
Contributor Author

ejguan commented Sep 14, 2021

Closing this PR as a duplicate one is uploaded accidentally. #64951

@ejguan ejguan closed this Sep 14, 2021
@facebook-github-bot facebook-github-bot deleted the gh/ejguan/88/head branch October 14, 2021 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants