[DataPipe] Improve Mapper to accept input/output index when apply fn #64697
[ghstack-poisoned]
💊 CI failures summary and remediations
As of commit af2fa1e (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
…n apply fn" [ghstack-poisoned]
Codecov Report
@@ Coverage Diff @@
## gh/ejguan/88/base #64697 +/- ##
=====================================================
- Coverage 66.65% 65.44% -1.22%
=====================================================
Files 710 710
Lines 92406 92436 +30
=====================================================
- Hits 61594 60494 -1100
- Misses 30812 31942 +1130
Summary of proposed API: dp.map(fn, input_col, output_col)
For the case of dict as input, the difference is that a key rather than an index is used to specify the …
# Deepcopy data to prevent the original data from being modified. E.g. list, dict
data = copy.deepcopy(data)
I am not sure this is the most elegant way to protect original data
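For context on what the copy is guarding against, here is a minimal sketch. The helper `negate_first` and its `deep` flag are hypothetical names made up for illustration, not code from this PR. It only shows that writing a result back into a mutable sample changes the caller's object unless the sample is copied first.

```python
import copy


def negate_first(data, deep):
    """Hypothetical stand-in for applying fn to column 0 of one sample."""
    # With deep=False we write into the caller's object directly.
    out = copy.deepcopy(data) if deep else data
    out[0] = -out[0]
    return out


# Without a copy, the source sample is modified in place.
original = [1, 2]
negate_first(original, deep=False)

# With a deep copy, the source sample is left untouched.
source = [1, 2]
result = negate_first(source, deep=True)
```

Note that `copy.deepcopy` recursively copies the whole sample, so the protection comes at the cost of copying every element on each call.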
@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Discussed multi-return functions with @wenleix. Currently, TA only supports embedding a multi-element result into a column as a DataFrame (a nested DataFrame). Even though both of us think it would be potentially useful for users to expand a multi-element result and assign its elements to different columns, we decided not to enable this feature for now, for the sake of BC. As it is not hard to implement, we can add it in the future after we gather more feedback from the community about the API design. cc: @VitalyFedyunin
…n apply fn"

Fixes https://github.com/facebookexternal/torchdata/issues/135

Updated PR for #64697. ghstack created this new PR accidentally.

### API for list/tuple

| output_col \ input_col (fn) | None (lambda d: -d[0] - d[1]) | 0 (lambda d: -d) | [1, 2] (lambda d0, d1: -d0 - d1) |
|:--------------:|:-----------------------------:|:--------------------:|:---------------------------------:|
| None | (1, 2) -> -3 | (1, 2) -> (-1, 2) | (1, 2, 3) -> (1, -5) |
| 0 | Not applicable | (1, 2) -> (-1, 2) | (1, 2, 3) -> (-5, 2, 3) |
| 1 | Not applicable | (1, 2) -> (1, -1) | (1, 2, 3) -> (1, -5, 3) |
| -1 | Not applicable | (1, 2) -> (1, 2, -1) | (1, 2, 3) -> (1, 2, 3, -5) |

### API for dict

| output_col \ input_col (fn) | None (lambda d: {'z': -d['x'] - d['y']}) | 'x' (lambda d: -d) | ['x', 'y'] (lambda d0, d1: -d0 - d1) |
|:--------------:|:----------------------------------------:|:-----------------------------------------------:|:-----------------------------------------------:|
| None | {'x': 1, 'y': 2} -> {'z': -3} | {'x': 1, 'y': 2} -> {'x': -1, 'y': 2} | {'x': 1, 'y': 2} -> {'x': -3} |
| 'x' | Not applicable | {'x': 1, 'y': 2} -> {'x': -1, 'y': 2} | {'x': 1, 'y': 2} -> {'x': -3, 'y': 2} |
| 'y' | Not applicable | {'x': 1, 'y': 2} -> {'x': 1, 'y': -1} | {'x': 1, 'y': 2} -> {'x': 1, 'y': -3} |
| 'z' | Not applicable | {'x': 1, 'y': 2} -> {'x': 1, 'y': 2, 'z': -1} | {'x': 1, 'y': 2} -> {'x': 1, 'y': 2, 'z': -3} |

Differential Revision: [D30910035](https://our.internmc.facebook.com/intern/diff/D30910035)

[ghstack-poisoned]
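The semantics in the tables above can be sketched in plain Python. This is an illustrative reading of the tables only: `apply_fn` is a hypothetical helper, not the Mapper implementation from this PR, and it assumes that `output_col=None` writes the result into the first input column and drops the remaining input columns, and that `output_col=-1` appends to a list/tuple.

```python
import copy


def apply_fn(data, fn, input_col=None, output_col=None):
    """Hypothetical helper that mirrors the tables; not the PR's code."""
    if input_col is None:
        if output_col is not None:
            # Corresponds to the "Not applicable" cells.
            raise ValueError("output_col requires input_col")
        return fn(data)

    cols = list(input_col) if isinstance(input_col, (list, tuple)) else [input_col]
    result = fn(*(data[c] for c in cols))

    # Work on a copy so the caller's sample is not modified.
    is_tuple = isinstance(data, tuple)
    out = list(data) if is_tuple else copy.deepcopy(data)

    if output_col is None:
        # Replace the first input column and remove the other input columns.
        out[cols[0]] = result
        extra = cols[1:]
        if not isinstance(out, dict):
            # Delete from the right so earlier indices stay valid.
            extra = sorted(extra, reverse=True)
        for c in extra:
            del out[c]
    elif output_col == -1 and not isinstance(out, dict):
        out.append(result)
    else:
        out[output_col] = result

    return tuple(out) if is_tuple else out


def neg(d):
    return -d


def neg_sum(d0, d1):
    return -d0 - d1
```

For example, `apply_fn((1, 2, 3), neg_sum, [1, 2], -1)` follows the last row of the list/tuple table, and `apply_fn({'x': 1, 'y': 2}, neg_sum, ['x', 'y'], 'z')` follows the last row of the dict table.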
Closing this PR because a duplicate was uploaded accidentally: #64951
Fixes https://github.com/facebookexternal/torchdata/issues/135
Stack from ghstack:
Differential Revision: D30841472