Skip to content

Conversation

@mhvk
Copy link
Contributor

@mhvk mhvk commented Jul 8, 2019

@taldcroft - just a proof of concept, to try to make getting col.info itself faster. More generally, we might want to think about replacing getting/setting attributes on the parent not via __getattr__ and __setattr__, but rather via (automatically generated) properties on info.

from astropy.table import MaskedColumn
m = MaskedColumn([1, 2])
%timeit m.info
# 1000000 loops, best of 5: 336 ns per loop
# on current master: 1.26 µs
m.info._name = 'bla'
%timeit m.info._name
# 1000000 loops, best of 5: 370 ns per loop
# on current master: 1.28 µs
%timeit m.name
# 10000000 loops, best of 5: 82.3 ns per loop
# "only" 4.5 times faster instead of 15!

@taldcroft
Copy link
Member

So this inspired me to start thinking outside our current box. The very first thing about DataInfo is that it is a descriptor class that is storing the per-instance attribute information on the parent object and always dynamically finding the parent. You made that faster, which is definitely good. Overall DataInfo is a bit complicated and has all these redirects and knobs.

Is there any reasonable way we can scrap that whole paradigm and simplify a whole lot? Just make info a property pointing to an ordinary object stored in _info. The info "attributes" are properties that allow quite fast access, with setters that do validation as necessary.

Maybe we've been dancing around this idea, but I wonder if something like below could fly? It might end up making the DataInfo concept much simpler, cleaner, and faster.

master...taldcroft:table-info-play

In [2]: c = Column([1,2], name='hello')

In [3]: timeit c.name
165 ns ± 0.627 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [4]: timeit c.finfo.name
305 ns ± 2.74 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: timeit c.info.name
3.89 µs ± 56.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

@mhvk
Copy link
Contributor Author

mhvk commented Jul 9, 2019

@taldcroft - interesting idea. Part of this is what I had in mind by auto-generation - one can just generate the name properties, etc., by a metaclass from attrs_from_parent, etc. I'm not sure making info a property is going to gain much over it being a descriptor (a property is a descriptor, after all), but I think making it simpler/more permanent is good, and I like especially the much closer link to the parent that you have (though obviously we need to look into why we had that weakref in the first place! I experimented with not setting _parent on every call but got quite a few errors).

p.s. The good part is that we've got lots of tests, so one can quickly find out whether a new idea will work!

@mhvk
Copy link
Contributor Author

mhvk commented Jul 15, 2019

closing in favour of #8998 (which includes this commit, though it partially undoes it)

@mhvk mhvk closed this Jul 15, 2019
@mhvk mhvk deleted the faster-info branch October 10, 2020 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants