Skip to content

mrconvert: Operate using native data type#200

Merged
jdtournier merged 3 commits intomasterfrom
mrconvert_template
Apr 22, 2015
Merged

mrconvert: Operate using native data type#200
jdtournier merged 3 commits intomasterfrom
mrconvert_template

Conversation

@jdtournier
Copy link
Member

mrconvert now performs copy operation using the data type of the output image.
On a straight copy of a floating-point image, this provides a speedup of about 30%.
It also means that image values will not lose precision due to being cast to a complex float during copy.
Discussed in #196.
Closes #3.

mrconvert now performs copy operation using the data type of the output image.
On a straight copy of a floating-point image, this provides a speedup of about 30%.
It also means that image values will not lose precision due to being cast to a complex float during copy.
Discussed in #196.
Closes #3.
@jdtournier jdtournier added this to the MRtrix3 3.0 release for ISMRM milestone Mar 31, 2015
@jdtournier
Copy link
Member Author

You masochist...

Couldn't resist either... On master:

$ touch cmd/mrconvert.cpp 
$ time ./build 
[CC] cmd/mrconvert.o
[LB] bin/mrconvert

real    0m7.273s
user    0m7.153s
sys 0m0.223s

On mrconvert_template:

$ touch cmd/mrconvert.cpp 
$ time ./build 
[CC] cmd/mrconvert.o
[LB] bin/mrconvert

real    0m17.847s
user    0m17.827s
sys 0m0.330s

Looks like the compiler's getting a good workout then... ;)

Kidding aside, just one question: I notice you no longer branch with a direct copy when -coord is not supplied. Might this not cause a (minor) slowdown for a simple conversion? Would be trivial to handle within the copy_permute() call the way you're refactored it, simply by checking for an empty pos vector in that case. I can see that the PermuteAxes adapter probably won't affect performance much, if at all - the compiler should be able to optimise a lot of the additional indexing away. But I doubt it would be able to do so for the Extract adapter...

@Lestropie
Copy link
Member

Yeah, I tend to do these things by changing whatever's necessary to remove the code branching, then adding the templating; guess I forgot to look again to see if the extract could be bypassed. I'll have another play tomorrow.

@Lestropie
Copy link
Member

Hmm, compile time seems to have further doubled. Wonder why...

@jdtournier
Copy link
Member Author

Hehe... That's why I wasn't all that keen on the idea initially, having witnessed the compile times for tckmap... ;)

Maybe we can reduce this by lumping all the integer operations into a int32 version (signed and unsigned), all the floats into float64, and all the complex types into cdouble? That should cut down the compile time at least in half...

@Lestropie
Copy link
Member

But, but, ...

@jdtournier
Copy link
Member Author

Hehe... yes, sorry about your bubble...

By the way, the rationale for the extensive template action in tckmap is quite different from what it is here. In tckmap, we really need to keep overall RAM usage to a minimum for the requested data type. In mrconvert, this is only used to hold a temporary buffer for a single row of data per thread... No RAM limitations whatsoever. What the templating does is avoid unnecessary float->int conversions, and operating on complex types for no reason...

Actually, about that last point, we should verify that operating on a real input image to generate a complex output still works...

@Lestropie
Copy link
Member

Yeah I know. Do anything for a giggle. bool, 8 and 16 bits can go. I'd be tempted to keep both (u)int32 and (u)int64 though, just so that most integer ops work in 32 bit but the 64-bit version is there just in case someone's testing the limits. I'd imagine float<->double conversions would cost bugger all. But I'd rather test to see if there's any speed penalty first.

Type selection is based on the output image. Though I would have thought that kind of operation would be handled using mrcalc...?

@jdtournier
Copy link
Member Author

Yes, sorry, no need for the 32 bit versions - it's all going to be handled as 64 quantities anyway on modern systems. No need to optimise for 32-bit machines, they'll be a thing of the past soon enough, and anyone serious about performance will be running the 64-bit version anyway. Just stick to [u]int64 and float64...

Type selection is based on the output image. Though I would have thought that kind of operation would be handled using mrcalc...?

Yes, scratch that, getting myself in a muddle...

@jdtournier
Copy link
Member Author

@Lestropie - how are you getting on with this one? You want to leave it as-is, or try to reduce the compile times as we discussed...?

@Lestropie
Copy link
Member

Was just talking about it with Thijs before I left work. Will cut some templates out.

My understanding though was that even 64-bit processors tend to prefer operating on 32-bit integers rather than 64? I'll speed test everything anyways.

@jdtournier
Copy link
Member Author

My understanding though was that even 64-bit processors tend to prefer operating on 32-bit integers rather than 64?

Not sure - it sounds unnecessary for the processors to duplicate their integer pipelines for this. Where it might make a difference is purely in terms of the amount of RAM used to store each row of values. Doubling the size needed might mean it exceeds the CPU L1 cache, which could potentially slow it down. But frankly, I doubt you'd notice the difference...

I'll speed test everything anyways

Good idea.

@jdtournier
Copy link
Member Author

@Lestropie - I'm thinking we should merge this now, and leave cutting back on the number of template types for later. I'm keen to make sure this can be merged cleanly, which means not leaving it to linger too long. And also there's #211 looming...

Only use (u)int32, (u)int64 and (c)double for conversion; reduces compile time.
Relates to #3 and #200.
@Lestropie
Copy link
Member

OK, finally got around to testing this.

  • Boolean, int8 and int16 actually fractionally faster if anything when run as copy_permute<uint32_t>.
  • About a 15% slowdown if copying a 32-bit image using copy_permute<int64_t>, so kept both 32-bit and 64-bit versions.
  • Didn't see any slowdown copying to a float image using copy_permute<double>. So removed copy_permute<(c)float>.

Compile time back down to ~20s on my system, which I think is acceptable.

@jdtournier
Copy link
Member Author

About a 15% slowdown if copying a 32-bit image using copy_permute<int64_t>, so kept both 32-bit and 64-bit versions.

Not sure I'd be that bothered about 15%... But I guess removing the 32 bit versions isn't going to make all that much difference to the compile times now. Let's merge.

@jdtournier jdtournier merged commit d52e165 into master Apr 22, 2015
@jdtournier jdtournier deleted the mrconvert_template branch April 22, 2015 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Template buffer types in mrconvert copy

2 participants