mrconvert: Operate using native data type by jdtournier · Pull Request #200 · MRtrix3/mrtrix3

jdtournier · 2015-03-31T09:33:03Z

mrconvert now performs copy operation using the data type of the output image.
On a straight copy of a floating-point image, this provides a speedup of about 30%.
It also means that image values will not lose precision due to being cast to a complex float during copy.
Discussed in #196.
Closes #3.

mrconvert now performs copy operation using the data type of the output image. On a straight copy of a floating-point image, this provides a speedup of about 30%. It also means that image values will not lose precision due to being cast to a complex float during copy. Discussed in #196. Closes #3.

jdtournier · 2015-03-31T09:35:22Z

You masochist...

Couldn't resist either... On master:

$ touch cmd/mrconvert.cpp 
$ time ./build 
[CC] cmd/mrconvert.o
[LB] bin/mrconvert

real    0m7.273s
user    0m7.153s
sys 0m0.223s

On mrconvert_template:

$ touch cmd/mrconvert.cpp 
$ time ./build 
[CC] cmd/mrconvert.o
[LB] bin/mrconvert

real    0m17.847s
user    0m17.827s
sys 0m0.330s

Looks like the compiler's getting a good workout then... ;)

Kidding aside, just one question: I notice you no longer branch with a direct copy when -coord is not supplied. Might this not cause a (minor) slowdown for a simple conversion? Would be trivial to handle within the copy_permute() call the way you're refactored it, simply by checking for an empty pos vector in that case. I can see that the PermuteAxes adapter probably won't affect performance much, if at all - the compiler should be able to optimise a lot of the additional indexing away. But I doubt it would be able to do so for the Extract adapter...

Lestropie · 2015-03-31T10:04:08Z

Yeah, I tend to do these things by changing whatever's necessary to remove the code branching, then adding the templating; guess I forgot to look again to see if the extract could be bypassed. I'll have another play tomorrow.

Lestropie · 2015-04-01T00:01:25Z

Hmm, compile time seems to have further doubled. Wonder why...

jdtournier · 2015-04-01T09:02:50Z

Hehe... That's why I wasn't all that keen on the idea initially, having witnessed the compile times for tckmap... ;)

Maybe we can reduce this by lumping all the integer operations into a int32 version (signed and unsigned), all the floats into float64, and all the complex types into cdouble? That should cut down the compile time at least in half...

Lestropie · 2015-04-01T09:11:21Z

But, but, ...

jdtournier · 2015-04-01T09:22:08Z

Hehe... yes, sorry about your bubble...

By the way, the rationale for the extensive template action in tckmap is quite different from what it is here. In tckmap, we really need to keep overall RAM usage to a minimum for the requested data type. In mrconvert, this is only used to hold a temporary buffer for a single row of data per thread... No RAM limitations whatsoever. What the templating does is avoid unnecessary float->int conversions, and operating on complex types for no reason...

Actually, about that last point, we should verify that operating on a real input image to generate a complex output still works...

Lestropie · 2015-04-01T09:47:19Z

Yeah I know. Do anything for a giggle. bool, 8 and 16 bits can go. I'd be tempted to keep both (u)int32 and (u)int64 though, just so that most integer ops work in 32 bit but the 64-bit version is there just in case someone's testing the limits. I'd imagine float<->double conversions would cost bugger all. But I'd rather test to see if there's any speed penalty first.

Type selection is based on the output image. Though I would have thought that kind of operation would be handled using mrcalc...?

jdtournier · 2015-04-01T09:50:52Z

Yes, sorry, no need for the 32 bit versions - it's all going to be handled as 64 quantities anyway on modern systems. No need to optimise for 32-bit machines, they'll be a thing of the past soon enough, and anyone serious about performance will be running the 64-bit version anyway. Just stick to [u]int64 and float64...

Type selection is based on the output image. Though I would have thought that kind of operation would be handled using mrcalc...?

Yes, scratch that, getting myself in a muddle...

jdtournier · 2015-04-10T10:33:18Z

@Lestropie - how are you getting on with this one? You want to leave it as-is, or try to reduce the compile times as we discussed...?

Lestropie · 2015-04-10T10:44:35Z

Was just talking about it with Thijs before I left work. Will cut some templates out.

My understanding though was that even 64-bit processors tend to prefer operating on 32-bit integers rather than 64? I'll speed test everything anyways.

jdtournier · 2015-04-10T10:47:27Z

My understanding though was that even 64-bit processors tend to prefer operating on 32-bit integers rather than 64?

Not sure - it sounds unnecessary for the processors to duplicate their integer pipelines for this. Where it might make a difference is purely in terms of the amount of RAM used to store each row of values. Doubling the size needed might mean it exceeds the CPU L1 cache, which could potentially slow it down. But frankly, I doubt you'd notice the difference...

I'll speed test everything anyways

Good idea.

jdtournier · 2015-04-21T09:29:38Z

@Lestropie - I'm thinking we should merge this now, and leave cutting back on the number of template types for later. I'm keen to make sure this can be merged cleanly, which means not leaving it to linger too long. And also there's #211 looming...

Only use (u)int32, (u)int64 and (c)double for conversion; reduces compile time. Relates to #3 and #200.

Lestropie · 2015-04-22T00:27:56Z

OK, finally got around to testing this.

Boolean, int8 and int16 actually fractionally faster if anything when run as copy_permute<uint32_t>.
About a 15% slowdown if copying a 32-bit image using copy_permute<int64_t>, so kept both 32-bit and 64-bit versions.
Didn't see any slowdown copying to a float image using copy_permute<double>. So removed copy_permute<(c)float>.

Compile time back down to ~20s on my system, which I think is acceptable.

jdtournier · 2015-04-22T07:48:22Z

About a 15% slowdown if copying a 32-bit image using copy_permute<int64_t>, so kept both 32-bit and 64-bit versions.

Not sure I'd be that bothered about 15%... But I guess removing the 32 bit versions isn't going to make all that much difference to the compile times now. Let's merge.

jdtournier added this to the MRtrix3 3.0 release for ISMRM milestone Mar 31, 2015

jdtournier added the enhancement label Mar 31, 2015

jdtournier assigned Lestropie Mar 31, 2015

mrconvert: Direct conversion when no coordinate extraction is performed

59eca27

mrconvert: Reduce template count

d52e165

Only use (u)int32, (u)int64 and (c)double for conversion; reduces compile time. Relates to #3 and #200.

jdtournier merged commit d52e165 into master Apr 22, 2015

jdtournier deleted the mrconvert_template branch April 22, 2015 09:16

jdtournier unassigned Lestropie Apr 27, 2015

jdtournier mentioned this pull request Apr 28, 2015

writing bitwise image results in UInt16 #233

Closed

jdtournier mentioned this pull request Sep 9, 2015

mrconvert scaling issue #346

Merged

Conversation

jdtournier commented Mar 31, 2015

Uh oh!

jdtournier commented Mar 31, 2015

Uh oh!

Lestropie commented Mar 31, 2015

Uh oh!

Lestropie commented Apr 1, 2015

Uh oh!

jdtournier commented Apr 1, 2015

Uh oh!

Lestropie commented Apr 1, 2015

Uh oh!

jdtournier commented Apr 1, 2015

Uh oh!

Lestropie commented Apr 1, 2015

Uh oh!

jdtournier commented Apr 1, 2015

Uh oh!

jdtournier commented Apr 10, 2015

Uh oh!

Lestropie commented Apr 10, 2015

Uh oh!

jdtournier commented Apr 10, 2015

Uh oh!

jdtournier commented Apr 21, 2015

Uh oh!

Lestropie commented Apr 22, 2015

Uh oh!

jdtournier commented Apr 22, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants