Showing posts with label performance. Show all posts

Tuesday, December 20, 2016

Simple parallel processing in D with std.parallelism

By Vasudev Ram



[Image: Phobos and Deimos orbiting Mars (attribution)]

The D language has a module for parallel processing support. It is in Phobos, D's standard library (guess why it is called Phobos :). [1]
The module is called std.parallelism.

From the Phobos page:

[ Generally, the std namespace is used for the main modules in the Phobos standard library. The etc namespace is used for external C/C++ library bindings. The core namespace is used for low-level D runtime functions. ]

Here are a couple of simple D programs that together show the speedup that can be obtained with the parallelism module. Note that this applies to CPU-bound tasks, not I/O-bound ones.

The first one, student_work_sequential.d, does not use the std.parallelism module, so it does 4 tasks sequentially, i.e. one after another.

The second one, student_work_parallel.d, does use it. It does the same 4 tasks as the first, but uses the convenience function parallel from std.parallelism to run them in parallel.
I timed the results of running each program a few times, and the parallel one was consistently over 3 times faster than the sequential one.

Each of the D programs below can be compiled with the command:
dmd program_name.d
Here is the code for student_work_sequential.d:
/*
student_work_sequential.d
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product Store: https://gumroad.com/vasudevram
Twitter: https://mobile.twitter.com/vasudevram
*/

import std.stdio;
import core.thread;

struct Student {
    int number;
    void doSlowOperation() {
        writefln("The work of student %s has begun", number);
        // Wait for a while to simulate a long-lasting operation
        Thread.sleep(1.seconds);
        writefln("The work of student %s has ended", number);
    }
}

void main() {
    auto students =
        [ Student(1), Student(2), Student(3), Student(4) ];
    foreach (student; students) {
        student.doSlowOperation();
    }
}
Here is the output from running it, under the control of a command timing program written in Python:
$ python c:\util\time_command.py student_work_sequential
The work of student 1 has begun
The work of student 1 has ended
The work of student 2 has begun
The work of student 2 has ended
The work of student 3 has begun
The work of student 3 has ended
The work of student 4 has begun
The work of student 4 has ended
Command: student_work_sequential
Time taken: 4.07 seconds
Return code: 0
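The actual time_command.py is not shown in the post; here is a minimal, hypothetical stand-in (my sketch, not the author's script) that produces similar output: it runs a command, then reports the elapsed wall-clock time and the return code.

```python
# A minimal sketch of a command-timing wrapper like the one used above.
# This is NOT the author's actual time_command.py; it is a hypothetical
# stand-in showing the idea: run a command, let its output stream
# through, then report elapsed wall-clock time and return code.
import subprocess
import sys
import time

def time_command(args):
    start = time.monotonic()
    result = subprocess.run(args)
    elapsed = time.monotonic() - start
    print("Command:", " ".join(args))
    print("Time taken: {:.2f} seconds".format(elapsed))
    print("Return code:", result.returncode)
    return result.returncode

# Usage from the shell would wire this up to sys.argv, e.g.:
#     time_command(sys.argv[1:])
```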

And here is the code for student_work_parallel.d:
/*
student_work_parallel.d
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product Store: https://gumroad.com/vasudevram
Twitter: https://mobile.twitter.com/vasudevram
*/

import std.stdio;
import core.thread;
import std.parallelism;

struct Student {
    int number;
    void doSlowOperation() {
        writefln("The work of student %s has begun", number);
        // Wait for a while to simulate a long-lasting operation
        Thread.sleep(1.seconds);
        writefln("The work of student %s has ended", number);
    }
}

void main() {
    auto students =
        [ Student(1), Student(2), Student(3), Student(4) ];
    foreach (student; parallel(students)) {
        student.doSlowOperation();
    }
}
Here is the output from running it, under the control of the same command timing program:
$ python c:\util\time_command.py student_work_parallel
The work of student 1 has begun
The work of student 2 has begun
The work of student 3 has begun
The work of student 4 has begun
The work of student 1 has ended
The work of student 2 has ended
The work of student 3 has ended
The work of student 4 has ended
Command: student_work_parallel
Time taken: 1.09 seconds
Return code: 0
We can see that the parallel version runs about 3.7 times faster than the sequential one in this run (1.09 s vs. 4.07 s). And though there was some fluctuation in the exact ratio, it stayed above 3:1 over the half-dozen tests that I ran.

Not bad, considering that the only difference between the two programs is that the parallel one imports std.parallelism and uses this line:
foreach (student; parallel(students)) {
instead of this one used by the sequential program:
foreach (student; students) {
Also, in the second program, messages for tasks started later in the sequence appear before earlier tasks have ended, which shows that the tasks really are running in parallel.

So (as the docs say) the parallel function from std.parallelism is a convenient high-level wrapper over some of the module's lower-level functionality, and may suffice for many parallel processing use cases without any further work. That is useful.
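For comparison, the same pattern can be sketched in Python (the language of the timing script above) with the standard library's concurrent.futures. This is my own rough analog of the D parallel foreach, not part of the original post:

```python
# A rough Python analog of the D parallel foreach above. Each "student"
# sleeps for a second to simulate slow work; a thread pool runs the four
# tasks concurrently, so the whole batch finishes in roughly one second
# instead of four. (Threads work here because the tasks sleep; for truly
# CPU-bound Python work a process pool would be the usual choice.)
import time
from concurrent.futures import ThreadPoolExecutor

def do_slow_operation(number):
    print("The work of student %s has begun" % number)
    time.sleep(1)  # simulate a long-lasting operation
    print("The work of student %s has ended" % number)
    return number

def run_parallel(numbers):
    with ThreadPoolExecutor(max_workers=len(numbers)) as pool:
        # map preserves input order in its results
        return list(pool.map(do_slow_operation, numbers))
```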

[1] The D standard library is called Phobos because Phobos is a moon of Mars (and hence a satellite of it), and Mars was the original name of the D language, given by its creator, Walter Bright (his company is/was called Digital Mars). But he says that many of his friends started calling it D (as it was like a better C), so he did too, and changed the name to D. So, Phobos (the library) is a "satellite" of D, a.k.a. Mars (the language) :) You can read an interview with Walter Bright (on the D blog) via this post:

Interview: Ruminations on D: Walter Bright, DLang creator

That post also links to an HN thread about the interview, and to a video about D and other systems programming languages.

The Wikipedia article on Phobos is interesting. It says that the 'orbital motion of Phobos has been intensively studied, making it "the best studied natural satellite in the Solar System" in terms of orbits completed'. It also says where the name Phobos came from (a Greek god), and the source of the names of geographical features on Phobos (the book Gulliver's Travels). Also, Phobos has very low gravity (not even enough to keep it round in shape), so low that:

"A 68 kg (150 lb) person standing on the surface of Phobos would weigh the equivalent to about 60 g (2 oz) on Earth.[29]"

Maybe that is why the astronomers named some of the places on Phobos after Gulliver's Travels - because of Lilliput :)

The image at the top of the post is of Phobos and Deimos orbiting Mars.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python   DLang   xtopdf

Subscribe to my blog by email

My ActiveState recipes




Monday, August 1, 2016

Video: C++, Rust, D and Go: Panel at LangNext '14

By Vasudev Ram

I recently came across this video of a panel discussion (from 2014) on modern systems programming languages. Yes, I know the term is controversial; Andrei and Rob address that point in the video.

The languages discussed are: C++, D, Go and Rust, by a panel at LangNext '14.

The panel consisted of key members of the teams working on those languages:

- Bjarne Stroustrup for C++
- Andrei Alexandrescu for D
- Rob Pike for Go
- Niko Matsakis for Rust

(that's by language name in alphabetical order :)

The video is embedded below, with a link to it in case the embed does not work for you (which sometimes happens).



I have only watched part of it so far (it is somewhat long), but found it interesting.

You can also download it with youtube-dl (written in Python, BTW) or some other video downloading tool, for watching offline. It's about 1.34 GB in size, so plan accordingly.

- Enjoy.

- Vasudev Ram - Online Python training and consulting



Sunday, December 30, 2012

Progressive JPEGs and web performance

Performance Calendar » Progressive jpegs: a new best practice

Interesting article on web and mobile performance with regard to rendering and display of JPEG images.

Progressive JPEG FAQ:

http://www.faqs.org/faqs/jpeg-faq/part1/section-11.html

The article also gives ways to identify a progressive JPEG and a way to convert baseline JPEGs to progressive JPEGs.
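As background (my own stdlib-only sketch, not taken from the article): in the JPEG format, a baseline image's frame header uses the SOF0 marker (bytes 0xFF 0xC0), while a progressive image uses SOF2 (0xFF 0xC2), so one way to identify a progressive JPEG is to walk the file's marker segments and look for SOF2:

```python
# Sketch: detect whether a JPEG is progressive by walking its marker
# segments and looking for the SOF2 (progressive DCT) marker, 0xFF 0xC2.
# Baseline JPEGs use SOF0 (0xFF 0xC0) instead. Assumptions: well-formed
# input; this is illustrative, not a full JPEG parser (it ignores
# standalone markers like RST, which do not appear before the SOF).
import struct

def is_progressive_jpeg(data: bytes) -> bool:
    if data[:2] != b"\xff\xd8":               # SOI marker missing: not a JPEG
        raise ValueError("not a JPEG")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("bad marker")
        marker = data[i + 1]
        if marker == 0xC2:                    # SOF2: progressive DCT
            return True
        if marker in (0xC0, 0xC1):            # SOF0/SOF1: baseline/extended
            return False
        if marker == 0xD9:                    # EOI: end of image
            break
        # Other segments before the SOF carry a 2-byte big-endian length
        # (which includes the length bytes themselves); skip past them.
        (seglen,) = struct.unpack(">H", data[i + 2:i + 4])
        i += 2 + seglen
    return False
```

For conversion, tools such as jpegtran (with its -progressive option) or Pillow (saving with progressive=True) can produce progressive JPEGs; see the article for the methods it recommends.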

The comments, both on the original article and in this HN thread about it (http://news.ycombinator.com/item?id=4983073), are informative too.

Google mod_pagespeed, which I blogged about recently, has support for progressive JPEGs.

- Vasudev Ram
www.dancingbison.com

Thursday, December 20, 2012

Google mod_pagespeed

New mod_pagespeed: cache advances, progressive JPEGs - Google Developers Blog

Caching is one big way to improve performance, but there are other ways too; mod_pagespeed uses many of them.

Sunday, October 28, 2012

Performance: ZeroMQ: throughput is not the inverse of latency


By Vasudev Ram

Interesting study of performance, throughput and latency (among other things) in the chapter about the ZeroMQ asynchronous messaging library, in the book Architecture of Open Source Applications (Vol. 2), which I blogged about recently.

See Section 24.3. Performance, in that chapter, for the stuff about throughput and latency.
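The point that throughput is not simply the inverse of latency can be shown with a back-of-the-envelope calculation (my own toy numbers, not measurements from the book): with pipelining or batching, many messages are in flight at once, so per-message latency can stay high while throughput is huge.

```python
# Toy numbers (assumptions, not ZeroMQ measurements): a link where each
# message takes 100 ms end-to-end, but the sender keeps 1000 messages
# in flight at a time (pipelining/batching).
latency_s = 0.100          # end-to-end latency per message, in seconds
in_flight = 1000           # messages concurrently in the pipe

# If throughput were just the inverse of latency, we would expect only:
naive_throughput = 1 / latency_s           # ~10 messages/second

# With pipelining, by Little's law (throughput = in-flight / latency):
actual_throughput = in_flight / latency_s  # ~10,000 messages/second

print(naive_throughput, actual_throughput)
```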

- Vasudev Ram - Dancing Bison Enterprises


Sunday, September 9, 2012

Internet speed tip: Use mobile version of a site, even on your desktop PC


E.g. mobile.twitter.com
or
m.techcrunch.com

Mobile versions of web pages tend to load faster because of their smaller overall size and their smaller (and fewer) images.

Another benefit is that mobile sites tend to have fewer of the crappy large image ads that burn up your Internet account balance (if you're on a limited plan).

I don't read TechCrunch much any more; just gave it as an example.

- Vasudev Ram
www.dancingbison.com

Monday, September 3, 2012

Performance: Scaling Reddit to millions of page views

By Vasudev Ram


Interesting post on scalability, on the High Scalability blog:

7 Lessons Learned While Building Reddit To 270 Million Page Views A Month

Of the 7 lessons, the one I found particularly interesting was Lesson 3: Open Schema.
It talks about taking a different approach from traditional RDBMS database design, and the pros and cons of doing that. But the other lessons are interesting too.

Performance tuning in general is an interesting and challenging area. One book I've read about it, which is very good, IMO, is "Writing Efficient Programs", by Jon Bentley of Bell Labs. I own a copy and had read it cover to cover when I bought it - it's that good.

Excerpts from the Wikipedia article about Jon Bentley (linked above):

[ After receiving his Ph.D., he joined the faculty at Carnegie-Mellon University as an assistant professor of computer science and mathematics.[1] At CMU, his students included Brian Reid, John Ousterhout, Jeff Eppinger, Joshua Bloch, and James Gosling, and he was one of Charles Leiserson's advisors. Later, Bentley moved to Bell Laboratories.
...
He wrote the Programming Pearls column for the Communications of the ACM magazine, and later collected the articles into two books of the same name. He has published or presented over 200 papers.
...
Bentley received the Dr. Dobb's Excellence in Programming award in 2004. ]

Though it was written many years ago, and is now probably out of print (it was when I checked a while ago), it has many fascinating articles about performance tuning. What is more interesting is that the chapters cover performance tuning at many different levels: from algorithms and data structures, down to individual subroutines and the code within them, and even further down the stack to the generated assembly language and the hardware. (The book does not give actual examples of tuning at the hardware level, but it does have plenty of "war stories", as Bentley calls them, about real-life performance tuning cases. One of them is an astonishing case where the performance of quicksort (IIRC) was improved by over a million times, by working at many levels of the stack, from the architecture down to the hardware.)

Loop unrolling is another interesting example; in one case, IIRC, it increased the speed of a binary search several times. There are many other such examples and war stories.
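As a toy illustration of the technique (my own sketch, not Bentley's binary-search example): unrolling means processing several elements per loop iteration, trading a little code size for less loop-management overhead.

```python
# Toy illustration of loop unrolling: summing a list four elements per
# iteration instead of one, which cuts per-iteration loop overhead.
# (In CPython the win is modest; the technique matters most in
# low-level compiled code, which is Bentley's context.)
def sum_simple(xs):
    total = 0
    for x in xs:
        total += x
    return total

def sum_unrolled(xs):
    total = 0
    n = len(xs)
    i = 0
    # Main loop: handle four elements per pass.
    while i + 4 <= n:
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    # Clean-up loop: handle the leftover 0-3 elements.
    while i < n:
        total += xs[i]
        i += 1
    return total
```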

One particularly interesting one (though only a brief description is given, not the code) was a war story where someone said "I remember a young man building an interpreter for an interpreter ...", "thereby packing the program down into an incredibly small amount of space" - or words to that effect. That was actually needed in that case, because of the low memory of the target environment.

Bentley gives many tuning rules in the book, which can be applied to specific situations. Most of them are at the programming level, so are accessible to most programmers (who may not have the luxury of changing the architecture or the hardware).
He also gives situations where the rules are useful, and where they may not be useful.

UPDATE: Though I said it is out of print, I just searched and found an Amazon link for the book:

http://www.amazon.com/Writing-Efficient-Programs-Prentice-Hall-Software/dp/0139702512

- Vasudev Ram - Dancing Bison Enterprises

Saturday, August 25, 2012

Cython: combined power of C and Python; call back and forth between Python and C/C++

By Vasudev Ram


Cython - (Cython on Wikipedia) is "a language that makes writing C extensions for the Python language as easy as Python itself. It is based on the well-known Pyrex, but supports more cutting edge functionality and optimizations."

Excerpts:

[ Cython gives you the combined power of Python and C to let you:

- write Python code that calls back and forth from and to C or C++ code natively at any point.

- easily tune readable Python code into plain C performance by adding static type declarations.

- use combined source code level debugging to find bugs in your Python, Cython and C code.

- integrate natively with existing code and data in legacy, low-level or high-performance libraries and applications.

]

Excerpts from Wikipedia:

[

Cython is particularly popular among scientific users of Python,[11][16][17] where it has "the perfect audience" according to Python developer Guido van Rossum.[18] Of particular note:

The free software Sage computer algebra system depends on Cython, both for performance and to interface with other libraries.[19]
Significant parts of the scientific and numerical computing libraries SciPy and NumPy are written in Cython.[20][21]
Cython's domain is not limited to just numerical computing. For example, the lxml XML toolkit is written mostly in Cython, and Cython is used to provide Pythonic bindings for many C and C++ libraries ranging from the graphics library OpenGL[22] to the messaging library ZeroMQ.[23]

]

- Vasudev Ram - Dancing Bison Enterprises


Wednesday, May 2, 2012

Torbit Insight launched

http://torbit.com/

It helps you track page load speed per user and per page, and correlate speed with revenue, says the site. I had blogged about Torbit some time ago.

- Vasudev Ram
www.dancingbison.com

Friday, August 12, 2011

Google Chrome Beta to support C and C++ via Native Client - NaCl

By Vasudev Ram - dancingbison.com | @vasudevram | jugad2.blogspot.com

Seen on ReadWriteWeb and TechCrunch.

Some of the benefits claimed for Native Client (NaCl) are: better performance, by leveraging modules written in C and C++ in your web apps; reuse of legacy code written in those languages (and there is, of course, tons of that around, though some parts may have to be modified to work with NaCl); and all this while still maintaining security, due to the "double sandbox" model that NaCl apps will use.

NaCl is the chemical formula for common salt, and in a Google-ish play on words, the API that developers will use to create such apps is called the Pepper API.

Excerpts:

[ Native Client allows C and C++ code to be seamlessly executed inside the browser with security restrictions similar to JavaScript. Native Client apps use Pepper, a set of interfaces that provide C and C++ bindings to the capabilities of HTML5. As a result, developers can now leverage their native code libraries and expertise to deliver portable, high performance web apps. ]

The links to the articles:

Google Chrome Beta Now Supports C/C++:

http://www.readwriteweb.com/cloud/2011/08/google-officially-announces-cc.php

Google Unleashes Native Client Into Chrome, Next-Gen Web Apps To Follow?

http://techcrunch.com/2011/08/11/chrome-native-client/

The Google announcement about NaCl support:

http://chrome.blogspot.com/2011/08/building-better-web-apps-with-new.html

Posted via email
- Vasudev Ram @ Dancing Bison

Friday, May 13, 2011

$3 million funding for open source Scala / Typesafe Stack and company

By Vasudev Ram - www.dancingbison.com

I found this news interesting:

Seen on VentureBeat (and first on Hacker News).

Typesafe raises $3M for cloud and multi-core software development tools:

http://venturebeat.com/2011/05/12/typesafe-raises-3m-for-modern-software-development-tools/

Excerpts:

[ Typesafe ... has raised $3 million in a first round of funding. ]

[ The Cambridge, Mass.-based company is also introducing today its open source Typesafe Stack, which integrates ... the Scala programming language, Akka middleware and development tools. Scala ... takes advantage of multicore hardware and cloud computing. Scala is used by some of the world's highest-trafficked web properties such as Foursquare, Twitter and LinkedIn. ]

[ Greylock, the investment firm whose roster includes LinkedIn founder Reid Hoffman, made the investment. ]

[ Martin Odersky, chief executive of Typesafe, created the Scala (which stands for scalable language) programming language in 2001 at the École Polytechnique Fédérale de Lausanne and launched it seven years ago. It runs on top of the Java Virtual Machine and is interoperable with Java. ]

[ Chris Conrad, engineering manager at LinkedIn, says Scala is a powerful programming tool that offers scalability and efficiency ]

[ Alex Payne, former platform lead at Twitter ... said that Scala played a critical role in improving the scalability and reliability of Twitter's backend services ]

Links:

http://www.typesafe.com/

http://www.greylock.com/

Posted via email
- Vasudev Ram - Dancing Bison Enterprises