
MontaVista


Archive for the 'multicore' Category

Cache: the key to multicore performance

Friday, October 10th, 2008

If you attended my multicore webinar last month then you know that I’m a big fan of looking out the window at existing open source applications and seeing how they tackle performance issues. The webinar used the Apache HTTP server as an example. A recent Intel engineering study uses a modified version of snort as a test application for improving multicore performance.

The Intel study, commented on extensively by Lori Matassa, reported performance scaling of 6.2x on a 4-core system. This is notable because the generally expected benefit from each added core is slightly under 1x. For example, moving an application to a 4-core system and getting a 3.8x performance boost would be in line with expectations, and would likely require some work to attain.
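To put those two numbers side by side, here is a small back-of-the-envelope sketch. The function name `per_core_scaling` is mine, not something from the study; it simply divides the overall speedup by the core count so that 1.0 means perfectly linear scaling.

```python
def per_core_scaling(speedup: float, cores: int) -> float:
    """Return the speedup contributed per core (1.0 means perfectly linear)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup / cores


typical = per_core_scaling(3.8, 4)   # the "expected" case described above
reported = per_core_scaling(6.2, 4)  # the figure reported in the Intel study

print(f"typical:  {typical:.2f} per core")
print(f"reported: {reported:.2f} per core")
```

Anything above 1.0 per core is super-linear scaling, which is why the 6.2x figure stands out.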

The Intel engineering study is notable because it indicates something that experienced practitioners have always known: efficient cache usage is a critical factor in multicore performance. It simply makes no sense to play games elsewhere until you’ve got a grasp on cache efficiency and have maximized that aspect of your system performance.

The Intel engineering study did something interesting with “flow pinning”. Each TCP flow through the system was handled, for the lifetime of the flow, by a single assigned core. This improves cache efficiency by optimizing locality of reference.
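The paper has the real details; what follows is only a minimal sketch of the general idea, under my own assumptions. The function `core_for_flow`, the CRC32 hash, and the per-core lists standing in for packet queues are all illustrative choices, not the mechanism the Intel engineers used. The point is that every packet of a flow, in either direction, lands on the same core.

```python
import zlib


def core_for_flow(src_ip, src_port, dst_ip, dst_port, num_cores):
    """Map a TCP flow to a core index; both directions map to the same core."""
    # Sort the two endpoints so A->B and B->A produce the same key.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}".encode()
    # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key) % num_cores


NUM_CORES = 4
# Plain lists standing in for per-core packet queues (an assumption of this sketch).
queues = [[] for _ in range(NUM_CORES)]

packets = [
    ("10.0.0.1", 40000, "10.0.0.2", 80, "SYN"),
    ("10.0.0.2", 80, "10.0.0.1", 40000, "SYN-ACK"),
    ("10.0.0.3", 40001, "10.0.0.2", 80, "SYN"),
    ("10.0.0.1", 40000, "10.0.0.2", 80, "ACK"),
]
for src_ip, src_port, dst_ip, dst_port, payload in packets:
    core = core_for_flow(src_ip, src_port, dst_ip, dst_port, NUM_CORES)
    queues[core].append(payload)
```

Because the per-flow state is only ever touched by one core, it has a chance of staying warm in that core's cache instead of bouncing between caches.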

The Intel paper also prompts some thoughts in my mind regarding those who are migrating RTOS applications from their dead-end platform to the new funky Linux world. A vogue thought these days is that a virtualization platform can be used to just run your RTOS side-by-side with the new Linux platform. My concern is that taking a non-multicore aware RTOS based application and just moving it to a new multicore processor implies either no cache efficiency or a decrease in cache efficiency. The RTOS based application never had multiple cores and hence has no awareness or ability to do flow-pinning as discussed in the paper.

Potentially a better approach is to migrate the RTOS application’s algorithms to a consistent Linux platform and then do cache optimization work now that you’ve eliminated other variables and have design flexibility. If you want to learn more about RTOS to Linux migration we’ve got an upcoming webinar on that, too.

Take a read… the paper is compelling. A 6.2x performance boost on 4 cores is impressive.

VDC survey on multicore indicates reality has a firm grasp on the developer community

Thursday, August 7th, 2008

VDC (a research firm covering developers of embedded systems) recently unveiled a “teaser” press release commenting on developer attitudes towards multicore architectures. The teaser said “…embedded developers still do not yet consider multiprocessing and multi-core architecture support a highly critical factor influencing their selection of embedded operating systems for current projects.”

Well… isn’t that confusing? Is multicore a flop? Did Intel, Cavium, ARM, TI, and Freescale all place the wrong bet?

My opinion is, no, multicore isn’t a flop. This survey is, however, a symptom of a few inconvenient truths.

Most applications likely can’t exploit multicore without significant re-engineering.

Recent discussions with several customers help to confirm some of this thinking. I’ve heard all sorts of misconceptions about multicore and what developers need to do to exploit it. There is some real wishful thinking going on in some minds about how throwing 8 processors at a single threaded application will make the whole job GOFAST! right now.
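One way to see why the single threaded case goes nowhere is Amdahl's law, which I'm bringing in here as a standard rule of thumb rather than anything from the VDC survey. The sketch below assumes a fixed workload and ignores cache and synchronization effects; `amdahl_speedup` and the sample fractions are my own illustrative picks.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


single_threaded = amdahl_speedup(0.0, 8)   # nothing can run in parallel
half_parallel = amdahl_speedup(0.5, 8)     # half of the work can be spread out
mostly_parallel = amdahl_speedup(0.95, 8)  # nearly everything can be spread out

for label, value in [("single threaded", single_threaded),
                     ("half parallel", half_parallel),
                     ("mostly parallel", mostly_parallel)]:
    print(f"{label}: {value:.2f}x on 8 cores")
```

With nothing parallelized the bound is exactly 1x no matter how many cores you add, which is the re-engineering problem in a nutshell.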

We had a discussion in the office a while ago that was sparked by this informal ESC poll about multicore adoption:

Only 51 percent of respondents said they have applications running on multicore CPUs or will migrate to multicore processors in the next 3-5 years. A whopping 49 percent said they neither use multicore chips nor plan to in the next five years.

[Note that this was just a casual survey by Freescale and Virtutech. Likely they just walked around the show floor and asked folks what they thought. VDC runs a good shop and employs strong survey methodologies with secondary sources.]

The discussion got started with this quote from Donald Knuth that was shared in a recent article:

I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the Itanium approach that was supposed to be so terrific until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX.

I know that important applications for parallelism exist… rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.

The office debate that followed was interesting. It boiled down to:

  • There well could be many uses for multicore processors that don’t have anything direct to do with the implementation of multicore aware algorithms. Pinning a critical algorithm to a dedicated processor is one example. More efficiently implementing virtualization for its many use cases is another. All of these are in effect using multicore inefficiently from an algorithm standpoint but delivering benefit nevertheless.
  • Multicore transition is a marathon, not a sprint.
  • Many of the edge cases Knuth highlights as suitable for multicore are actually typical in the world of embedded devices.
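The first bullet, pinning a critical algorithm to a dedicated processor, can be sketched from user space. This is a minimal illustration, assuming a Linux system where Python exposes `os.sched_setaffinity`; the helper name `pin_current_process` and the choice of the highest-numbered allowed CPU are arbitrary picks of mine. Truly dedicating a core would also mean keeping other work off it, which this sketch does not attempt.

```python
import os


def pin_current_process(cpu=None):
    """Restrict the calling process to one CPU and return the resulting CPU set.

    Returns None where the scheduler affinity calls are unavailable
    (Python only provides them on some platforms, Linux among them).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # CPUs we are currently allowed to run on
    if cpu is None:
        cpu = max(allowed)              # arbitrary choice for this sketch
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})      # pid 0 means the calling process
    return os.sched_getaffinity(0)


pinned = pin_current_process()
print("running on CPUs:", pinned)
```

Nothing about the algorithm itself became multicore aware here, which is exactly the point of that bullet.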

Heterogeneous multicore processors are hotrods… but application developers need huge motivation or lots of help to use them.

I first heard whiffs of this in discussions with a company that makes consumer products based on a popular ARM+DSP architecture. While we certainly are aware of companies that will employ DSP developers to exploit this heterogeneous multicore chip, this particular company didn’t want to be in that business. They would rather have the DSP code pre-cooked and provided to them by the vendor or ISV partners.

You’ve seen this in graphics processors for years. Most everyone has a heterogeneous multicore computer on their desk now. The graphics processors have a slew of processors that can be used, in some cases, for running algorithms other than graphics processing. Specialty companies even exist to help you exploit these capabilities.

Exploiting the GPU is still, however, a niche activity for engineers.

Intel’s upcoming Larrabee graphics processors will be an interesting test. Larrabee is described as a multicore x86 based GPU. Will introducing a familiar instruction set remove a barrier to running non-graphics algorithms?

This is a broad market survey.

VDC typically surveys a wide swath of developers who are building all sorts of applications. If you segmented the market and isolated developers who are likely to have applications that would benefit from multicore you’d likely have a different view.

Wrap up

I don’t share Knuth’s opinion that multicore could end up being Itanium-like in any way. There is a demonstrated track record of successful multicore applications in the embedded software domain.

I do wonder whether, in the broad embedded systems market, multicore adoption will lag or the multicore architecture will be used inefficiently to serve other important use cases facing designers.

Will alternate languages take hold?

Thursday, July 10th, 2008

An often discussed obstacle to exploiting multi-processing or multicore system architectures is that the programming languages used by most applications do not lend themselves to distributed and concurrent systems. Erlang typically comes up quickly in these discussions as an example of a language which has a set of language and runtime features intended to support concurrent execution.

You really owe it to yourself to watch the first few minutes of this video about Erlang from the creators. No… this is not a parody. It is just old.

There are a fair number of Erlang fans out there. Some compelling applications, including Facebook’s chat system and Amazon’s SimpleDB service, have been created using Erlang. Sometime I ought to sit down and learn Erlang. I have my doubts, though. There just seems to be a limit to the adjustments developers will make to adopt a new technique or reach a desired improvement. Sure… Erlang might be the choice if you are doing an application that would be profoundly difficult without using Erlang. Would a developer choose Erlang for an application that looks moderately difficult but is just hard to get correct using standard languages like C/C++? I’m not so sure about that. Just human nature.

We can take some lessons from Erlang and use them as we consider architectures for multicore applications.

  • There is a strong trend towards stateless application architectures. Shared state is the bugaboo in many of these concurrent systems. Some very large distributed systems trade away very popular system attributes (such as guaranteed simultaneous read consistency) in exchange for minimizing the state entanglements.
  • Asynchronous, loosely connected sub-systems. Structure the application so that a human can still grok the functionality of each sub-system and can make progress. [ Before you start thinking RPC… here’s an interesting viewpoint. ]
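Both lessons can be tried out without learning Erlang. Below is a minimal sketch in Python, not Erlang, and the names `counter_actor` and `producer` are just illustrations of mine. One thread owns the counter outright and everything else talks to it through a queue, so there are no locks in the application code and no shared variables. Python threads won't give you real parallel speedup here; the sketch is about the structure.

```python
import queue
import threading


def counter_actor(inbox: queue.Queue, replies: queue.Queue) -> None:
    """Own a private counter; the only way to touch it is to send a message."""
    count = 0                      # private state, never shared
    while True:
        msg = inbox.get()
        if msg == "stop":
            replies.put(count)     # report the final value and exit
            return
        count += msg               # any other message is an increment


def producer(inbox: queue.Queue, n: int) -> None:
    """Send n increment messages, asynchronously and without waiting for replies."""
    for _ in range(n):
        inbox.put(1)


inbox, replies = queue.Queue(), queue.Queue()
actor = threading.Thread(target=counter_actor, args=(inbox, replies))
actor.start()

producers = [threading.Thread(target=producer, args=(inbox, 1000)) for _ in range(4)]
for p in producers:
    p.start()
for p in producers:
    p.join()

inbox.put("stop")                  # sent only after every producer has finished
total = replies.get()
actor.join()
print("total:", total)
```

Each piece is small enough to reason about on its own, which is the "a human can still grok it" property from the second bullet.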

So maybe Erlang-think ought to be part of your next concurrent multicore application?

It’s the application, stupid!

Wednesday, July 2nd, 2008

We now stand at the cusp of wide-scale popular deployment of multicore processors into the realm of retail computing. Yes, chips are out there now, but with each successive generation of PCs deployed the percentage of multicore capable chips increases.

We are also, of course, seeing the advancement of high end 32 and 64 bit embedded processors stepping deep into multicore. 2 core Broadcom MIPS has been around forever… 8 core Freescale P4080’s on the horizon… specialty 16 core Cavium processors have been popular in many customer designs. Intel even unveiled an 80 core prototype system at a conference.

One of the most popular uses envisioned for multicore processors is as a virtualization host. Not a bad idea for the right use cases. There are, however, some natural limitations to the number of use cases that can be sustained by simple virtualization alone. In telecom, for example, simply encapsulating formerly isolated applications in a virtual container and then consolidating them onto a multicore virtualization host can dramatically complicate failure planning. The fault domain will suddenly encompass many formerly unrelated functions.

I’m very much pro-virtualization but there are some use cases that have been popularized by the enterprise usage of virtualization that likely won’t translate into telecom.

But this blog post was about multicore, wasn’t it? Back to multicore.

I’ve seen a prevalent meme circle the trade-show world that multicore will be best exploited by creative stacking of our existing product building blocks in ever more dramatic ways. Reconfigure ye olde RTOS to be a hypervisor. Stack Linux on top, on the side, inside, alongside, or outside. Or get a whole new hypervisor. Repeat the stacking drill. Make Linux the hypervisor. Stack yet again. Before long we’ll have a hyper-hypervisor and we’ll need kids with cup stacking skills like these to make it work.

I’m making fun, of course. There really are use cases for this stacking mania. My concern is that when you look at enterprise computing they are taking a very different approach to using multicore. Since the popular adoption of Linux has brought enterprise and embedded closer together what they do is increasingly relevant to what we ought to do.

The real action mid-term is going to be at the application level, not in the OS stacking games. It’s what goes in the cups, not how you stack them, that will matter the most.

“This (move to multicore) presages a change where the industry at large, the whole concept of applications, will ultimately have to be restructured in order to think about how to take advantage of these machines, because they won’t just get faster every year. They’ll get more powerful, but in fact only if you’re able to master these problems of concurrency and complexity.” - Craig Mundie, Microsoft. Quoted from All About Microsoft

Yep… concurrency and complexity. No surprise there. Concurrency has always been hard and there are some natural limitations to the complexity of an application. Past a certain point, you just can’t hold it in your head.

Does there need to be a radical simplification of the methods the average developer uses to exploit concurrent execution?

Debugging is at least twice as hard as writing the program in the first place. So if your code is as clever as you can possibly make it, then by definition you’re not smart enough to debug it. — Brian Kernighan

Yeah, probably so. Odds are the training and experience that is most typical amongst engineers (C and POSIX threads) won’t cut it in the new “guess we’ll have to live with multicore processors” era. I’ve heard as much from customers I’ve been in conversation with.

I’m starting this exploration of what potential techniques and technologies exist to help developers cope with concurrency and complexity to educate myself. I’m not a big believer in futurist prognosticating on what will be in 10 years. I’m a little more pragmatic and would rather look at what those in parallel fields who are solving problems like mine are doing right now. Learn fast from those one step ahead.

I think the fields to keep your eyes on when it comes to exploiting multicore will be realtime financial modeling and web-scale computing. They’ve got the biggest payoff for getting it right and the biggest challenges to solve, respectively. Both fields are also leading adopters of open source technology and sponsor contributors.

We’ll touch on cloud computing next time but for now let’s get started with some of the presentations from the recent Google sponsored Seattle Scalability Conference.

Transactional Memory (TM) was one of the interesting techniques reviewed. Pick up the presentation from the conference link above and follow along with the video from presenter Vijay Menon.

There are some open source implementations of Software Transactional Memory (STM) available, too: See Rochester’s STM implementation for C++.
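If you want a feel for the optimistic idea behind STM before digging into the presentation, here is a toy sketch. It is emphatically not the API of Rochester's library or of any real STM, and the names `VersionedRef` and `atomically` are made up for illustration. It handles a single shared cell: read a snapshot, compute the new value outside any lock, then commit only if nobody else committed in the meantime, and retry otherwise. Real implementations track whole sets of reads and writes across many memory locations.

```python
import threading


class VersionedRef:
    """A single shared cell with a version number bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()  # held only for brief read/commit steps

    def read(self):
        """Return a consistent (value, version) snapshot."""
        with self._commit_lock:
            return self.value, self.version

    def try_commit(self, expected_version, new_value):
        """Install new_value only if nobody committed since expected_version."""
        with self._commit_lock:
            if self.version != expected_version:
                return False                   # conflict: caller must retry
            self.value = new_value
            self.version += 1
            return True


def atomically(ref, update):
    """Run update(old) -> new optimistically, retrying on conflict."""
    while True:
        old, version = ref.read()
        new = update(old)                      # computed outside any lock
        if ref.try_commit(version, new):
            return new


balance = VersionedRef(0)


def deposit_many(times):
    for _ in range(times):
        atomically(balance, lambda v: v + 1)


workers = [threading.Thread(target=deposit_many, args=(500,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print("balance:", balance.value)
```

The appeal for the average developer is that the calling code says what should happen atomically and leaves the conflict handling to the runtime.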

I’m also actively trolling for papers to read. Here’s a stash.

Thanks for joining me on this self-educational journey. If you want to help please drop some comments in.

Brad

Freescale Technology Forum (FTF) QorIQ P4080 Demo

Friday, June 20th, 2008

I’m back from FTF now and I’ve got a little more to share about the demo I spent most of my time standing in front of. I was showing off the new Freescale QorIQ P4080 running MontaVista Linux in an SMP configuration:

The system is booting, debugging, and running applications including a full userspace configuration. Our DevRocket IDE was in the demo as an example of debugging memory usage, memory leaks, and multi-threaded debug. We’ll have broader tool support up and running as engineering continues.

This is all exciting for a chip that has yet to be manufactured.

Brad


©2010 MontaVista Software, LLC. All Rights Reserved