Cache: the key to multicore performance
Friday, October 10th, 2008
If you attended my multicore webinar last month, then you know that I’m a big fan of looking out the window at existing open source applications and seeing how they tackle performance issues. The webinar used the Apache HTTP server as an example. A recent Intel engineering study uses a modified version of Snort as a test application for improving multicore performance.
The Intel study, which Lori Matassa has commented on extensively, reported 6.2x performance scaling on a 4-core system. This is notable because each added core is generally expected to contribute slightly under 1x of single-core performance. For example, moving an application to a 4-core system and getting a 3.8x performance boost (about 0.95x per core) would be in line with expectations, and would likely require some work to attain. By comparison, 6.2x on 4 cores works out to about 1.55x per core.
The Intel engineering study is notable because it indicates something that experienced practitioners have always known: efficient cache usage is a critical factor in multicore performance. It simply makes no sense to play games elsewhere until you’ve got a grasp on cache efficiency and have maximized that aspect of your system performance.
The Intel engineering study did something interesting with “flow pinning”. Each TCP flow through the system was handled, for the lifetime of the flow, by a single assigned core. This improves cache efficiency by optimizing locality of reference: the state for a given flow stays in one core’s cache instead of bouncing between cores.
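To make the idea of flow pinning concrete, here is a minimal Python sketch of one common way to assign a flow to a core: hash the flow's endpoints and take the result modulo the number of cores. This is an illustration under my own assumptions, not necessarily how the Intel study assigned flows; the names `FlowKey` and `core_for_flow` are hypothetical, and the CRC32 hash is just a convenient deterministic choice.

```python
import zlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical identifier for a TCP flow (not taken from the Intel paper)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def core_for_flow(flow: FlowKey, num_cores: int) -> int:
    """Map a flow to a core index in the range [0, num_cores).

    The two endpoints are sorted before hashing so that packets travelling
    in either direction of the same connection land on the same core.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    endpoint_a = (flow.src_ip, flow.src_port)
    endpoint_b = (flow.dst_ip, flow.dst_port)
    first, second = sorted((endpoint_a, endpoint_b))
    key = f"{first[0]}:{first[1]}|{second[0]}:{second[1]}".encode()
    # zlib.crc32 is deterministic across runs, unlike Python's built-in
    # hash() for strings, so a flow always maps to the same core.
    return zlib.crc32(key) % num_cores


if __name__ == "__main__":
    client_to_server = FlowKey("10.0.0.5", 40312, "192.168.1.10", 80)
    server_to_client = FlowKey("192.168.1.10", 80, "10.0.0.5", 40312)
    print(core_for_flow(client_to_server, 4))
    print(core_for_flow(server_to_client, 4))  # same core as the line above
```

The sketch only decides which core owns a flow. Actually keeping a worker on that core is a separate step; on Linux, for example, a process could restrict itself to one core with `os.sched_setaffinity`.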
The Intel paper also prompts some thoughts in my mind regarding those who are migrating RTOS applications from their dead-end platform to the new funky Linux world. A vogue thought these days is that a virtualization platform can be used to just run your RTOS side-by-side with the new Linux platform. My concern is that taking an RTOS-based application that is not multicore-aware and just moving it to a new multicore processor implies either no gain in cache efficiency or a decrease in it. The RTOS-based application never had multiple cores and hence has no awareness of, or ability to do, flow pinning as discussed in the paper.
Potentially a better approach is to migrate the RTOS application’s algorithms to a consistent Linux platform and then do the cache optimization work once you’ve eliminated other variables and have design flexibility. If you want to learn more about RTOS-to-Linux migration, we’ve got an upcoming webinar on that, too.
Take a read… the paper is compelling. A 6.2x performance boost on 4 cores is impressive.



