May 24, 2017

As a follow-up to my last post on C++ performance analysis, I wanted to let people know about another CppCon 2016 talk: “Want fast C++? Be nice to your hardware” by Timur Doumler. In my opinion Timur gave a great talk. He introduces each topic and leaves the audience free to dig deeper into whichever ones appeal to them. This talk also has a bit more C++ in it than the previous talk I posted, which I appreciate.

Topics covered:

  • Data and instruction caches
  • Cache levels (L1, L2, L3, …)
  • Cache lines (typically 64 bytes on desktops)
  • Prefetcher
  • Cache associativity
  • Pipeline
  • Instruction-level parallelism
  • Branch predictor
  • Memory alignment
  • Multiple cores
  • SIMD
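To make the cache line and alignment topics a bit more concrete, here is a minimal sketch of padding a per-thread counter so that each instance sits on its own cache line. It is not taken from the talk. The 64-byte line size is an assumption (the typical desktop value from the list above), and `kCacheLineSize`, `PaddedCounter`, `cache_line_index` and `adjacent_counters_are_separated` are names I made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Check your target hardware.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical per-thread counter. alignas raises the alignment to 64,
// and sizeof is always a multiple of the alignment, so each instance
// occupies exactly one (assumed) cache line.
struct alignas(kCacheLineSize) PaddedCounter {
    std::uint64_t value = 0;
};

static_assert(alignof(PaddedCounter) == kCacheLineSize,
              "counter should be aligned to a cache line");
static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "counter should fill exactly one cache line");

// Index of the (assumed 64-byte) cache line that an address falls into.
std::uintptr_t cache_line_index(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLineSize;
}

// Adjacent array elements should land on different cache lines, so two
// threads that each update one element would not contend for a line.
bool adjacent_counters_are_separated() {
    PaddedCounter counters[2];
    assert(sizeof(counters) == 2 * kCacheLineSize);
    return cache_line_index(&counters[0]) != cache_line_index(&counters[1]);
}
```

Without the `alignas`, eight such counters would fit into a single 64-byte line, and threads updating neighbouring counters could keep invalidating each other's copy of that line. C++17 also specifies `std::hardware_destructive_interference_size` in `<new>` as a portable alternative to the hard-coded 64, though compiler support varies.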

Too long; didn’t watch (though I highly recommend you do!):

  • Be conscious of whether you’re bound by data or by computation
  • Prefer data to be contiguous in memory
  • If you can’t, prefer constant strides to randomness
  • Keep data close together in space (e.g., put data structures that are used one after another into a struct)
  • Keep accesses to the same data close together in time
  • Avoid dependencies between successive computations
  • Avoid dependencies between two iterations of a loop
  • Avoid hard-to-predict branches
  • Be aware of cache lines and alignment
  • Minimize the number of cache lines accessed by multiple threads
  • Don’t be surprised by hardware weirdness (cache associativity, denormals, etc.)
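As a rough illustration of the "contiguous memory" and "constant strides" advice, the sketch below sums the same matrix in two traversal orders. It is my own example, not code from the talk. It assumes the matrix is stored row-major in a single `std::vector`, and the names `sum_row_major` and `sum_column_major` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: the rows x cols matrix is stored row-major in one
// contiguous vector, so element (r, c) lives at index r * cols + c.

// Visits elements in the order they are laid out in memory (stride of
// one element), so each cache line that is fetched gets fully used.
long long sum_row_major(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same result, but the inner loop jumps `cols` elements at a time.
// The stride is constant, which a prefetcher can often follow, yet
// for wide rows each access may touch a different cache line.
long long sum_column_major(const std::vector<int>& m,
                           std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Both functions return the same sum. Only the memory access pattern differs. For matrices much larger than the cache, the row-major loop is usually the faster of the two, but how much faster depends on the hardware, so measure before drawing conclusions.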
