(2009-2-8 13:35)
    --Write Through
    --Write back
    Data is written only to the cache. Memory is updated later, when the cached line is about to be replaced by other data or when the system performs a cache flush.

    A flush writes the cache contents back to memory. With a write-through cache, no flush is needed.
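The difference between the two write policies can be seen in a toy model. This is only a sketch: `ToyCache` and its dictionary-backed "memory" are made up for illustration and ignore line sizes and replacement.

```python
class ToyCache:
    """Toy cache in front of a dict that stands in for main memory."""

    def __init__(self, memory, write_through):
        self.memory = memory
        self.write_through = write_through
        self.lines = {}      # address -> cached value
        self.dirty = set()   # addresses modified in the cache but not in memory

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value   # memory is updated on every write
        else:
            self.dirty.add(addr)        # memory is updated later, on flush

    def flush(self):
        """Write every dirty line back to memory."""
        for addr in sorted(self.dirty):
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()


# Write-through: memory sees the new value immediately.
wt_mem = {0x100: 0}
wt = ToyCache(wt_mem, write_through=True)
wt.write(0x100, 42)

# Write-back: memory keeps the old value until the flush.
wb_mem = {0x100: 0}
wb = ToyCache(wb_mem, write_through=False)
wb.write(0x100, 42)
stale_before_flush = wb_mem[0x100]
wb.flush()
fresh_after_flush = wb_mem[0x100]
```

In the write-through case `flush()` has nothing to do, which matches the note that a write-through cache needs no flush.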

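    When DMA is using memory, the driver generally has to manage the cache, because DMA accesses to memory do not go through the cache. Typical examples are Ethernet, wireless, and USB drivers, where the DMA engine operates on descriptors and packet buffers, and the driver has to do the following:
    --If the driver uses cached addresses for both the descriptors and the packet buffers, then:
    a). When the driver reads status from a descriptor, such as Owned by CPU/DMA or whether a packet has been received, it must cache-invalidate the contents of the current descriptor structure. After a packet is received, it must also cache-invalidate the packet buffer.
    b). When the driver writes status into a descriptor, such as Owned by DMA when a packet is to be sent, it must cache-flush the contents of the current descriptor structure. When sending a packet, it must also cache-flush the packet buffer.
    --Some drivers use uncached addresses for the descriptors, in which case the invalidate/flush in the two cases above is not needed. Packet buffers are rarely mapped uncached, because packet contents are processed very frequently and uncached access would be very slow. Descriptors are usually small structures, so with cached addresses the time spent on invalidate/flush may exceed the cost of uncached access.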
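Cases a) and b) can be simulated with a small model. This is a sketch, not driver code: `ToyDCache`, the `DESC` address and the ownership constants are hypothetical, and the "DMA engine" is simply code that touches the `memory` dict directly, bypassing the cache.

```python
class ToyDCache:
    """Toy write-back data cache; DMA bypasses it and touches memory directly."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:            # miss: fill from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value              # memory is not updated yet

    def flush(self, addr):
        if addr in self.lines:
            self.memory[addr] = self.lines[addr]

    def invalidate(self, addr):
        self.lines.pop(addr, None)            # next read refetches from memory


OWNED_BY_CPU, OWNED_BY_DMA = 0, 1
DESC = 0x1000                                 # hypothetical descriptor address

memory = {DESC: OWNED_BY_CPU}
cache = ToyDCache(memory)

# Case b): the driver hands the descriptor to the DMA engine.
cache.read(DESC)
cache.write(DESC, OWNED_BY_DMA)
seen_by_dma_before_flush = memory[DESC]       # DMA would still see the old owner
cache.flush(DESC)
seen_by_dma_after_flush = memory[DESC]

# The DMA engine finishes and hands the descriptor back by writing memory.
memory[DESC] = OWNED_BY_CPU

# Case a): the driver polls the descriptor status.
seen_by_cpu_stale = cache.read(DESC)          # cached copy is out of date
cache.invalidate(DESC)
seen_by_cpu_fresh = cache.read(DESC)
```

Without the flush the DMA engine never sees the ownership change, and without the invalidate the driver keeps reading its own stale copy.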


Cache Coherence with Multi-Processor



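I had just finished writing an article on Cache Coherence when I found that BNN wrote a good one 2 years ago. Had I known, I would not have gone to the trouble of writing my own :)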

Recently I have been working on the dual-CPU part of a kernel. For a dual-CPU, or more generally multi-processor, system, the big challenge for a kernel is how to handle cache coherence.

Conceptually, there are two choices: Write Invalidate and Write Update.

We will talk about Write Invalidate today.

Typically, two protocols fall under Write Invalidate: the Write-Through Write-Invalidate protocol and the Write Once (or Write-Back Write-Invalidate) protocol. Note that the well-known MESI protocol is derived from Write Once, which is why we focus on Write Once here.


Write Once:

The Write Once protocol is designed to offset the shortcoming of the Write-Through Write-Invalidate protocol, which introduces extra traffic onto the system bus.

Write Once works basically as follows:


* Hardware snooping is enabled over the shared system bus.

* The cache is write-back.


There are four states in the Write Once protocol: Valid, Reserved, Dirty and Invalid.

Initial State:

* On a LOAD MISS, the data is loaded into the cache line and the state goes to VALID.

// Please note that the Write Once protocol guarantees that the data you load from memory is the latest. Why? If, at the time you try to load a cache line, there is a modified copy in another CPU, the snoop protocol aborts the load bus transaction, flushes the other CPU's data into main memory, and then resumes the aborted transaction so that the requesting CPU gets the updated data.

Now, let's investigate the state machine of Write Once Protocol.


VALID State:


On a LOAD HIT, we do nothing. That's right. The cache line is already here, and the CPU is happy to find the data in the cache.

On a LOAD MISS, we restart the initial procedure to load the latest data into the cache line.

On a CPU STORE HIT (a store hit from the current processor), we come to the key part of the Write Once protocol. For a UP (uniprocessor) system with a write-back cache, we all understand that a store moves the cache line to the DIRTY state and ****does not write the data back to main memory****. However, the Write Once protocol works as follows in order to achieve multi-processor cache coherence.

The stored data is written through to main memory (why? because we need a transaction on the bus!!!) and the cache state then moves to the Reserved state.

This is exactly why the protocol is named "Write Once": the first write access to a write-back cache line is written to main memory*****!!!! so that the other processors' cache controllers become aware of it and invalidate their corresponding cache lines, and thus the whole system holds only one copy of the cache line.

After this first write, subsequent writes only change the state to DIRTY; the data stays in the cache line and is not flushed to main memory, just as in the UP write-back approach.

On a SNOOP STORE HIT (we find that another CPU is trying to do a store to that cached address), the write-invalidate semantics require the current processor to invalidate its own copy so that only one legal copy of the cache line remains. In other words, the state goes from Valid to Invalid. Note that we don't have to do any flush. The reason is simple: this processor has not written to the line yet, so we only invalidate our own copy. Later, if we want to read this data, we will have to load it from main memory again.
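The transitions described so far can be collected into a small table. This is a sketch covering only the events discussed above; the event names (`load_miss`, `cpu_store_hit`, and so on) and the action strings are labels invented for the example, and the two events left open below are deliberately not filled in.

```python
from enum import Enum


class State(Enum):
    INVALID = "Invalid"
    VALID = "Valid"
    RESERVED = "Reserved"
    DIRTY = "Dirty"


# (current state, event) -> (next state, bus action or None)
TRANSITIONS = {
    (State.INVALID, "load_miss"): (State.VALID, "read line from memory"),
    (State.VALID, "load_hit"): (State.VALID, None),
    # First store: written through once so that other caches can snoop it.
    (State.VALID, "cpu_store_hit"): (State.RESERVED, "write through to memory"),
    # Another CPU stores to the line: drop our clean copy, no flush needed.
    (State.VALID, "snoop_store_hit"): (State.INVALID, None),
    # Later stores stay in the cache, as in a uniprocessor write-back cache.
    (State.RESERVED, "cpu_store_hit"): (State.DIRTY, None),
    (State.DIRTY, "cpu_store_hit"): (State.DIRTY, None),
}


def step(state, event):
    """Return (next_state, bus_action) for one event on one cache line."""
    return TRANSITIONS[(state, event)]


def run(events, state=State.INVALID):
    """Apply a sequence of events and return the list of (state, action) pairs."""
    trace = []
    for event in events:
        state, action = step(state, event)
        trace.append((state, action))
    return trace


trace = run(["load_miss", "cpu_store_hit", "cpu_store_hit"])
```

Only the first store in the trace produces a bus action, which is the "write once" behaviour the name refers to.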

For the VALID state, there are other inputs to consider, such as

snoop load hit

cpu store miss

How will the CPU react to these two events?

I will leave these questions to you guys......

Functionality:

Modelling:

This component models a memory cache suitable for use at different levels of the memory hierarchy. It provides a bus interface and connects to another bus, providing a transparent pass-through. In this documentation, "CPU" and "main memory" are used as synonyms for "upstream bus" and "downstream bus", as this is the most common usage (though not the only possible one).

The parameters of the cache are a matter of configuration. At instantiation time, the following parameters are specified:

  • cache size in KB (1, 2, 4, 8, 16, 32, 64, 128, 256, 512)
  • line size in bytes (16, 32, 64, 128)
  • associativity (direct, full, 2way, 4way)
  • replacement policy for N-way and fully associative caches (lru, fifo, random)

For a 16KB cache with a line size of 32 bytes, 2-way set associativity and a "least recently used" (LRU) replacement policy, the component name is hw-cache-2way/16kb/32/lru. For direct mapped caches, replacement policies are not applicable and should be omitted from the component name, such as hw-cache-direct/64kb/16. This particular 64KB direct-mapped cache configuration is also known by the type name of hw-cache-basic.
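The naming rule can be written down as a small helper. This is a sketch: `cache_component_name` is a made-up function, not part of SID, and it only encodes the parameter lists and the two example names given above.

```python
SIZES_KB = (1, 2, 4, 8, 16, 32, 64, 128, 256, 512)
LINE_SIZES = (16, 32, 64, 128)
ASSOCIATIVITIES = ("direct", "full", "2way", "4way")
POLICIES = ("lru", "fifo", "random")


def cache_component_name(assoc, size_kb, line_size, policy=None):
    """Build a component name such as hw-cache-2way/16kb/32/lru."""
    if assoc not in ASSOCIATIVITIES:
        raise ValueError(f"unknown associativity: {assoc}")
    if size_kb not in SIZES_KB:
        raise ValueError(f"unsupported cache size: {size_kb}")
    if line_size not in LINE_SIZES:
        raise ValueError(f"unsupported line size: {line_size}")

    name = f"hw-cache-{assoc}/{size_kb}kb/{line_size}"
    if assoc == "direct":
        # Direct-mapped caches have no replacement choice to make.
        if policy is not None:
            raise ValueError("direct-mapped caches take no replacement policy")
        return name
    if policy not in POLICIES:
        raise ValueError(f"unknown replacement policy: {policy}")
    return f"{name}/{policy}"


two_way = cache_component_name("2way", 16, 32, "lru")
basic = cache_component_name("direct", 64, 16)
```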

tag calculation

The size of a tag is dynamically computed based on the line size. Unlike physical caches, which economise on the number of tag bits to reduce hardware costs, the model uses a full address but discards the redundant bits that can be inferred from a byte's position in the cache line. For example, a 32 (2^5) byte line uses 27 bits for the tag.
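The 27-bit figure follows if addresses are 32 bits wide, which is an assumption made here to match the example. A minimal sketch of the arithmetic, with `tag_bits` and `tag_of` as illustrative names:

```python
ADDRESS_BITS = 32  # assumption: 32-bit addresses, consistent with the 27-bit example


def offset_bits(line_size):
    """Number of low address bits that select a byte within a line."""
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line size must be a power of two")
    return line_size.bit_length() - 1      # log2 of a power of two


def tag_bits(line_size):
    """Width of the tag: the full address minus the byte-offset bits."""
    return ADDRESS_BITS - offset_bits(line_size)


def tag_of(address, line_size):
    """Tag for an address: the address with the byte-offset bits dropped."""
    return address >> offset_bits(line_size)


bits_for_32_byte_line = tag_bits(32)
same_line = tag_of(0x1000, 32) == tag_of(0x101F, 32)   # first and last byte of one line
next_line = tag_of(0x1020, 32) == tag_of(0x1000, 32)   # first byte of the following line
```

Two addresses share a tag exactly when they fall on the same line, which is all the model needs from the tag.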

hash algorithm

A simple hashing algorithm is used to select a set from a target address. The algorithm uses values from hash-bit-mask and hash-shift-amount to compute: These two values must be chosen carefully to ensure good cache utilisation. In particular, the "all-ones" value of mask should not exceed the number of sets in the cache.
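One plausible reading of the set selection, offered as an assumption rather than the component's confirmed formula, is to shift the address right by hash-shift-amount and then AND it with hash-bit-mask. The function names and the example values below are hypothetical.

```python
def select_set(address, hash_bit_mask, hash_shift_amount):
    """Assumed hash: drop the low bits, then keep only the masked bits."""
    return (address >> hash_shift_amount) & hash_bit_mask


def mask_fits(hash_bit_mask, num_sets):
    """Largest index the mask can produce must still name a real set.

    Sets are numbered from 0 in this sketch, so the limit is num_sets - 1.
    """
    return hash_bit_mask <= num_sets - 1


# Hypothetical configuration: 16KB, 32-byte lines, 2-way -> 256 sets.
NUM_SETS = (16 * 1024) // 32 // 2
MASK = 0xFF
SHIFT = 5          # skip the 5 byte-offset bits of a 32-byte line

first = select_set(0x0000, MASK, SHIFT)
second = select_set(0x0020, MASK, SHIFT)     # next line maps to the next set
wrapped = select_set(0x2000, MASK, SHIFT)    # 256 lines later wraps to set 0
```

With these values consecutive lines spread across all 256 sets before wrapping, which is the kind of utilisation the two parameters are meant to achieve.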

misaligned accesses

The component does not handle memory accesses that are not aligned on the natural boundary of the data being referenced. In such cases, the cache is bypassed and memory is accessed directly.

write strategy

When a write is made to the cache, the write-through? attribute determines if the data will be simultaneously written to the memory. Otherwise, writes will only be made to the cache and will not be synchronised with main memory until the line is flushed due to line replacement or an explicit flush (see Flushing).

In the case of a write miss, the write-allocate? attribute specifies the component's action. If this attribute is set to yes, then a miss will cause the missed line to be loaded into the cache in anticipation of future references.
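The interaction of the two attributes can be sketched with a toy model. `ToyWriteCache` is invented for illustration and works per address rather than per line. What happens on a write miss when write-allocate? is off is not stated above, so the sketch assumes the write goes straight to memory.

```python
class ToyWriteCache:
    """Toy cache with write-through and write-allocate switches."""

    def __init__(self, memory, write_through, write_allocate):
        self.memory = memory
        self.write_through = write_through
        self.write_allocate = write_allocate
        self.lines = {}      # address -> cached value
        self.dirty = set()   # cached addresses not yet written to memory

    def write(self, addr, value):
        hit = addr in self.lines
        if not hit and not self.write_allocate:
            # Assumption: a miss without allocation bypasses the cache.
            self.memory[addr] = value
            return
        self.lines[addr] = value            # hit, or miss with allocation
        if self.write_through:
            self.memory[addr] = value       # memory updated at the same time
        else:
            self.dirty.add(addr)            # memory updated on a later flush


alloc = ToyWriteCache({}, write_through=False, write_allocate=True)
alloc.write(0x40, 7)

no_alloc = ToyWriteCache({}, write_through=False, write_allocate=False)
no_alloc.write(0x40, 7)

both = ToyWriteCache({}, write_through=True, write_allocate=True)
both.write(0x40, 7)
```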


prefetching

The component supports prefetching of data into the cache by driving prefetch with an address. If, due to the line replacement policy, the prefetch cannot be performed, this operation has no effect.


flushing

If dirty lines are flushed from the cache, the component will ensure that their contents are synchronized with main memory. Some architectures provide a facility for explicitly flushing a line to memory. For this purpose, the component provides flush which can be driven with an address. If the address falls on a line that is present and dirty, it will be flushed to memory and marked as not dirty. A line can be flushed and invalidated in one atomic operation by driving the flush-and-invalidate pin. The entire cache can be flushed by driving flush-all.


invalidating

Lines in the cache that contain accurate contents are marked as valid. Some architectures provide a facility for explicitly marking a line as invalid so that future accesses will cause a new memory access. For this purpose, the component provides invalidate that can be driven with an address. If the address falls on a line that is present, it will be invalidated. No consideration is made for dirty lines, so a line should be flushed before being invalidated. A line can be flushed and invalidated in one atomic operation by driving the flush-and-invalidate pin. The entire cache can be invalidated by driving invalidate-all.
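Why a dirty line should be flushed before it is invalidated can be shown with a single-line toy model. `ToyLine` is a made-up class, and its method names only mirror the pin names for readability.

```python
class ToyLine:
    """One cache line in front of a dict that stands in for main memory."""

    def __init__(self, memory, addr):
        self.memory = memory
        self.addr = addr
        self.valid = False
        self.dirty = False
        self.data = None

    def read(self):
        if not self.valid:                   # invalid line: refetch from memory
            self.data = self.memory[self.addr]
            self.valid = True
            self.dirty = False
        return self.data

    def write(self, value):
        self.data = value
        self.valid = True
        self.dirty = True

    def flush(self):
        if self.valid and self.dirty:
            self.memory[self.addr] = self.data
            self.dirty = False

    def invalidate(self):
        self.valid = False                   # dirty data, if any, is dropped
        self.dirty = False

    def flush_and_invalidate(self):
        self.flush()
        self.invalidate()


memory = {0: 1}
line = ToyLine(memory, 0)

line.write(2)
line.invalidate()                 # no flush first: the write is lost
after_bare_invalidate = line.read()

line.write(3)
line.flush_and_invalidate()       # the write reaches memory before the line goes away
after_flush_and_invalidate = line.read()
```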

line locking

The component supports locking lines in the cache to prevent them from being removed to accommodate more recently referenced lines. A line can be locked by driving lock with any address that falls on the line. Subsequently, a line can be unlocked by driving unlock.
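The effect of locking on replacement can be sketched for a single LRU set. `LockableLRUSet` is hypothetical; it locks by tag rather than by address and makes no claim about how the component itself tracks locked lines.

```python
from collections import OrderedDict


class LockableLRUSet:
    """One cache set with LRU replacement that skips locked lines."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> locked flag, least recently used first

    def access(self, tag):
        """Return True if the tag is cached after the access, else False."""
        if tag in self.lines:
            self.lines.move_to_end(tag)          # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            # Evict the least recently used line that is not locked.
            victim = next((t for t, locked in self.lines.items() if not locked), None)
            if victim is None:
                return False                     # every way is locked
            del self.lines[victim]
        self.lines[tag] = False
        return True

    def lock(self, tag):
        if tag in self.lines:
            self.lines[tag] = True

    def unlock(self, tag):
        if tag in self.lines:
            self.lines[tag] = False


cache_set = LockableLRUSet(ways=2)
cache_set.access("A")
cache_set.access("B")
cache_set.lock("A")
cache_set.access("C")                # A is older but locked, so B is evicted
resident = list(cache_set.lines)

cache_set.lock("C")
placed = cache_set.access("D")       # both ways locked: D cannot be placed
cache_set.unlock("A")
placed_after_unlock = cache_set.access("D")
```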

memory latency

The component models the effects of memory latency. The hit-latency and miss-latency values specify the cumulative latencies for hit and missed cache operations. Any misaligned accesses are penalised as if they are a miss. Cache line refills incur an additional latency, specified by the refill-latency attribute.
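One way the three latency attributes might combine is sketched below. The combination rule is an assumption drawn from the paragraph above, and the numeric values are placeholders, not defaults of the component.

```python
def is_naturally_aligned(address, size):
    """True if an access of `size` bytes starts on a multiple of its size."""
    return address % size == 0


def access_latency(hit, aligned, refill, hit_latency, miss_latency, refill_latency):
    """Assumed rule: misaligned accesses cost as much as a miss; refills add on top."""
    latency = hit_latency if (hit and aligned) else miss_latency
    if refill:
        latency += refill_latency
    return latency


# Placeholder values chosen only for the example.
HIT, MISS, REFILL = 1, 10, 4

aligned_hit = access_latency(True, is_naturally_aligned(0x100, 4), False, HIT, MISS, REFILL)
miss_with_refill = access_latency(False, True, True, HIT, MISS, REFILL)
misaligned = access_latency(True, is_naturally_aligned(0x102, 4), False, HIT, MISS, REFILL)
```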

statistics gathering

The component gathers statistics for a number of significant events and records them in read-only attributes. The collection of statistics may be disabled using collect-statistics?.

statistics reporting

The component will write a summary report of the statistics it collects to standard error when report! is driven. The report-heading value, prepended to the report, allows reports from multiple caches to be distinguished.

SID Conventions
functional supported -
latency supported -
