Steamroller is still a 128-bit vector unit, with 2 Vector Integer ALUs + 2 FMAC hardware; it just drops from 4-issue ...
Puff posted on 2012-10-24 09:27 PM



    Just asking:
If int isn't used more than fp,
why does AMD go for 2x int rather than 2x fp?


Last edited by Puff on 2012-10-25 06:53
Just asking:
If int isn't used more than fp,
why does AMD go for 2x int rather than 2x fp?
cheungmanhoi posted on 2012-10-25 00:50

Not directly related. It's a design choice: the FP and int vector hardware is shared for better utilization and efficiency.
The int vec to FP vec ratio is still 1:1.


Reply to willy930
Integer SSE/AVX is actually computed on the MMX unit, not on the x86 cores; don't assume, just because it's named MMX, that it ...
BMS posted on 2012-10-24 17:34


Of course I know the MMX/SSE/AVX INT is not being run on the INT cores.

The issue is: how did you come up with that math?

What is the nature of the applications? Is it all AVX instructions? Is it some AVX instructions and some SSE/MMX instructions? Or just SSE/MMX instructions?

If that MMX unit is 256 bits wide, it can process the following cases:

1) 256 bit integer
2) 2 * 128 bit integers
3) 4 * 64 bit integers
4) 8 * 32 bit integers

and so on.
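
A rough sketch of the lane arithmetic behind the list above. `lane_count` and `lane_table` are hypothetical helper names, not part of any instruction set or library. The table only says how many equal-width elements fit in a register of a given width; whether a real instruction set offers an operation at each width is a separate question.

```python
def lane_count(register_bits: int, element_bits: int) -> int:
    """Number of equal-width elements that fit in one vector register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return register_bits // element_bits


def lane_table(register_bits: int = 256) -> dict:
    """Map element width (bits) -> number of lanes, for power-of-two widths."""
    widths = [w for w in (256, 128, 64, 32, 16, 8) if w <= register_bits]
    return {w: lane_count(register_bits, w) for w in widths}


if __name__ == "__main__":
    # Prints one line per case, e.g. how many 32-bit elements fit in 256 bits.
    for width, lanes in lane_table(256).items():
        print(f"{lanes:2d} x {width}-bit")
```

For a 128-bit unit, `lane_table(128)` gives the same kind of breakdown, starting from one 128-bit element.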

Different cases have different workloads. There is no reason to pick the worst-case scenario to talk about, because I don't think any application will run all AVX INT. And AVX is too new; it won't affect us right away.


Last edited by BMS on 2012-10-25 17:08

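Since this theory of mine isn't very scientific, I sat on it for a whole year before bringing it up now.
Most people think AMD's temperature reading is inaccurate, but I think it can serve as an indirect "leakage monitor" (extra power draw / waste heat).

Core reading more than 10°C below the surface temperature: usually this means a reading below room temperature, and I think leakage is relatively low in this state.
Core and surface temperatures about 10°C apart: this is basically what you get under normal full-speed heavy load.
Core reading above the surface temperature: you've overclocked and overvolted way too hard~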

Power.png
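In the example in the image (maximum temperatures), 1 to 3 from left to right are sustained loads; 4 on the far right is mostly burst load.
The temperature in 3 clearly tells you that letting it boost freely only wastes extra heat and power.
I strongly recommend that K15 users cap the clock speed for ordinary everyday use. Before BD launched, rumor had it that the top speed might be 2.8GHz, and that may really not have been a lie.
Judging by how much the core temperature rises from 2.7GHz to 3.3GHz, I really suspect any speed above that is effectively overclocked; the ideal performance-per-watt point for the 32nm process may really be only 2.xGHz.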

PS: The CPU fan is set to ramp up above 45°C, so 1 to 3 are all at the same fan speed.

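The idle temperature doesn't look great because the cooler isn't one of the popular ones with 4 or more heatpipes. Running the fan at full speed could bring it down another degree or two, but that's a bit pointless.
This idle temperature is the same as with the stock cooler; idle is the same before and after the swap, at low fan speed in both cases of course.
According to information online, with 4 or 6-8 heatpipes you can see 2x-3x°C; if it bothers you, buy something pricier.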


Last edited by BMS on 2012-10-25 17:01
Steamroller is still a 128-bit vector unit, with 2 Vector Integer ALUs + 2 FMAC hardware; it just drops from 4-issue ...
Puff posted on 2012-10-24 21:27

So it seems 256-bit integer is at most 4 after all.

If that MMX unit is 256 bits wide, it can process the following cases:

1) 256 bit integer
2) 2 * 128 bit integers
3) 4 * 64 bit integers
4) 8 * 32 bit integers
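willy930 posted on 2012-10-25 15:05

Intel's FP is a 256-bit unit. If that were possible, Intel running 128-bit FP SSE should also get 8x performance out of 4 cores + HT, with no need to double up to 256-bit FP AVX.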


Last edited by Puff on 2012-10-25 18:12
So it seems 256-bit integer is at most 4 after all.
BMS posted on 2012-10-25 17:00

Up to 10 Flex FP units per processor. Still a 128-bit datapath, and of course it's an MCM.

Intel's FP is a 256-bit unit. If that were possible, Intel running 128-bit FP SSE should also get 8x performance out of 4 cores + HT, with no need to double up to 256-bit FP AVX.

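Same point as before: the whole vector engine block is shared, and scheduling doesn't care which thread an op belongs to, only about data availability. In other words, even 4 cores plus HT doesn't equal 200% of the performance of 4 cores, because the resources are shared and finite. You can only judge once you know the characteristics of the code the two threads are about to run.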
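
As a toy model of the point above (not a measurement of any real CPU): assume a shared unit can issue at most `capacity` ops per cycle, and each thread running alone could supply `demand` ops per cycle. The function names and all numbers here are made up for illustration.

```python
def issued_per_cycle(demands, capacity):
    """Ops per cycle issued by a shared unit: the smaller of what the
    threads can supply in total and what the unit can execute."""
    return min(capacity, sum(demands))


def second_thread_speedup(demand_a, demand_b, capacity):
    """Throughput with both threads sharing the unit, relative to thread A alone."""
    if demand_a <= 0 or capacity <= 0:
        raise ValueError("thread A's demand and the unit capacity must be positive")
    alone = issued_per_cycle([demand_a], capacity)
    together = issued_per_cycle([demand_a, demand_b], capacity)
    return together / alone


if __name__ == "__main__":
    capacity = 2.0  # hypothetical: the shared unit issues at most 2 ops per cycle
    for demand in (0.5, 1.0, 1.5, 2.0):
        gain = second_thread_speedup(demand, demand, capacity)
        print(f"each thread supplies {demand} ops/cycle: {gain:.2f}x with two threads")
```

In this model a second thread only doubles throughput while the two together still fit within `capacity`; once one thread can keep the unit busy by itself, the gain drops to 1.0x.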


So it seems 256-bit integer is at most 4 after all.


Intel's FP is a 256-bit unit. If that were possible, Intel running 128-bit FP SSE should ...
BMS posted on 2012-10-25 17:00


First of all, HT is only useful when a core is not fully loaded by one thread. If a core is already running at 100%, HT is useless.

Secondly, say there are 2 threads, each with one 128-bit instruction. HT does not mean you can assign a 128-bit instruction from another thread to run together with the 128-bit instruction of this thread. You can only run two 128-bit instructions from the same thread at the same time.

Thirdly, going 256-bit means more data can be processed at once, so throughput is higher.


So it seems 256-bit integer is at most 4 after all.


Intel's FP is a 256-bit unit. If that were possible, Intel running 128-bit FP SSE should ...
BMS posted on 2012-10-25 17:00


Sandy Bridge itself uses 128-bit FP functions to do 256-bit AVX
(note that not everything has a 256-bit data path)


Not directly related. It's a design choice: the FP and int vector hardware is shared for better utilization and efficiency.
Int vec to FP vec  ...
Puff posted on 2012-10-25 06:52


It is directly related.
BD's current 2 INT + 2 AGU setup was derived from program analysis.
