Thursday, December 21, 2006

A Lot Has Happened... Not Interesting Though...

A lot has happened in the past month or two, but barely any of it can be called interesting. Intel, as expected, launched Kentsfield and Clovertown. Nothing earth-shattering there. The only surprise, to some extent, was that Clovertown came with a 1333 MHz bus. As expected, AMD launched 4x4 (or QuadFather or Quad FX), which, to a great extent, was a huge disappointment. Performance barely matching that of a 2.66 GHz Kentsfield, with twice the power consumption!! The upside on the Green side, though, was that 65nm parts rolled out and Barcelona was demonstrated. But overall, nothing earth-shattering on either side, and hence, nothing much to talk about.

In fact, if you look at Dr. Sharikou's blog, it is also getting uninteresting. Yes, in a masochistic way, I do believe that his blog is interesting. But when nothing new is happening, even the great doctor is having trouble keeping up the pace...

So, what to write about?

Well, on the home front, I got a Linksys WRT54GL router and installed the DD-WRT firmware on it. The router is really cool. The configuration flexibility this router offers at an around-$60 price tag is just amazing. You can assign static IP addresses to DHCP clients, set up a PPTP server, configure PPPoE, and a lot more. If you are in the market for a new DSL/cable router, definitely take a look at this nifty device. It is slightly more expensive than the others, but well worth it. What else? I also got a 1.2-terabyte NAS to back up my 600 GB C2D system. I am using Acronis True Image 10.0 for backup, and the software works great. My home network is now getting decently crowded: one desktop, two laptops, one NAS, one VoIP router, one printer, and a PDA, all connected through a gigabit switch and the Linksys router. I love gadgets!

Sorry for not posting

Guys! I am extremely sorry for not posting. I was on vacation :) for almost a month. Will post something soon.

Monday, October 23, 2006

Intel is pushing AMD to even more niche markets

On Friday, Intel demoed its Tigerton processor--a quad-core, Core2-based beast that sits on a platform with 4 independent buses. Now that is definitely going to make some heads turn.

Let me reiterate: the Core 2 microarchitecture is vastly superior to the K8 microarchitecture, with K8L addressing only some of the gaps. The only edge that K8L has is its interconnect architecture and integrated memory controller, which give it better memory bandwidth and lower latency. However, with four independent FSBs, Tigerton is to 4P what Woodcrest is to DP, and we have already seen what Woodcrest is capable of (AMD even acknowledged that they are facing competition in 2P, with Intel claiming to have regained some lost market share there). The four independent buses will vastly alleviate Intel's bandwidth problem, while large caches and smart prefetchers can mostly nullify the latency advantage. Essentially, I expect Tigerton to be the new king of 4P and rule them all for a while. K8L with HT3 can show an advantage in some memory-intensive loads, but with Barcelona stuck at three HT2 links, Tigerton will have a full three to six months of unchallenged supremacy.

Why does it matter? Well, frankly, I don't think it matters that much to Intel as far as revenues or profits are concerned--4P is already a niche. But considering how strongly AMD relies on its 4P business, it has the potential to make the marketplace more difficult for AMD.

And please, don't get started on the vaporware argument. Intel has shown a Tigerton system running. All that AMD has shown is a wafer containing some huge K8L dies.

Sunday, October 01, 2006

Torrenza and 4S

I honestly think that 4S is reaching the end of the road. When you start putting so many cores into a single package, who needs 4P? 4P is already such a narrow market... The increasing power of 2P will put even more pressure on this already niche market!

Consider this for a moment: if Intel adds a Dempsey-style internal bus to the quad-core Penryn, then the 4 cores will look like just a single load to the FSB, and arguably, Intel will be able to pack two of these quad-cores onto the same package (if the package has enough space). So, in theory, a Dempsey-style internal bus will allow Intel to have 8-core chips by the end of next year. If this happens (I understand that is a big if, but Intel is desperate to claim sustained leadership, so who knows), we are looking at 2P systems with 16 cores!! Going forward, rumor has it that Intel will reintroduce Hyper-Threading. That would put 32 logical processors on a 2P system. This is bound to make the 4P market ridiculously niche. So what does that mean? Does Intel really need CSI? After all, the dual FSB is more than sufficient for the 2P market...
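Just to spell out that core-count arithmetic--a back-of-the-envelope sketch, assuming the big ifs above (two quad-core dies sharing one package, and Hyper-Threading actually returning):

```python
# Hypothetical core counts for a 2P system, under the assumptions above.
cores_per_die = 4        # quad-core Penryn
dies_per_package = 2     # assumption: Dempsey-style bus lets two dies share a package
sockets = 2              # a 2P system
threads_per_core = 2     # assumption: Hyper-Threading comes back (rumored)

physical_cores = cores_per_die * dies_per_package * sockets
logical_cpus = physical_cores * threads_per_core

print(physical_cores)  # 16
print(logical_cpus)    # 32
```

Sixteen physical cores and 32 logical processors in a plain 2P box--you can see why 4P starts looking redundant.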

In my opinion, it does need CSI. The problem is not 4P or 8P--that is a dying breed. The real problem is Torrenza--AMD's ability to couple third-party processors with its own. At IDF, Intel announced that it will open up its FSB to third parties. Pardon my French, but who gives a f&*^ing @#$%? Why would anyone want to put their co-processor on an FSB that Intel is always in a hurry to upgrade? With HT, you can arguably negotiate different links at different speeds, so third parties do not have to upgrade their HT logic to keep up with AMD. On the other hand, since the entire FSB will be limited by the speed of the slowest component, third parties have no choice but to run with Intel or be rendered obsolete. And that is exactly what they don't want. Thus, if Intel wants to provide a Torrenza-like capability, it needs a point-to-point interconnect that can be negotiated independently. The FSB just won't cut it.

Does Intel need to provide a Torrenza-like solution? Frankly, I don't know. Today there are not many co-processor applications where the co-processor has to interact with the main processor on a clock-cycle-by-clock-cycle basis. But arguably, that is because there is presently no technology that allows a co-processor to interact with the CPU that closely. AMD's Torrenza will make that possible for the first time--and who knows, it might even catch on. Intel cannot afford to ignore Torrenza; that's the bottom line. And that is why it needs a cache-coherent, point-to-point interconnect solution. Maybe it's CSI, maybe it's something else. But they need one for sure...

Saturday, September 23, 2006

My E6700 Arrived

I immediately replaced my P4 560 with it and overclocked it to 3.2 GHz on stock voltage (I tried 3.33: it booted, but Windows froze just after logon; 3.2 worked, and I did not have the patience to play with intermediate frequencies). Everything worked great. I ran a burn-in test for 12 hours with error checking enabled, and it passed.

Then I started Adobe Premiere Elements and began encoding a file. Three minutes into the process--boom, the video driver crashed. Tried again, and the same thing happened. I have an ATI Radeon 600 w/256 MB onboard memory (I am not a gamer). The video driver for that card used to crash quite frequently, but since I updated it to their July release, it had been running quite solidly. I have another Radeon at work, though I don't know the model (it has 512 MB of memory), and the display driver there also keeps crashing. Upgrading the driver hasn't helped me there.

Now I am running the E6700 back at 2.66 GHz. It is still pretty fast, but since I have tasted what 3.2 GHz feels like, it kind of feels slow now...

Why would overclocking the CPU cause the display driver (and only the display driver) to crash? I tried locking the PCIe speed at 100 MHz; that didn't help either. If anyone has any suggestions, I am open to trying them.

The sad part is, I need the speed for video processing, and the darn driver crashes only when I start encoding with Adobe PE plus a couple of other programs.

At 2.66, I have encoded 3 DVDs so far, and everything seems rock solid. I am tempted to try an NVIDIA card, but what is the guarantee that it won't crash on me? Has anyone experienced something like this before? My previous NVIDIA card was very stable, but then, it was AGP and I was not trying to overclock. And as I have mentioned before, I am not rich, so I cannot spend 100 dollars on a card just to try it out...

Turns out, this was a northbridge problem after all. Raising the VCore on the NB solved the problem. The system is rock stable again. However, the max I am able to reach with 1.45V on the NB is 3.1 GHz. I don't want to raise the voltage any further, and 3.1 is not that far off from 3.2.

Friday, September 22, 2006

4x4 revisited--I told you so!

Remember how, only a couple of days ago, I doubted whether AMD would release 4x4 on the cheap? Turns out (or at least it seems) I was right. AMD's 4x4 roadmap has leaked (for those who don't understand Dutch, click here). Looks like 4x4 will be available only in FX variants--FX-70, FX-72, and FX-74, coming in at 2.6 GHz, 2.8 GHz, and 3.0 GHz. What's more, you have to buy these processors in pairs--so the upgradability argument that AMD fanboys were making just got flushed down the drain. Each of these processors will have 2x1MB L2 cache and a whopping 125W TDP. Now, I do not think that lineup is exactly cheap. Despite being beaten down pretty badly, AMD has priced its FX-62 today at about $800. So I do not expect them to price each of these FXes below $600 a piece, or $1200 together (maybe the lowest one at about $999 for two--but again, that is not mainstream).

Mommy, why are AMD fanboys crying? Well honey, they were expecting caviar, but AMD served them crow!!

Tuesday, September 19, 2006

LV Clovertown at 50W TDP?

HKEPC reports that the low-voltage, quad-core Clovertown (L5310) will be released at a 50W TDP. That would certainly be impressive. AMD has been talking about delivering quad-core at an 80W TDP and hyping it up a lot. If Intel delivers its 50W QC almost a quarter or two before K8L arrives, that would be something (this is a big IF, since Intel hasn't announced this part). AMD will still keep hyping up idle power, but really, who gives a crap about idle power in data centers or rendering farms? Also, the 1066 FSB will be more than enough, at least for rendering farms (check out the Kentsfield reviews from Tom's Hardware--the 1066 FSB does not cause a bottleneck on most benchmarks).

What does "same power envelope" mean anyways? You move to a next-generation process, and power is expected to go down. Then you reduce the clock speed a little bit, and that gives you additional power savings. The next-generation process also allows you to lower voltages, reducing power even further. Everyone is doing it. Only AMD is hyping it.

Expect to see a lot of hype on idle power from AMD. Probably AMD expects their K8Ls to just sit idle? :)

Again, does 2P quad-core make 4P more irrelevant?

4x4 Pricing

Does anyone know how 4x4 will be priced? AMD claims there will be models well under $1000. What does that mean anyways? $1000 for CPUs, or $1000 for the entire system?

I seriously doubt AMD will release 4x4 at such low prices, because that would grossly undercut their own 2P Opteron sales. Here are some facts:

1. AMD's operating margins are lower than Intel's.
2. As a percentage of revenue, 2P servers matter more to AMD than to Intel.
3. The ASP lift that AMD gets from Opterons is higher than the ASP lift Intel gets from Xeons.
4. The highest-end desktop model that AMD is currently selling, for all practical purposes, is the X2 3800+, which you can get for $159 with a free motherboard. This implies AMD desperately needs the margins it is getting on its Opterons.
5. The ATI acquisition is putting a strain on AMD's cash flow.

Considering all this, how can AMD kill their own 2P business? Now, if Intel prices Kentsfield at levels that undercut Opterons, then AMD will be forced to price 4x4 that way. But why would they price 4x4 that way, without being forced, when they desperately need the margins the cash-cow Opterons are generating?

4x4 won't undercut 4P and 8P systems, but those alone are not sufficient to provide AMD with enough cash to stay in the green (this is a guess, not a fact).

BTW, I do not expect even Intel to undercut Opterons (or Xeons) with Kentsfield. If they do that, they might kill that business forever. And with Woodcrest, they at least have some hope of regaining market share. If the market is killed, no one stands to gain. The likes of Google will immediately jump onto 4x4 or Kentsfield to build cheap "servers". Google doesn't care about reliability. All it cares about is performance/watt/$. And there are many others who care about exactly that.

Sunday, September 17, 2006

My E6700 Just Shipped

I am so excited!! I plan to overclock the heck out of it! But like other bloggers, I am not rich, and hence I do not plan to increase the voltage by much... 1.4V maybe? What do you say??

K8L and Penryn

It just amazes me to see how AMD fans keep on saying, "K8L will make Conroe look obsolete," or "Intel needs something a lot better than Conroe if it is to survive the K8L onslaught." That's funny, considering these same fanboys were saying that even with Conroe, Intel would not be able to close the gap with K8...

Let us just look at some of the facts. In video encoding, at the same clock speed, Conroe beats K8 by about 30%. In games, if the GPU bottleneck is removed, Conroe beats K8 by up to 50%. So it is safe to say that Conroe, in general, is better than K8 by about 30%. Now, first we have to understand that K8L will go up against Penryn--Conroe's 45 nm cousin. Also, some time back, the Inquirer reported that Intel's 45 nm process is a giant leap over its 65 nm process as far as leakage is concerned. If this is indeed true, Penryn, Conroe's 45 nm sibling, should have no trouble clocking at 4 GHz while operating in the same or a lower power envelope. Recent reviews of Kentsfield also show that the Conroe architecture is not hurting for memory bandwidth.

So what does all this mean? If Penryn clocks at 4 GHz, it will be clocked 33% higher than the X6800. Even though Conroe performance scales almost linearly with clock speed, let us allow some headroom and assume that this additional 33% clock speed yields only a 25% performance improvement. Considering that the X6800 beats AMD's best by 30%, this means the top-end Penryn will beat AMD's best (as of today) by about 62%. Now that is huge!! Beating that is not going to be an easy task, for K8 or K8L.
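For what it's worth, here is that compounding spelled out--these are my own estimates from the paragraph above, not measured numbers:

```python
# Compounding the two estimated gains: assumed effective gain from the
# higher clock (33% discounted to 25%) on top of Conroe's assumed ~30%
# lead over K8 at equal clocks.
clock_gain = 0.25
arch_gain = 0.30

total = (1 + clock_gain) * (1 + arch_gain) - 1
print(f"{total:.1%}")  # 62.5%
```

The gains multiply rather than add, which is why 25% on top of 30% lands at 62.5%, not 55%.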

AMD fanboys might say: K8L is a ground-up architecture built for performance, and it should not be too difficult for it to improve over K8 by 62%. Well, let us take a look at what K8L brings to the table.

First, improved SSE performance. Yawn!! It was about time. Conroe beats AMD in pure SSE2 operations by 400%. Nothing that AMD does here is going to tilt the balance in the other direction.

Second, it adds load reordering. Another bit of a Yawn! Intel has been doing that for how many years now? Conroe also does load/store reordering. Nothing new here.

Third, it adds support for HT 3.0. Last I checked, K8 was not really hurting for memory bandwidth, so this is not expected to add too much real performance. Extremely good for bragging rights, but from the end user's perspective, another yawn.

Fourth, it adds a dual-ported L1 cache. AMD might be onto something with this, but again, we do not know what it really means. In fact, I personally do not expect it to add a lot. How many workloads today are really hurting for cache bandwidth? I would say not many--definitely not the 32-bit ones. At an IPC of about 2, and at most two source operands per instruction, you do not need more than 16 bytes of reads per clock cycle. For 64-bit workloads, it *MAY* mean something. But again, until we see the effect on real benchmarks, I wouldn't start jumping up and down. Add to that the fact that K8L won't be reordering loads and stores. That means, if there are stores in front of loads, the second cache port could be just sitting there, doing nothing. Actually, not entirely true: the second cache port will be adding load to the cache, increasing power consumption. In short, this seems like a good feature, but unless there is evidence to the contrary, I wouldn't count on it too much.
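The 16-bytes-per-clock figure is just arithmetic; here is a quick sketch using my assumed numbers (sustained IPC of ~2, at most two source operands per instruction):

```python
# Worst-case load bandwidth demanded by a workload, assuming every
# instruction reads both of its source operands from the cache.
ipc = 2                  # assumed sustained instructions per clock
operands_per_insn = 2    # at most two source operands per instruction

def bytes_per_clock(operand_size_bytes):
    return ipc * operands_per_insn * operand_size_bytes

print(bytes_per_clock(4))  # 32-bit operands: 16 bytes/clock
print(bytes_per_clock(8))  # 64-bit operands: 32 bytes/clock
```

So a single 16-byte-wide read port covers the 32-bit worst case, and only 64-bit workloads could even theoretically want the second port.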

Fifth, improved floating-point performance. Well, AMD has always been strong in this area, and when it comes to editing Word documents, unzipping files, compressing videos, and playing games, it means zilch!! Universities and NASA care about it--implying 99% of their computers will be from AMD instead of 95%. But again, it is a performance improvement tailored for a very niche market.

Finally, they add a shared L3 cache. About time they did so, wouldn't you say? Makes you wonder whatever happened to all those claims about cache thrashing...

You might say this is a very one-sided view. If Intel managed such a huge performance improvement from the P4 to Core 2, why can't AMD do the same from K8 to K8L?

For one, Intel did not improve performance going from the P4 to Core 2; they improved power. You can take a P4, cool it with liquid nitrogen, clock it at 6 GHz, and it will beat the crap out of any CPU on the planet. The problem is, no one wants to do that, so Intel has to clock it below 4 GHz. Unfortunately, the P4 was designed to perform at 5+ GHz. Realizing this mistake, what Intel did with Core 2 was come up with an architecture that works great in the sub-5 GHz range. No matter what you do to Core 2, you cannot clock it beyond 5 GHz. If you took a 5 GHz Conroe and a 7 GHz P4, they would probably be pretty close in performance. However, since no average user clocks in that range, the P4 seems far inferior to Core 2.

In short, Intel saw the huge performance improvement going from the P4 to Core 2 because they changed their philosophy. I wouldn't expect the same type of performance jump when Intel goes to Nehalem, for example. And that is why I do not expect such a huge performance jump going from K8 to K8L--after all, there is no huge change in philosophy. K8L will definitely be better. Will there be a wow factor? Hard to tell at this point. However, considering that Intel's transition to Core 2 has already set expectations very high, I think K8L will largely be a disappointment.

Wednesday, September 13, 2006

FSB, CSI, IMC, and Hyper Transport

To date, I strongly believe that AMD's success is a result of Intel's mistakes rather than any exceptional invention or execution on AMD's part. I will analyze why Intel made the mistakes it did in some other post--there were some genuine reasons (and having moronic engineers is not one of them), and for all we know, AMD could have made the same mistakes as well. After all, making mistakes and screwing up execution had been AMD's trademark until 2001.

Anyways...

The purpose of this post is to get some things off my chest. I have tried posting this material on many pro-AMD blogs (pogs?), but I always got censored. Now that no one can censor me, I think I can just say what I want to say...

AMDers like to think that the IMC is AMD's greatest gift to mankind and that it unequivocally demonstrates AMD's supremacy when it comes to architecture. And don't even get them started on HyperTransport--their eyes glow, their jaws drop, and they start drooling like a hungry dog in a meat factory. What they fail to realize is that the IMC and HyperTransport are just a very, very small part of the overall CPU architecture. All they provide is a CPU-to-CPU and CPU-to-memory interconnect.

Now, computer architecture is all about making compromises, and neither Intel nor AMD is coming out with any earth-shattering innovations here--everything that was there to be invented was invented in the late '50s and early '60s. Both Intel and AMD are just using the millions (billions?) of transistors now available to them to make their own architectural compromises. IMC and HT are *ONE WAY* of improving the interconnect. But certainly not the only way.

Having larger and faster caches coupled with a good prefetcher is certainly another very effective way of compensating for a slower interconnect--and currently Intel is using this approach very effectively to counter AMD's IMC+HT onslaught. The Core 2 Duo is the prime example. In video encoding, C2D is about 30% faster than equally clocked Athlons. In games that are not GPU-constrained, it is 30% to 50% faster than the Athlon. In fact, Intel is so successful with this strategy that they don't even need the 1333 MHz FSB they are capable of. They launched C2D with just a 1066 MHz FSB, and it is impossible to come up with a real-life benchmark in which a K8-based CPU can outperform C2D.

In fact, the recent reviews of Kentsfield indicate that the FSB is not a bottleneck at all: in video encoding, just a 1066 MHz FSB allows Kentsfield to scale performance linearly. If the 1066 bus is sufficient for Kentsfield, it ought to be enough for Conroe. Also, with Tulsa, Intel showed once again that the FSB is not always the limiting factor.

So why is Intel bothering with CSI? Well, for one, a faster interconnect would allow them to provide the same performance with less cache. Less cache implies a smaller die size, which implies higher margins. Thus, I believe it has everything to do with margins and nothing to do with the performance you and I are going to see. For the end user, what does it matter whether my computer has an FSB, CSI, or HT? What matters in the end is performance. And as far as desktops and 2P servers are concerned, HT can easily be countered with a slightly larger cache. In fact, Tulsa's benchmarks indicate that the FSB may not even be an issue in the 4P segment.

That brings me to my other point. Many AMD fanboys (Sharikou included) keep arguing that when CSI comes out, it will be slower than HT. Well, frankly, that does not mean diddly squat! If Intel can beat the crap out of AMD CPUs using the FSB, it can definitely do so using CSI. They may not be able to reduce the cache size and improve margins by as much as they would like, but they can definitely own the performance crown. Remember that they have the industry-leading process and yields, allowing them to be wasteful with silicon real estate.

CSI/HT does matter in the 8P space (and in the 4P space in some benchmarks). But honestly, now that both Intel and AMD have started packing more and more cores onto their CPUs, are we even going to have 8P systems anymore? With Penryn, Intel will most likely become a 4-core-minimum supplier. Four of those CPUs would give you 16 cores in total. What is the market for systems with more than 8 or 16 cores? I am not an expert here, but I doubt it is anything significant. Essentially, by packing multiple cores onto a single chip, Intel and AMD are making the MP market irrelevant. So AMD today has a performance advantage in a market segment that is quickly becoming irrelevant. I would really like to see how it all pans out.

Anyways, now that I have said everything I wanted to say, I can go to bed....

My brief encounter with Core 2

A friend of mine got hold of a Core 2 Duo E6700 from somewhere and brought it over to my place just to check whether my setup is fully compatible with the chip. I have a P5B with 2 GB of Corsair 800 MHz memory. I am still waiting for my own E6700; meanwhile, I am running a P4 560 at 3.6 GHz with HT. I mainly use the computer for video encoding. Now, let me remind you, when it comes to video encoding, the P4 560 itself is a pretty strong chip. It can transcode DVD-quality MPEG-2 in real time, which is pretty impressive in itself, I think.

When my friend showed up with his E6700, I popped it into my box and proceeded to install the heatsink. My friend advised me not to install the heatsink completely, but to just place it on the CPU. I didn't mind--after all, it was his CPU. We did that and booted the computer. It booted up in less than 30 seconds. Then I ran a short transcoding test--a DVD-quality, 20-minute clip. To my surprise, Core 2 encoded it in less than 10 minutes. That was insane!! Mind you, I had not overclocked the system, and the heatsink was barely sitting on the CPU. And the whole time, the CPU temperature was well below 50 deg C. It was really sad that I had to return that chip.
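Putting rough numbers on it (my P4 560 does DVD-quality transcoding in real time; the E6700 did the 20-minute clip in under 10 minutes--call it 10 to be conservative):

```python
# Rough transcoding speedup, E6700 vs. P4 560, from my own numbers.
clip_minutes = 20
p4_minutes = clip_minutes  # real time: a 20-minute clip takes ~20 minutes
e6700_minutes = 10         # "less than 10 minutes", rounded up

speedup = p4_minutes / e6700_minutes
print(speedup)  # 2.0
```

So at stock clocks, with the heatsink barely touching the chip, that is already at least a 2x speedup over a 3.6 GHz P4.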

I would just love to see what this baby can do when I overclock it to 3.6 GHz. I think it will be simply insane.

I hear the FX-62 can barely manage DVD-quality transcoding in real time. In Dr. Sharikou's terms, that means the E6700 "frags" the FX-62. Too bad--Intel's $550 chip frags AMD's $800 chip. AMD's BK is certain :).

About this blog

I like to read stuff! All sorts of stuff!! Especially technology stuff. The first technology blog (pog??) I came across was Sharikou's blog. What a shame!! That guy is a sad excuse for an intelligent life form. I tried to put some sense into him for a few days, and then realized I was wasting my time (yeah, I am kind of slow at realizing such stuff).

Then I saw Mad-mod-mike! That dude is hilarious. He just loves name-calling. I tried posting stuff over there, but again, I can't say everything I have to say.

I looked at Pointer's blog as well. He understands the stuff and gives a more balanced view, but his English writing skills are terrible. After a while, it gets really annoying when you stumble on every second sentence...

Considering all this, I decided to start my own blog. I don't think anyone will care to read it or post on it. But at least it gives me a way of expressing my opinion.

So why is the blog called "Of Chips and Salsa"?

Well, no particular reason, really. I intend to talk about computer chips on this blog, and if by any chance we get any debate going here, it will be as spicy as salsa (shrugs)!