Understanding Intel Core i7: Part 2
One of the factors that allow the Core 2 Duo to outperform most current AMD processors clock for clock is the fact that it can process 4 instructions per cycle (4-issue), compared to 3 on the AMD side. Naturally, there are many other factors to consider (the efficiency of the branch prediction circuit, the size and speed of the caches and so on), but the 4 instructions per cycle offer a considerable advantage.
Nehalem keeps the same 4 instructions per cycle, but adds a number of architectural refinements that allow the execution units to be fed with a larger volume of data, reducing the time they sit idle waiting for data from the cache or for the result of a branch prediction, for example. This results in a considerable gain in efficiency compared to Penryn.
Besides the changes to the caches and the memory controller, another change is the replacement of the old FSB with an improved bus, called QuickPath Interconnect, or QPI.
The FSB (front-side bus) has been used since the first Intel processors. It is a single shared bus that connects the processor to the chipset, as you can see in this diagram from Intel:
Because it is used not only for communication between the processor cores and memory, but also for communication between the 2 or 4 cores of the processor, it ends up choking memory access and hurting system performance. The problem gets worse when multiple processors are used in SMP, as on server boards or on the Skulltrail platform.
Up to Penryn, Intel dealt with the problem by brute force, simply adding more L2 cache to the processors. With QuickPath, it decided to tackle the root of the problem, replacing the FSB with a modern bus composed of independent links that operate at 4.8 or 6.4 GT/s ("GT/s" indicates the number of transfers per second, as opposed to "GHz", which indicates the clock). Each link carries 16 bits of data in each direction per transfer, resulting in a bandwidth of 9.6 or 12.8 GB/s in each direction (25.6 GB/s in total for the 6.4 GT/s version) per link.
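The bandwidth figures follow from simple arithmetic, which can be sketched in Python. The function and constant names below are made up for this illustration; the only inputs are the transfer rates and the 16 bits per direction quoted in the text.

```python
# Illustrative arithmetic only; the names here are invented for this sketch.
BITS_PER_TRANSFER = 16  # data bits moved per transfer, in each direction


def qpi_bandwidth_gb_s(transfer_rate_gt_s):
    """Return (per-direction, both-directions) bandwidth in GB/s for one link."""
    bytes_per_transfer = BITS_PER_TRANSFER / 8
    per_direction = transfer_rate_gt_s * bytes_per_transfer
    return per_direction, 2 * per_direction


for rate in (4.8, 6.4):
    one_way, total = qpi_bandwidth_gb_s(rate)
    print(f"{rate} GT/s: {one_way:.1f} GB/s per direction, {total:.1f} GB/s total")
```

By the same arithmetic, a 4.8 GT/s link totals 19.2 GB/s; the 25.6 GB/s figure applies to the 6.4 GT/s links.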
As memory is now accessed directly through the integrated memory controller, this link is fully available for I/O traffic. When two processors are used, each one communicates with the chipset through an independent link, and a third link is established to coordinate communication between the two.
When 4 processors are used (something high-performance servers can put to good use), additional links are included, so that each processor has direct access to all the others.
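To get a feel for how the number of links grows, here is a minimal sketch that enumerates the processor-to-processor links in a layout where every processor is wired directly to every other one, as described above. It counts only links between processors (the links to the chipset are left out), and the function name is hypothetical.

```python
from itertools import combinations


def processor_links(n_processors):
    """List every processor-to-processor link in a fully connected layout."""
    return list(combinations(range(n_processors), 2))


for n in (2, 4):
    links = processor_links(n)
    print(f"{n} processors: {len(links)} processor-to-processor link(s)")
```

With 2 processors there is a single link between them; with 4 there are 6, which is why the extra links are only worthwhile on larger servers.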
If you have followed the evolution of AMD processors in recent years, you will notice a great similarity between QuickPath and HyperTransport, used by AMD. Obviously, this is no mere coincidence. Intel studied the strengths of AMD's solution and then came up with a solution adapted to its own architecture. As they say, imitation is the sincerest form of flattery.
Regarding instruction processing, an important novelty is the Loop Stream Detector (LSD), an additional controller that watches the decoded instructions before they reach the execution stage, looking for instructions that are part of a loop.
Instead of reprocessing the instructions of the loop over and over, the processor stores them in a small internal cache and runs them from there. Besides saving time, this slightly reduces power consumption, because it allows the branch prediction circuit, along with the fetch and decode units, to be switched off while the loop is being processed.
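The benefit can be illustrated with a deliberately simplified toy model that only counts how many instructions have to pass through fetch and decode. It is not a description of the real circuit; the function name and the buffer size used in the example are assumptions made for this sketch.

```python
def decode_passes(loop_length, iterations, loop_buffer_size=None):
    """Count instructions that go through fetch/decode in this toy model.

    loop_buffer_size=None models a processor without a loop buffer.
    The buffer size is a free parameter here, not a figure from the text.
    """
    fits = loop_buffer_size is not None and loop_length <= loop_buffer_size
    if fits:
        # The first iteration is decoded normally; later ones replay from the buffer.
        return loop_length
    # Without a buffer (or if the loop is too long), every iteration is decoded again.
    return loop_length * iterations


# A 10-instruction loop executed 1000 times, with a hypothetical 16-entry buffer.
print("without loop buffer:", decode_passes(10, 1000))
print("with loop buffer:   ", decode_passes(10, 1000, loop_buffer_size=16))
```

In this model the fetch and decode stages do useful work only during the first pass of the loop, which is the window in which they could be switched off.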
With Conroe (used in the initial generation of the Core 2 Duo), Intel introduced "macro-ops fusion", which allows some specific pairs of instructions to be merged during decoding and processed as a single instruction, resulting in a small performance gain. In Conroe, macro-ops fusion worked only with 32-bit instructions, but Nehalem gained support for fusing 64-bit instructions, which is good news for anyone who has made or intends to make the migration.
Nehalem also marks the return of Hyper-Threading, now called SMT (Simultaneous Multi-Threading), which makes the processor present itself to the operating system as having 8 cores instead of 4. Of course, SMT does not double the processor's performance. It serves only as an extra feature that lets the processor make better use of its processing resources, handling two threads simultaneously whenever possible.
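The effect is easy to observe from software, since the operating system simply reports more processors. The snippet below is a minimal sketch using Python's standard library; note that `os.cpu_count()` returns logical processors and cannot, on its own, tell physical cores from SMT threads.

```python
import os

# Logical processors as seen by the operating system.
# os.cpu_count() may return None if the count cannot be determined.
logical = os.cpu_count()
print("Logical processors reported by the OS:", logical)
```

On a quad-core processor with SMT enabled, the value reported would be expected to be 8 rather than 4, in line with the description above.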
If you lived through the Pentium 4 era, you may not have fond memories of Hyper-Threading, since it reduced processor performance in some operations and substantially increased power consumption. In the case of Nehalem, however, the feature went through a series of improvements that made it more efficient. In addition to the optimizations, some other important factors are:
a) Nehalem has an integrated memory controller and much larger caches, which ensures a much greater flow of data. This is a prerequisite for good performance with SMT since, when processing two threads simultaneously, each core needs to be fed with data for both.
b) Today, there is a much larger volume of software optimized for processing multiple threads simultaneously, unlike what we had at the time of the Pentium 4.
The gain from SMT on Nehalem is below 10% in most tasks (in some situations there may even be a small loss), but there are some specific cases where it brings significant gains, such as 3DMark, where the gain reaches up to 35%. That is not bad, considering that it comes from taking advantage of processing cycles that would otherwise be wasted.
Besides the performance question, there is also a small gain in terms of energy consumption since, by finishing tasks faster, the processor spends more time in low-power states. It is worth noting that Intel also used SMT in the Atom, for the same reason.
Unlike the crude Kentsfield (used in the first generation of the Core 2 Quad), where all cores always operate at the same frequency and use the same voltage, Nehalem offers a somewhat more elegant management system, in which the cores still operate at the same frequency but can be configured with different voltages, according to the level of use. Idle cores are placed in a low-power state in which they are almost completely switched off, allowing the processor to keep only one core active for lighter tasks and to switch the other cores on and off as needed.
The management is handled by the PCU (Power Control Unit), a dedicated controller that has its own firmware and its own processing circuitry and is devoted solely to monitoring the requirements of the system and the utilization of the cores, deciding which clocks and voltages each one will use.
The PCU takes up a moderately large area of the processor, with no fewer than one million transistors. It is as if Nehalem had an integrated 486 dedicated solely to power management.
Another important change is Turbo Boost, through which the processor can raise its operating frequency when only one or two of the cores are active, in a sort of automatic overclocking.
Traditionally, single-core and dual-core processors operate at slightly higher frequencies than quad-core processors, allowing them to compete with or even beat their successors in applications with low parallelism, as is the case with most games. Without doubt, Intel does not like to see a cheap overclocked Pentium E beat an expensive Core 2 Quad in some tests.
With Turbo Boost, Nehalem can be "converted" into a single-core, dual-core or triple-core processor in situations where a small increase in clock makes up for the deactivation of the additional cores, closing that gap. Another way to look at Turbo Boost is as a "supported" overclocking system, which allows even those who do not believe in overclocking to benefit from part of the hidden potential of the processor.
The frequency increase is controlled by the PCU, which monitors the level of use and decides in which cases it can be applied. The basic rule is that the increase is applied only in situations where it results in tangible performance gains (since it increases power consumption) and only when the processor is operating comfortably below the TDP and the maximum temperature.
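The basic rule can be written down as a small predicate. This is only a sketch of the rule as stated in the text, not the PCU's actual firmware logic; the function name, parameters and the numbers in the example are all hypothetical.

```python
def turbo_allowed(power_w, tdp_w, temp_c, max_temp_c, expected_gain):
    """Toy version of the rule: boost only with headroom and a real benefit."""
    # Headroom means being below both the power limit (TDP) and the temperature limit.
    has_headroom = power_w < tdp_w and temp_c < max_temp_c
    return bool(has_headroom and expected_gain)


# Hypothetical readings: 90 W against a 130 W limit, 60 C against a 100 C limit.
print(turbo_allowed(90, 130, 60, 100, expected_gain=True))   # headroom and benefit
print(turbo_allowed(128, 130, 99, 100, expected_gain=False)) # no tangible benefit
print(turbo_allowed(135, 130, 60, 100, expected_gain=True))  # over the power limit
```

A real controller would also need some margin below the limits to match the "comfortably below" wording; the strict comparisons here are a simplification.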
In the initial versions, Turbo Boost is able to raise the clock by two steps (266 MHz) when only one core is active and by just one step (133 MHz) when two or more are active. The change is made by raising the multiplier, without affecting the frequency of other components.
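Since the increase comes from the multiplier, the resulting clocks are easy to work out. The sketch below assumes a 2.66 GHz model, that is, a multiplier of 20 over a 133 MHz base clock, and assumes the PCU has already decided that the boost can be applied; the function and constant names are illustrative.

```python
BCLK_MHZ = 133.33  # one multiplier step, the "133 MHz" unit mentioned in the text


def turbo_clock_mhz(base_multiplier, active_cores):
    """Clock after Turbo Boost under the rule of the initial versions."""
    # Two extra steps with a single active core, one step otherwise.
    extra_steps = 2 if active_cores == 1 else 1
    return (base_multiplier + extra_steps) * BCLK_MHZ


# Assumed example: 2.66 GHz part, multiplier 20 (20 x 133 MHz).
for cores in (1, 2, 4):
    print(cores, "active core(s):", round(turbo_clock_mhz(20, cores)), "MHz")
```

For this example the clock goes from 2.66 GHz to roughly 2.93 GHz with one active core and to roughly 2.8 GHz otherwise.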
These small increases amount to only a slight overclock, but later versions should bring larger ones. You can also disable Turbo Boost through the setup, which is important when overclocking: with the processor already operating near its limit, any further increase may be enough to destabilize the system.
A gloomy prospect is that more aggressive versions of Turbo Boost could mark the beginning of the end for overclocking. If the processor comes to adjust its operating frequency dynamically between, say, 2.66 and 4.0 GHz, it will not make much sense to put up with the higher power consumption and the need for an oversized cooler just to keep the processor running at 4.0 GHz all the time.









