Based on the same GK104 found on the GTX680, the GTX770 contains 1536 CUDA cores running at a slightly higher clockspeed than the GTX680. The GTX770 comes in two variants, a 2GB and a 4GB version. For most, the 2GB card will provide the best balance of power vs price; however, should you have a larger display, the 4GB version may be worth opting for. Both versions feature a world's first 7Gbps (GDDR5 effective) memory subsystem, which is a significant boost over the GTX680. Paired with the same 256-bit memory controller, this gives the GTX770 224.3GB/sec of memory bandwidth at its disposal.
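The bandwidth figure follows directly from the per-pin data rate and the bus width. The short Python sketch below shows the arithmetic; the function names are illustrative only, and the "implied" data rate is simply back-calculated from the quoted 224.3GB/sec rather than taken from any specification.

```python
def memory_bandwidth_gb_per_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth: per-pin data rate times bus width, converted from bits to bytes."""
    return data_rate_gbps * bus_width_bits / 8


def implied_data_rate_gbps(bandwidth_gb_per_s: float, bus_width_bits: int) -> float:
    """Invert the formula above to recover the per-pin data rate."""
    return bandwidth_gb_per_s * 8 / bus_width_bits


if __name__ == "__main__":
    # Nominal 7Gbps memory on a 256-bit bus.
    print(memory_bandwidth_gb_per_s(7.0, 256))
    # Data rate implied by the quoted 224.3GB/sec figure.
    print(implied_data_rate_gbps(224.3, 256))
```

A nominal 7Gbps on a 256-bit bus works out to 224GB/sec, so the quoted 224.3GB/sec suggests the effective data rate sits fractionally above 7Gbps.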
The GK104 core in the GTX770 is fabricated on the 28nm process, and every component was designed with power efficiency in mind to ensure the GPU gives the best performance-per-watt possible.
While the 28nm manufacturing process accounts for the bulk of the power savings, the way the new architecture works is also key to furthering Kepler's power reduction. SMX, an evolution of the SM from Fermi, is NVIDIA's new streaming multiprocessor.
Each SMX runs at the graphics clock speed rather than 2x that speed as previously, but because the GK104 has 1536 CUDA cores (8xSMX) against the 512 CUDA cores (16xSM) of Fermi's GF110, Kepler allows the GTX770 to deliver twice the performance per watt of the GTX570.
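To put those core counts in context, the rough Python sketch below compares the two chips using only the figures quoted above. The `Gpu` class and its field names are made up for illustration, and the comparison ignores actual clock frequencies, so it is a back-of-envelope view rather than a performance model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    multiprocessors: int          # SMX (Kepler) or SM (Fermi) count
    cuda_cores: int
    shader_clock_multiplier: int  # core clock relative to the graphics clock

    @property
    def cores_per_multiprocessor(self) -> int:
        return self.cuda_cores // self.multiprocessors

    @property
    def core_cycles_per_graphics_clock(self) -> int:
        # Cores running at 2x the graphics clock each do two cycles of work per graphics clock.
        return self.cuda_cores * self.shader_clock_multiplier


gk104 = Gpu("GK104", multiprocessors=8, cuda_cores=1536, shader_clock_multiplier=1)
gf110 = Gpu("GF110", multiprocessors=16, cuda_cores=512, shader_clock_multiplier=2)

if __name__ == "__main__":
    for gpu in (gk104, gf110):
        print(gpu.name, gpu.cores_per_multiprocessor, gpu.core_cycles_per_graphics_clock)
    print(gk104.core_cycles_per_graphics_clock / gf110.core_cycles_per_graphics_clock)
```

Even with the doubled shader clock removed, the wider SMX design gives GK104 1.5 times as many core cycles per graphics clock as GF110, before any difference in the graphics clock itself is taken into account.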
The per-clock throughput of FMA2, SFU and texture operations has been significantly increased, and while some operations still retain the same speed as the GTX580, the GTX770's much higher core clockspeed ensures a substantial increase for all GPU operations.
To feed the new SMX, each unit has four warp schedulers, each capable of firing off two instructions per warp, every clock. The scheduling logic has been redesigned to be far less complex and much more streamlined: the compiler determines which instructions will be issued and provides this information directly to the hardware, rather than the hardware going around the houses with multi-port decode, a register scoreboard and a dependency check before anything gets issued. This method is both quicker and, more importantly, more power efficient.
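As a quick illustration of what those scheduler figures add up to, the Python sketch below works out the peak issue rate. It assumes every scheduler finds two independent instructions to dispatch on every clock, which real workloads will not always manage; the constant and function names are purely illustrative.

```python
WARP_SCHEDULERS_PER_SMX = 4      # from the description above
INSTRUCTIONS_PER_SCHEDULER = 2   # dual issue per warp, per clock
SMX_COUNT = 8                    # GK104 as fitted to the GTX770


def peak_issue_per_smx() -> int:
    """Upper bound on instructions dispatched by one SMX in one clock."""
    return WARP_SCHEDULERS_PER_SMX * INSTRUCTIONS_PER_SCHEDULER


def peak_issue_per_gpu() -> int:
    """Upper bound across all SMX units in one clock."""
    return peak_issue_per_smx() * SMX_COUNT


if __name__ == "__main__":
    print(peak_issue_per_smx(), peak_issue_per_gpu())
```

That is a ceiling of 8 instructions per clock for each SMX and 64 across the whole GPU.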
Perhaps the biggest change from Fermi to Kepler was the omission of the old Shader Clock. The shader clock was getting tired: it dates back to the old Tesla architecture, where it was implemented as an area optimisation, since running the execution units at a higher clock rate means fewer copies of them need be made for a given throughput. Kepler takes the opposite approach, using more execution units at the lower graphics clock, which costs die area but saves power.
Another improvement over Fermi with the introduction of SMX is the redesigned Polymorph engine (2.0). The Polymorph engine takes care of the tessellation workload of the GTX770, and is designed to ensure that the tessellation we see on screen has as little impact as possible on rendering performance. What you may find bizarre, though, is that NVIDIA have cut the Polymorph engine count in half, to 8 from Fermi's 16, yet claim it is almost twice as quick. This is thanks to the improved core and memory clockspeed.
With a higher core clockspeed on the older (but still very fast) GK104 core and, of course, that blisteringly fast memory, the GTX770 looks set to replace not only the GTX670 but also its bigger, older brother, last year's flagship graphics card, the GTX680.