Intel may have launched Cascade Lake relatively recently, but another refresh of its 14nm server platform is already on the horizon. Intel lifted the veil on Cooper Lake today, giving new details on how the processor fits into its product line alongside the 10nm Ice Lake server chips slated for deployment in 2020.
Cooper Lake's features include support for Google's bfloat16 format. It will also support up to 56 CPU cores in a socketed form factor, unlike Cascade Lake-AP, which scales up to 56 cores but only in a soldered BGA configuration. The new socket is reportedly known as LGA4189. There are reports that these chips could offer up to 16 memory channels (since Cascade Lake-AP and Cooper Lake use multiple dies in the same package, Intel could run up to 16 memory channels per socket in a dual-die version).
Bfloat16 support is a major addition to Intel's artificial intelligence efforts. While a 16-bit half-precision floating point format is already defined in the IEEE 754 standard, bfloat16 changes the balance between the bits used for significant digits and those used for the exponent. IEEE 754 half-precision is designed to prioritize precision, with only five bits of exponent. The new format allows a much larger range of values, but with less precision. This is particularly useful for AI and deep learning calculations, and it's a major step in Intel's effort to improve the AI and deep learning performance of its CPUs. Intel has published a white paper on bfloat16 if you are looking for more information on the topic. Google says that using bfloat16 instead of conventional half-precision floating point can deliver significant performance benefits. The company writes: "Some operations are memory-bandwidth-bound, which means the memory bandwidth determines the time spent in such operations. Storing the inputs and outputs of memory-bandwidth-bound operations in the bfloat16 format reduces the amount of data that must be transferred, improving the speed of the operations."
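Because bfloat16 keeps float32's eight exponent bits and simply drops the low 16 bits of the mantissa, the conversion can be sketched in a few lines of Python. This is a minimal illustration using plain truncation; actual hardware typically uses round-to-nearest-even rather than truncation:

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping only its top 16 bits."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16

def bfloat16_bits_to_float32(bits16: int) -> float:
    """Expand bfloat16 bits back to a float32 by zero-padding the mantissa."""
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]

pi = 3.14159265
approx = bfloat16_bits_to_float32(float32_to_bfloat16_bits(pi))
# The round trip keeps the full float32 exponent range, but only about
# three decimal digits of precision (8 mantissa bits vs. 23 in float32).
print(pi, approx)
```

The round-tripped value (3.140625) shows the trade-off: the dynamic range of float32 is preserved, which is why gradients in deep learning rarely overflow in bfloat16, but fine-grained precision is sacrificed.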
The other notable Cooper Lake detail is that the CPU will share a socket with the upcoming Ice Lake servers in 2020. One theoretically important distinction between the two families is that the 10nm Ice Lake servers will not support bfloat16, while the 14nm Cooper Lake servers will. This could be the result of increased differentiation between Intel's product lines, although it's also possible that it reflects the difficulties of 10nm development.
The introduction of 56 cores in a socketed part suggests Intel expects Cooper Lake to reach more customers than Cascade Lake / Cascade Lake-AP targeted. It also raises questions about what kind of Ice Lake servers Intel will bring to market, and whether we'll see 56-core versions of those chips as well. To date, all of Intel's 10nm Ice Lake messaging has focused on servers or mobile devices. This may echo Intel's Broadwell strategy, where desktop versions of the processor were scarce and mobile and server parts dominated the family – but Intel later said that skipping a desktop Broadwell launch was a mistake. Whether this means Intel still plans to launch a desktop Ice Lake part, or whether the company has decided to skip the desktop market again this time, isn't yet clear.
Cooper Lake's focus on AI processing means it isn't necessarily intended to go head-to-head with AMD's upcoming 7nm Epyc. AMD hasn't talked much about AI or machine learning on its CPUs and, although its 7nm chips add support for 256-bit AVX2 operations, the company's CPU division has not yet hinted that the AI market is a particular target. AMD's efforts in this area remain GPU-based and, although its CPUs will certainly run AI code, the company doesn't appear to be pursuing the market at the same level Intel is. Between adding new AI support to existing Xeons, its Movidius and Nervana products, projects like Loihi, and plans to address the data center market with Xe, Intel is trying to build a broad portfolio to protect its high-performance computing and high-end server business, and to challenge Nvidia's current dominance of the space.
AMD has kept the details of its next Epyc family remarkably close to its chest. A recent leak (since removed) from the publicly accessible OpenBenchmarking database shows fierce competition between AMD's upcoming 7nm Epyc CPUs and Intel's equivalent Xeon products. Intel CEO Bob Swan has said that AMD will offer increased competition in the back half of 2019, especially in data centers, so these numbers aren't automatically surprising – unless, of course, you remember when AMD's share of the server market was essentially zero.
According to the text of the now-removed leak (captured by THG before it was taken down), the AMD Epyc 7742 is a 64-core / 128-thread processor with 256MB of L3 cache, a 225W TDP, and base / boost clocks of 2.25GHz / 3.4GHz, respectively. The already-launched Epyc 7601 is a 180W, 32C/64T part with 64MB of L3 and nearly identical 2.2GHz / 3.4GHz clocks. The Xeon Platinum 8280 is 28C/56T at 2.7GHz / 4GHz and 205W, while the Xeon Gold 6138 (included for reference) is 20C/40T at 2GHz / 3.7GHz and 125W.
If these rumors are correct, AMD has managed to double its core count and very slightly increase clocks in a TDP envelope just 1.25x larger. I'm not sure what the "RDY1001C" at the bottom of the results refers to, although that configuration is the fastest on the list. Googling the term turns up nothing.
There are more tests at THG than we've reproduced here; check their article for the full results. And, as always, treat leaked results with great caution. Even if they're accurate, they may reflect engineering samples that aren't representative of final performance.
SVT is a video encoder highly optimized for Intel CPUs, but the optimizations for Intel chips clearly work well on AMD processors too, and we see that here. Neither encoder appears to scale particularly well as cores are added, so we won't try to make much of the dual-socket figures. A single 7742 is significantly faster than the Xeon Platinum 8280, and the 7742 is more than twice as fast as the 7601.
In HEVC, the performance picture changes. Here, Intel and AMD are roughly at parity overall, but the 7742 represents a huge improvement over the Epyc 7601.
POV-Ray 3.7 scales with increased thread counts, but the gain from a second processor is much smaller than the 7742's gain over the 7601. AMD averages only 24 percent more performance from 64 additional cores, compared with 42 percent scaling for the Xeon Platinum 8280. That difference in scaling means a pair of Xeon 8280s is roughly equivalent to a pair of Epyc 7742s, even though a single Epyc 7742 is significantly faster than a single Xeon Platinum 8280.
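To see how weaker dual-socket scaling can erase a single-socket lead, here's a small illustrative calculation. The absolute scores are hypothetical; only the 24 percent and 42 percent scaling figures come from the leaked results:

```python
# Hypothetical single-socket scores (arbitrary units) chosen so that the
# Epyc part leads by ~15% in 1S form; only the scaling factors are sourced.
epyc_1s, xeon_1s = 115.0, 100.0
epyc_2s = epyc_1s * 1.24  # AMD gains ~24% from the second socket
xeon_2s = xeon_1s * 1.42  # Intel gains ~42% from the second socket
print(epyc_2s, xeon_2s)   # the dual-socket gap nearly closes
```

With these assumed numbers the 2S results land at roughly 142.6 vs. 142.0, which is the pattern the POV-Ray figures describe: a clear 1S lead shrinking to near parity at 2S.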
Blender, and rendering more generally, are tests where AMD's CPUs typically excel. AMD resolutely wins this test, though it's interesting that we also see signs of significantly better scaling on Intel's CPUs. That may simply reflect the fact that the Intel parts have far fewer cores – the Xeon Platinum 8280 is only a 28-core chip being measured against a 64-core chip. Still, this is a pretty significant win for AMD. Of course, there's also the question of pricing and positioning – Intel has generally priced its Xeons well above AMD's Epyc CPUs, and we tend to weight price comparisons heavily relative to other factors.
However, readers should be aware that there are scaling issues for the AMD CPUs due to their sheer core counts – 128C/256T in a 2S configuration – while the Xeon Platinum parts top out at 56 cores in a 2S configuration. Applications may simply not scale to those kinds of thread counts.
If these numbers are accurate, they suggest that AMD's 7nm Epyc will mount a major challenge to Intel in more markets – which is exactly what we expected based on previous third-generation Ryzen results and AMD's own claims about Epyc 2. Factor in Bob Swan's acknowledgment of increased competition in the market, and we foresee a scenario in which Intel reduces its Xeon prices, either by cutting them directly or via the launch of Cooper Lake (currently slated for the first half of 2020). Intel's CPU prices have always been much higher than AMD's, but it's hard to know exactly by how much, because the company's list prices (the best indicator available) don't reflect what volume customers actually pay.
If AMD's Rome is as good as it looks, we should see increased OEM adoption of the part relative to first-generation Epyc, as well as some kind of response from Intel. It can take several product generations for server customers to switch to a new vendor, but results like these give them a reason to consider it.
Larrabee is finally dead. The project, which began life as a major GPU initiative within Intel, has been canceled. Intel quietly announced that the Xeon Phi 7295, 7285, and 7235 will reach end-of-life on July 31, 2020, with no further orders for KNM (Knights Mill) accepted after August 9, 2019. May 6, 2019 marked the start of the product discontinuance program for the KNM chips.
Since Intel has offered two similarly named products, let's briefly disambiguate. Knights Hill, which was killed several years ago, was originally scheduled for 10nm and a 2016 launch. Intel canceled Knights Hill in November 2017 (its delay pushed back the introduction of the Aurora supercomputer). Knights Mill, on the other hand, is a 14nm Knights Landing (KNL) derivative designed for AVX-512 workloads, deep learning, and AI acceleration. Unlike earlier Larrabee designs, Knights Mill was not a PCIe add-in card but a socketed chip for LGA3647.
While Larrabee and Knights Corner were derived from the classic Pentium processor, Knights Mill offered between 64 and 72 CPU cores based on the Silvermont x86 Atom architecture, supporting more threads and features such as AVX-512. Larrabee's architect, Tom Forsyth, wrote a blog post about his work on the project several years ago that should be read by anyone interested in the chip. Although Larrabee is generally considered a failure, he points out that the chip design effort contributed substantially to Intel's long-term product lines, noting:
KNF/KNC didn't have SSE because building and validating those units would have been too much work (remember we started with the original Pentium core). KNL adds the legacy SSE instructions, and it really is true that it's a proper x86 core. In fact, it's so x86 that x86 grew to include it – the new AVX512 instruction set is the Larrabee instruction set with a few encoding changes and a few tweaks.
Of course, Intel isn't withdrawing from the HPC space – it's just attacking it differently. Where Xeon Phi tried to stretch x86 into a many-core accelerator, the company now appears to be counting on Xe, its upcoming graphics architecture, to tackle the AI/DL market. Xe will take on that role, but not before 2020. It isn't clear whether Intel will reuse any of the Xeon Phi branding; presumably, the company will build a new brand around Xe, given its focus on that new effort. The decision to kill the architecture is presumably related directly to the problems surrounding 10nm, which makes sense. Had Intel shipped Knights Hill in 2016 as originally planned, it might already be shipping a follow-up part today and would have had new products to field against Nvidia. Instead, the company was stuck with 14nm hardware and a limited roadmap against its competitors.
Intel hasn't really started talking about Xe as an AI and deep learning solution yet, but we know these markets are key targets for the GPU – it's no coincidence that Intel is moving into this space now. The rise of AI and deep learning means that having a GPU is no longer optional if you want to play in the data center – and that market is far too important for Intel not to compete in. Xe, not Xeon Phi, represents the future of Intel's compute architecture for AI and DL – with plenty of Xeon support, of course, where appropriate. We'll have to wait a year or more to find out whether Xe delivers the goods.
Geoffrey Hinton, Yann LeCun, and Yoshua Bengio – sometimes called the "godfathers of artificial intelligence" – have won the 2018 Turing Award for their work on neural networks. The three AI pioneers' work essentially laid the foundation for modern AI technologies.
In the 1980s and early 1990s, artificial intelligence enjoyed renewed popularity within the scientific community. By the mid-1990s, however, scientists had failed to make significant progress in AI, making it harder to secure funding or publish research. Hinton, LeCun, and Bengio pressed on regardless and continued their work.
In 2004, in an effort to revive the field, Hinton launched a new research program with "less than $400,000 in funding from the Canadian Institute for Advanced Research." The program focused on "neural computation and adaptive perception." Bengio and LeCun joined Hinton in the program.
Autonomous cars, voice assistants and facial recognition technology are just some of the advances made possible by the work of Hinton, LeCun and Bengio.
The award, named after British mathematician Alan Turing, carries a $1 million prize, which the trio will split. Tim Berners-Lee, known for inventing the World Wide Web, is a previous Turing Award recipient.
Hinton is currently one of Google's leading AI researchers. LeCun is now at Facebook as the company's chief AI scientist. Bengio has remained in academia, but has worked with companies such as AT&T, Microsoft, and IBM.
Neural networks have been a hot topic for the last few years, but evaluating the most efficient way to build one for a given data set remains a daunting task. Designing systems that can use algorithms to construct networks optimally is still an emerging field – but MIT researchers have developed an algorithm that can speed up the process by as much as 200 times.
The neural architecture search (NAS) algorithm they developed "can directly learn specialized convolutional neural networks (CNNs) for target hardware platforms – when run on a large image data set – in only 200 GPU-hours," MIT reports. That's a significant improvement over the 48,000 GPU-hours Google needed to develop a state-of-the-art NAS algorithm for image classification. The scientists' goal is to democratize AI by making it possible to design CNNs without the need for enormous GPU arrays for the initial work. If searching for cutting-edge designs requires 48,000 GPU-hours, precious few people, even at large institutions, will ever get the chance to try.
The algorithms produced by the new NAS were on average 1.8x faster than comparable CNNs tested on a mobile device, with similar accuracy. The new algorithm leaned on techniques such as path-level binarization, which stores only a single path at a time to cut memory requirements by an order of magnitude. MIT doesn't link to the specific research reports, but based on some Googling, the referenced papers appear to be here and here – two different papers from an overlapping group of researchers. The teams focused on pruning the potential paths a CNN could use, evaluating each in turn. Lower-probability paths are pruned successively, leaving the final best-case path.
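The pruning idea above can be sketched in a few lines. This is a toy illustration of successive path pruning, not the researchers' actual code; the layer names, candidate operations, and scores are all hypothetical stand-ins for learned architecture weights:

```python
def prune_search_space(candidates, score_fn, keep=1):
    """Iteratively drop the worst-scoring candidate op from each layer
    until only `keep` path(s) per layer remain."""
    paths = {layer: list(ops) for layer, ops in candidates.items()}
    for ops in paths.values():
        while len(ops) > keep:
            ops.remove(min(ops, key=score_fn))  # prune the weakest path
    return {layer: ops[0] for layer, ops in paths.items()}

# Hypothetical per-op scores standing in for learned path probabilities.
scores = {"conv3x3": 0.9, "conv5x5": 0.7, "conv7x7": 0.8, "skip": 0.2}
space = {"layer1": ["conv3x3", "skip"], "layer2": ["conv5x5", "conv7x7"]}

best = prune_search_space(space, scores.get)
print(best)  # {'layer1': 'conv3x3', 'layer2': 'conv7x7'}
```

The real systems evaluate candidate paths with learned probabilities updated during training, and path-level binarization means only the currently sampled path is held in memory, but the prune-the-weakest loop is the core of the idea.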
The new model incorporated other improvements as well. Candidate architectures were checked against hardware platforms for latency during evaluation. In some cases, the model predicted superior performance for platforms that had been considered inefficient. For example, 7×7 filters for image classification aren't typically used because they're computationally expensive – but the research team found that they actually worked well on GPUs.
"This goes against earlier human thinking," Han Cai, one of the scientists, told MIT News. "The larger the search space, the more unknown things you can find. You don't know if something will be better than past human experience. Just let the AI figure it out."
These efforts to improve AI performance and capability are still at a stage where large gains are possible. But as we've discussed recently, over time the field will run into the same limits that constrain the silicon it runs on. Accelerators and AI processors offer tremendous near-term performance benefits, but they aren't a fundamental replacement for the scaling historically provided by Moore's Law.