Hardware
The devices I am most interested in are the new NVIDIA Jetson Nano (128 CUDA cores) and the Google Coral Edge TPU (USB accelerator). I will also test an i7-7700K + GTX 1080 (2560 CUDA cores), a Raspberry Pi 3B+, and my old friend, a 2014 MacBook Pro with an i7-4870HQ (no CUDA-capable GPU).
Software
I will use MobileNetV2 as the classifier, pre-trained on the ImageNet dataset. I take the model straight from Keras with the TensorFlow backend: the floating-point weights for the GPU runs, and the 8-bit quantized tflite version for the CPU and the Coral Edge TPU.
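To make that setup concrete, here is a minimal sketch of how the two model variants could be prepared. It uses current tf.keras and TFLite-converter APIs; the original benchmark ran on a TF 1.13-era stack and used Google's pre-compiled Edge TPU model, so the conversion flags below are my own assumption, not the author's exact pipeline.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2

# Float model with ImageNet weights (used for the GPU/CPU Keras runs).
model = MobileNetV2(weights="imagenet")

# 8-bit post-training quantization to a .tflite file (the flavour of model the
# CPU and the Edge TPU consume). The representative dataset here is a random
# stand-in; a real conversion would feed actual preprocessed images.
def representative_data_gen():
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Full integer quantization, as required by the Edge TPU compiler.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_quant_model = converter.convert()

with open("mobilenet_v2_quant.tflite", "wb") as f:  # file name assumed
    f.write(tflite_quant_model)
```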
First, the model and the magpie image are loaded. Then I perform one prediction as a warm-up, because I noticed that the first prediction is always much slower than the ones that follow. I let the script sleep for one second so that all threads are sure to have finished. Then it classifies the same image 250 times. By using the same image for every classification we make sure the data stays close to the bus throughout the test; after all, we are interested in inference speed, not in how fast random data can be loaded. The quantized tflite model on the CPU returns a different score, but it always seems to give the same prediction as the others, so I think this is a quirk of the model, and I am fairly sure it does not affect performance.
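For reference, the timing loop looks roughly like the following, continuing from the sketch above. The file name and helper details are my own placeholders, not the author's exact script.

```python
import time
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

# Load and preprocess the test image (a magpie) once, so the loop measures
# inference only, not image loading.
img = image.load_img("magpie.jpg", target_size=(224, 224))  # file name assumed
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# One warm-up prediction: the first call is always much slower than the rest.
model.predict(x)

# Give any background threads a moment to finish before timing starts.
time.sleep(1)

# Classify the same image 250 times and report the average frames per second.
n_runs = 250
start = time.time()
for _ in range(n_runs):
    preds = model.predict(x)
elapsed = time.time() - start

print(decode_predictions(preds, top=1))
print(f"{n_runs / elapsed:.1f} FPS")
```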
Now, because the results across the different platforms differ so wildly, they are hard to picture from raw numbers alone, so here are some charts; pick your favourite...
Analysis
Three bars jump out of the first chart. (Yes, the first one, the linear-scale FPS chart, is my favourite, because it shows the differences between the high-performance results best.) Two of those three bars belong to the Google Coral Edge TPU USB accelerator; the third is the full NVIDIA GTX 1080 assisted by the Intel i7-7700K.
Look closely and you will see that the GTX 1080 is actually beaten by the Coral. Let that sink in for a few seconds, then prepare to be blown away: the GTX 1080 has a maximum power rating of 180 W, which is absolutely huge next to the Coral's 2.5 W. The next thing we see is that NVIDIA's Jetson Nano does not score very high. Despite having a CUDA-capable GPU, it is not much faster than my old i7-4870HQ. But that is exactly the point: "not much faster" still means faster than a 50 W, quad-core, hyper-threaded CPU, while in my tests the Jetson Nano never drew more than about 12.5 W as a short-term average. And that is exactly what motivates me: roughly 75% less power consumption for about 10% more performance.
Obviously the Raspberry Pi on its own is not all that impressive: it cannot really handle the floating-point model, and even the quantized model is barely usable. But hey, I had the files ready anyway, so it might as well run the test; the more data points the better, right? It is still interesting, because it shows the difference between the ARM Cortex-A53 in the Pi and the Cortex-A57 in the Jetson Nano.
NVIDIA Jetson Nano
The Jetson Nano does not deliver impressive FPS rates with the MobileNetV2 classifier. But as I already said, that does not make it any less useful a board. It is cheap, it does not need much energy to run, and perhaps its most important attribute is that it runs TensorFlow-gpu (or any other ML platform) just like any machine you have been using before. As long as your script does not dig deep into the CPU architecture, you can run the exact same script as on an i7 + CUDA GPU, and you can even train on it! I still think NVIDIA should pre-load L4T with TensorFlow, but I will try not to be grumpy about it anymore. After all, they do have a good guide on how to install it (don't be fooled: TensorFlow 1.12 is not supported, only 1.13.1).
Google Coral Edge TPU
The Edge TPU is what is called an ASIC (application-specific integrated circuit), meaning it is a combination of small electronic components such as FETs and capacitors etched directly into the silicon, laid out to do exactly one thing: speed up inference.
Inference, yes: the Edge TPU cannot perform backpropagation.
The logic behind this sounds more complicated than it actually is. (Actually creating the hardware and getting it to work is a whole different story, and very, very complicated; but the logic function itself is much simpler.) If you are really interested in how this works, look up "digital circuits" and "FPGA" and you will find enough material to keep you busy for the next few months. It can be daunting at first, but it is really fun! And this is exactly why the Coral stands out so much when comparing performance per watt: it is a bunch of electronics designed to perform exactly the bitwise operations required, with practically no overhead.
Why doesn't the GPU have an 8-bit model?
A GPU is essentially designed as a fine-grained parallel floating-point calculator; using floats is therefore exactly what it was built for and what it is good at. The Edge TPU, on the other hand, is designed to perform 8-bit operations, and CPUs have clever ways of being faster with 8-bit integers than with full-width floating-point numbers, because they have to deal with that kind of data in many situations.
Why choose MobileNetV2?
I could give you many reasons why MobileNetV2 is a good model, but the main one is that it is one of the pre-compiled models Google provides for the Edge TPU.
What other models can the Edge TPU run?
It used to be limited to a few versions of MobileNet and Inception, but as of last weekend Google rolled out an update that lets us compile custom TensorFlow Lite models. The restriction remains, and probably always will, that it has to be a TensorFlow Lite model. That is different from the Jetson Nano, which can run pretty much anything you can imagine.
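As an illustration of that workflow, here is a minimal sketch of compiling a quantized model for the Edge TPU and running it through the TensorFlow Lite runtime with the Edge TPU delegate. The file names are placeholders, and this reflects the current tflite_runtime API rather than the exact tooling available when the article was written.

```python
# The quantized .tflite model is first compiled for the Edge TPU on the host:
#   edgetpu_compiler mobilenet_v2_quant.tflite
# which produces mobilenet_v2_quant_edgetpu.tflite (file names are placeholders).

import numpy as np
import tflite_runtime.interpreter as tflite

# Load the compiled model and hand the Edge TPU delegate to the interpreter.
interpreter = tflite.Interpreter(
    model_path="mobilenet_v2_quant_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Quantized MobileNetV2 expects a uint8 input tensor of shape (1, 224, 224, 3).
dummy_input = np.zeros(input_details[0]["shape"], dtype=np.uint8)
interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()
scores = interpreter.get_tensor(output_details[0]["index"])
print(scores.shape)
```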
Raspberry Pi + Coral compared to the others
Why does the Coral look so much slower when it is connected to a Raspberry Pi? The answer is simple and straightforward: the Raspberry Pi only has USB 2.0 ports, while the other hosts have USB 3.0. And since the i7-7700K is faster with the Coral than the Jetson Nano is, yet still does not reach the score NVIDIA quotes for the Coral Dev Board in its own tests, we can conclude that the bottleneck is the data rate, not the Edge TPU itself.
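To get a feel for why the USB link matters, here is a rough back-of-envelope sketch of the time spent just moving one input frame to the accelerator. The effective throughput figures are assumptions on my part (typical rough values), not numbers from the original benchmark.

```python
# Back-of-envelope: time to move one quantized 224x224x3 input frame over the
# USB link, before the Edge TPU even starts computing.
# Effective throughput values below are assumptions, not measurements.
image_bytes = 224 * 224 * 3                      # ~147 KiB per frame

links = {
    "USB 2.0 (~35 MB/s effective)": 35e6,        # assumed effective rate
    "USB 3.0 (~350 MB/s effective)": 350e6,      # assumed effective rate
}

for name, bytes_per_s in links.items():
    ms_per_frame = image_bytes / bytes_per_s * 1e3
    print(f"{name}: ~{ms_per_frame:.2f} ms transfer per frame")
```

When the Edge TPU itself needs only a few milliseconds per inference on a fast USB 3.0 host, adding several milliseconds of transfer time per frame (plus USB 2.0's higher per-transfer latency) is enough to pull the frame rate down noticeably, which fits the pattern in the charts.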
I think this is long enough for me, and probably for you as well. I am genuinely stunned by how capable the Google Coral Edge TPU is, but to me the most interesting setup is the combination of an NVIDIA Jetson Nano and a Coral USB accelerator. I will definitely be using that setup, and it feels like a dream.
Speaking of the Google Coral Dev Board and the Edge TPU, it is worth mentioning Model Play, which is built around the Coral Dev Board. Developed by a team in China, it is an AI model sharing marketplace for AI developers worldwide. Model Play not only gives developers a platform to showcase and exchange AI models, it can also be used together with the Coral Dev Board and its Edge TPU to accelerate ML inference, preview how a model runs in real time from a mobile phone, and help take AI from prototype to product. Developers can publish their trained AI models, and also subscribe to and download models they are interested in, retrain them to extend their own AI ideas, and work through the idea-prototype-product cycle. Model Play also comes preloaded with a variety of commonly used AI models, such as MobileNetV1 and InceptionV2, and supports the submission and release of retrainable models so that users can optimize and fine-tune them on their own business data.
Just as Google called on developers at this year's I/O conference to contribute to the developer community, the Model Play team is issuing a call for AI models to developers around the world, soliciting TensorFlow-based deep learning models that can run on the Google Coral Dev Board, encouraging more developers to take part and share ideas with tens of thousands of fellow AI developers.