May 7 “Let Loose” Event - new iPads

Cmaier · 2024-05-12T15:42:14-0700

theorist9 said:
Thanks for letting me know that--I didn't think it was appropriate to compare OpenCL to Metal scores, so it's nice to know they are comparable, and you can thus choose whatever is most appropriate for the platform.

But even using your numbers, the RTX 4060 Ti is still showing a higher Score/Bandwidth ceiling than the M3 Max, suggesting the M3 has enough bandwidth to score even higher, and that it is thus not bandwidth-limited. At the same time, the ratios are within 10%, which is about the typical variance of these tests. Hence the three possible conclusions are:

(a) The results are consistent with the M3 not being bandwidth limited.
(b) The results are too close to tell us anything.
(c) Because we don't know the relative "bandwidth efficiency" (bandwidth needed per computation rate) of the M3 and RTX 4060 Ti, this comparison doesn't tell us anything.

View attachment 29386

Is the bandwidth number for the RTX 4060 bandwidth dedicated to the GPU? If so, then maybe the M3 actually has a lot less bandwidth available to its graphics circuits, given that it has to share bandwidth with the CPU?

dada_dave · 2024-05-12T16:13:09-0700

Cmaier said:
Is the bandwidth number for the RTX 4060 bandwidth dedicated to the GPU? If so, then maybe the M3 actually has a lot less bandwidth available to its graphics circuits, given that it has to share bandwidth with the CPU?

I believe that’s been taken into account now: he lowered the M3 Max bandwidth to 350 GB/s, which according to @leman is what the GPU can actually draw. I’m not sure, though.

theorist9 said:
Thanks for letting me know that--I didn't think it was appropriate to compare OpenCL to Metal scores, so it's nice to know they are comparable, and you can thus choose whatever is most appropriate for the platform.

In some ways that is indeed better, but I still feel that comparing across APIs and across graphics architectures is fraught, because so many variables change at once. In the end it’s possibly good enough for this purpose, as I’ll explain below.

theorist9 said:
But even using your numbers, the RTX 4060 Ti is still showing a higher Score/Bandwidth ceiling than the M3 Max, suggesting the M3 has enough bandwidth to score even higher, and that it is thus not bandwidth-limited. At the same time, the ratios are within 10%, which is about the typical variance of these tests. Hence the three possible conclusions are:

(a) The results are consistent with the M3 not being bandwidth limited.
(b) The results are too close to tell us anything.
(c) Because we don't know the relative "bandwidth efficiency" (bandwidth needed per computation rate) of the M3 and RTX 4060 Ti, this comparison doesn't tell us anything.

View attachment 29386

I think @leman’s point is that the 4060 Ti has, in theory, nearly double the TFLOPS of the M3 Max (22 vs 13, I think), so if it weren’t bandwidth-limited it should be scoring far higher than the M3 Max. With such a wide gulf in compute between them, and OpenCL being good enough on Nvidia cards, it should be pulling far ahead. So in the end you need to normalize by raw compute power as well, or compare the score normalized by bandwidth with the score normalized by compute.
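To make that double normalization concrete, here is a rough Python sketch. The 350 GB/s and the 22 vs 13 TFLOPS figures are the ones quoted in this thread (the TFLOPS ones from memory); the 288 GB/s for the 4060 Ti is my assumption from the card's published spec, and the scores are placeholders, not real GB6 results. The `Gpu` class and `relative` helper are just names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    score: float          # benchmark score (placeholder below, NOT measured)
    bandwidth_gbs: float  # memory bandwidth the GPU can draw, in GB/s
    tflops: float         # theoretical FP32 throughput, in TFLOPS

    def per_bandwidth(self) -> float:
        # Score per GB/s: higher means more score from each unit of bandwidth.
        return self.score / self.bandwidth_gbs

    def per_tflop(self) -> float:
        # Score per TFLOP: higher means more of the raw compute is being used.
        return self.score / self.tflops


def relative(a: Gpu, b: Gpu) -> dict:
    # Ratios of a to b under both normalizations.
    return {
        "score_per_bandwidth": a.per_bandwidth() / b.per_bandwidth(),
        "score_per_tflop": a.per_tflop() / b.per_tflop(),
    }


# Placeholder scores: substitute the actual benchmark numbers before
# drawing any conclusion from the output.
m3_max = Gpu("M3 Max", score=1.0, bandwidth_gbs=350.0, tflops=13.0)
rtx_4060_ti = Gpu("RTX 4060 Ti", score=1.0, bandwidth_gbs=288.0, tflops=22.0)

for key, ratio in relative(rtx_4060_ti, m3_max).items():
    print(f"{key}: 4060 Ti / M3 Max = {ratio:.2f}")
```

If the 4060 Ti comes out roughly level on score per bandwidth but well behind on score per TFLOP, that would fit the reading that it cannot feed its compute units. It would not prove it, since API and architecture differences are still mixed in.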

One thing that’s in the back of my mind, though (I’m just waking up in a haze right now, and @leman always remembers this better than I do anyway): is Nvidia one of those that relies on ILP within a thread to achieve its full compute potential? In other words, could this also be a limitation of that design, at least with respect to the GB6 tests? Though I’ll admit that, a priori, I would’ve thought compute-oriented tests would be the ideal case for such a design.
