Intel’s Accelerating Mobile Push -or- Don’t Bet Against Intel

Writing about Intel in the mobile space, Jean-Louis Gassée wrote:

The company’s inability to break into the mobile field — into any field other than PCs and servers — isn’t new, and it has worried Intel for decades.

The explanation for this situation, I think, can be found earlier in the same piece:

A 65% Gross Margin is enviable for any company, and exceptional for a hardware maker

Intel’s failure to gain momentum and share in the mobile space, I would suggest, has more to do with Intel’s concern that they’ll compete with themselves[1] than with the company being outsmarted by rivals. Intel’s interest in low-power, decent-performance, low-priced processors is minimal if such a chip might end up replacing a fat-margin desktop or laptop processor.

Ideally AMD would keep Intel in check, but realistically it doesn’t.

From this philosophy, we’ve seen an endless procession of horribly crippled Atom processors. Generally built on obsolete processes and technologies, they were never a compelling option. But they presented no threat to Intel’s fat-margin product lines.

But that is starting to change: Intel is starting to care. Their neglect of the market didn’t keep it undeveloped; instead, competitors like Apple, Qualcomm and others built fantastic solutions, and echoes began to reverberate that these makers would soon challenge Intel’s crown jewel as well. Arguably none of them match Intel’s fabrication or design skills, but they’ve made compelling solutions for the space.

And Intel started noticing: you can’t make a profit on a sale you don’t make.

Android now fully supports both x86 and x86-64, with significant contributions from Intel both in the OS and managed runtime and in the tooling. Many millions of Intel-based Android tablets have been sold, with countless users unaware that they’re at the cutting edge of what may be a pretty big shift, one that two or three years ago was dismissed as impossible.

In a few years critics might start crying foul about Intel buying their way into the space.

And Intel has indeed been buying its way in, subsidizing the use of Intel processors to the tune of $50+ per unit (though they claim they’ll end this subsidy sometime this year), with makers like Samsung, Acer, Asus, Lenovo, Dell and others selling a number of Intel-equipped Android devices. I took advantage of that this holiday season, grabbing my kids some Asus ME176C tablets (powered by the Atom Z3745), and we also picked up a cheap Acer B1-730HD (Atom Z2560) for a relative as an introduction to e-reading.

Getting 7″ tablets with IPS screens and decent processors for barely $100 is simply remarkable. It astounds me that these sensor-rich, dual-camera, GPS-equipped, connected, long-battery-life devices cost so shockingly little, partly courtesy of Intel buying its way into the space. As a first personal tablet, to be used and abused, these are perfect devices: the 1280×720 resolution isn’t my cup of tea, but for the way the kids use them it’s ideal. The only real weakness of the tablets is the vendor-supplied OS (the crapware is particularly heavy on the Acer, and there are little annoyances like the Google apps not updating until you find each of them in the Play Store and update it manually, after which the Play Store app manages them properly). The kids still have the household devices to use, but their own units they can configure as they wish, with no concern about siblings messing up their Minecraft worlds or filling the storage with pictures of the dog.

Out of curiosity, and to play with the NDK optimization flags, I threw together some native benchmarks using the NDK (an unloved product that Google forever treats as an afterthought, despite it being the single most significant reason Android saw market success) and built them into a fat APK: a single Android APK can contain native libraries for a variety of ABIs, by default ARM, ARM64, x86, x86-64, MIPS and MIPS64, with the appropriate one chosen and used at runtime. I ran these across a variety of dev devices in the house, normalizing to the high-performance Nexus 5 (Snapdragon 800). There are plenty of benchmarks out there, but I was particularly curious about the impact of vectorization, and about native performance both run directly and run through the underappreciated transcoder.
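Normalizing to the Nexus 5 amounts to dividing the Nexus 5’s time on each benchmark by the other device’s time on the same benchmark, so the Nexus 5 scores 1.0 everywhere and higher means faster. The sketch below shows that arithmetic. It assumes wall-clock seconds per benchmark, and the benchmark names and timings in it are hypothetical placeholders, not the measurements behind these charts.

```python
from typing import Dict


def normalize_to_baseline(
    baseline_seconds: Dict[str, float],
    device_seconds: Dict[str, float],
) -> Dict[str, float]:
    """Score each benchmark relative to a baseline device.

    1.0 means "same as the baseline"; 2.0 means the device finished in
    half the baseline's time. Benchmarks missing from either map are skipped.
    """
    scores: Dict[str, float] = {}
    for name, base in baseline_seconds.items():
        if name not in device_seconds:
            continue
        elapsed = device_seconds[name]
        if elapsed <= 0:
            raise ValueError(f"non-positive time for {name!r}")
        # baseline / device keeps the score "higher is faster".
        scores[name] = base / elapsed
    return scores


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    nexus5 = {"float_loop": 2.0, "double_loop": 3.0}
    tablet = {"float_loop": 2.5, "double_loop": 6.0}
    print(normalize_to_baseline(nexus5, tablet))
```

The same ratio, applied to one configuration against another, produces "N times faster" figures like the emulator comparisons later in this post.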

[Chart: single-core native benchmark results per device, normalized to the Nexus 5]

These are entirely synthetic benchmarks, so the usual caveats apply: they do nothing to measure GPU performance, they neither stress nor demonstrate storage performance, and they may deviate substantially from real-world performance in certain scenarios (e.g. memory-intensive, cache-size-dependent or network-limited workloads).

These are single-core benchmarks. The ME176C happens to have four cores, as do the other devices except the Galaxy S3 (two cores) and the lesser Atom in the Acer: the dual-core Z2560, which notably also lacks SSE4.x and is crippled by a single-channel memory interface, giving it significant weaknesses compared to the Z3745. Naively, then, you could quadruple the ME176C’s numbers. In practice you can’t simply extrapolate like that, because SoCs have power and thermal profiles that allow more leeway when fewer cores are in use and start to clamp when the entire chip is drawing power (fun fact: using AVX on Xeons can reduce the available clock speed, undermining some of those gains).

In any case, these were compiled with every hardware optimization possible for each platform: VFPv4, NEON, SSE4.2 or SSE3, or a combination of those and hardware floating point, plus loop unrolling. While the Atom-based chips take a significant performance penalty on double-precision calculations, they’re entirely adequate, especially for a very low-power, low-priced processor that in practice yields 10+ hours of battery life. For the thermal and power profile, Intel is becoming competitive, which many were sure simply couldn’t happen with x86. As an important aside, the Z3745 has an unexploited x86-64 mode, ready to come forth once a corresponding 64-bit OS build is supplied.

And to be clear, the Z3745 is a year-old processor, and here it is being compared against processors that are a year or more old (the Snapdragon 800 is a year and a half old at this point). Of course newer designs, especially Cortex-A57-based chips and Apple’s superb custom cores, push performance higher, but the gains are coming more slowly. Intel is quite literally giving these chips away, and they contain little of Intel’s current technology, but they absolutely hold their own in power/performance, which is ultimately what matters here (if we were talking about pure performance, the question would disappear entirely: Intel reigns supreme among high-performance systems).

Android hosts a number of superb emulators, and they happen to be a pretty big use of the tablets, aside from the usual YouTube play-along videos and music playback. One sign that Intel’s buying its way into Android is paying dividends is that a growing number of these emulators explicitly note that they natively support x86 devices.

And emulators raise a fascinating topic. An emulator for a Game Boy Advance, for instance, emulates the device’s processor, turning the game’s instructions into “native” instructions (or, more likely, series of instructions). On x86 Android tablets, before these emulators added native x86 support, the emulator converted the original device’s instructions into the appropriate ARM instructions, and then an Intel-supplied transcoder worked its magic, converting that ARM code into the appropriate x86 instructions. This sounds like it would be horrible for performance, but most users’ experience is fairly positive: again, most have no idea the process is even happening.
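At its simplest, emulating a processor means running a loop that fetches each guest instruction, decodes it, and carries it out using host operations. The sketch below shows that fetch-decode-execute loop for a made-up guest ISA with three registers. The opcodes (`LOADI`, `ADD`, `DEC`, `JNZ`, `HALT`) are hypothetical and do not correspond to the Game Boy Advance or to any real console. Real emulators decode a real instruction set, and some translate whole blocks of guest code ahead of time instead of interpreting one instruction at a time.

```python
from typing import List, Tuple

# A guest instruction is (opcode, operand_a, operand_b).
# The opcodes are invented for this sketch.
Instr = Tuple[str, int, int]


def run(program: List[Instr], num_regs: int = 3, max_steps: int = 10_000) -> List[int]:
    """Interpret a toy guest program and return the final register values."""
    regs = [0] * num_regs
    pc = 0
    steps = 0
    while pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit exceeded")
        steps += 1
        op, a, b = program[pc]      # fetch and decode
        if op == "LOADI":           # regs[a] = immediate b
            regs[a] = b
        elif op == "ADD":           # regs[a] += regs[b]
            regs[a] += regs[b]
        elif op == "DEC":           # regs[a] -= 1
            regs[a] -= 1
        elif op == "JNZ":           # if regs[a] != 0, jump to instruction b
            if regs[a] != 0:
                pc = b
                continue
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return regs


if __name__ == "__main__":
    # Multiply 7 by 6 through repeated addition: r0 += r1, r2 times.
    program = [
        ("LOADI", 1, 7),
        ("LOADI", 2, 6),
        ("ADD", 0, 1),
        ("DEC", 2, 0),
        ("JNZ", 2, 2),
        ("HALT", 0, 0),
    ]
    print(run(program))
```

Each guest instruction here costs several host operations (a tuple unpack, a chain of comparisons, the actual arithmetic). That overhead is why an extra translation layer underneath, such as the ARM-to-x86 transcoder, sounds as if it should be ruinous.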

ARM, unsurprisingly, begs to differ. As one fun note from their analysis, they state:

What he found was somewhat surprising. Despite the relative ease of porting 32-bit ARMv7 Android apps to native x86 apps using the Android Native Development Kit (NDK), his July 2013 check-up revealed that 42 per cent of those popular apps still required binary translation to run, and that number rose to 44 per cent by January of this year.

I can explain this discrepancy for them: Unity. When you build a Unity application for Android (an increasingly common choice), a significant native library is bundled with the build. When these Unity apps ran on an Intel-equipped Android tablet, the transcoder came into play. With Unity 4.6, however, Unity added x86 native binaries to their library, and the transcoding for an enormous number of applications disappears.
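Whether an app gets transcoded comes down to which `lib/<abi>/` directory of the APK the device loads. The device has a preference-ordered list of ABIs it can run (on Android 5.0+ apps can read it as `Build.SUPPORTED_ABIS`), and on Intel tablets of this era the ARM entries in that list are there only because the transcoder can execute them. The sketch below is a simplified model of that choice, not the package manager’s actual code. The split into `native_abis` and `translated_abis` is an assumption made for illustration.

```python
from typing import Optional, Sequence, Tuple


def pick_abi(
    native_abis: Sequence[str],
    translated_abis: Sequence[str],
    apk_abis: Sequence[str],
) -> Tuple[Optional[str], bool]:
    """Pick which lib/<abi>/ directory of a fat APK a device would load.

    native_abis and translated_abis are in the device's preference order,
    and native ABIs always win. Returns (abi, needs_translation), or
    (None, False) if the APK ships nothing the device can run.
    """
    packaged = set(apk_abis)
    for abi in native_abis:
        if abi in packaged:
            return abi, False
    for abi in translated_abis:
        if abi in packaged:
            return abi, True
    return None, False


if __name__ == "__main__":
    # Hypothetical 32-bit x86 tablet that can also run ARM code through a translator.
    native = ["x86"]
    translated = ["armeabi-v7a", "armeabi"]
    # An app that ships only an ARM library, as Unity builds did before 4.6.
    print(pick_abi(native, translated, ["armeabi-v7a"]))
    # The same app once an x86 library is added alongside the ARM one.
    print(pick_abi(native, translated, ["armeabi-v7a", "x86"]))
```

In this model, adding the x86 library is enough to move the app off the translated path, which matches the effect described above for Unity 4.6.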

In any case, transcoding is pretty fascinating. What sort of performance impact does it impose? To find out, I built the APK with only the ARM native ABIs and compared.

[Chart: benchmark results for the ARM-only APK, including the x86 tablet running it through the transcoder]

Not as bad as I was expecting. My normal ARM build for my gamut of devices included NEON and VFPv4, but those builds crashed on the x86 device; they ran successfully once I dialed the optimizations back to just VFPv3. This remains an issue, with some users experiencing crashes in native binaries built for newer ARM devices.
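One general way to avoid this class of crash is to ship more than one build and choose at runtime based on the CPU features that are actually reported, instead of assuming every ARMv7 target has NEON and VFPv4. On ARM Linux the kernel lists those features on the `Features` line of `/proc/cpuinfo`, and `neon`, `vfpv3` and `vfpv4` are real tokens there. Everything else in the sketch below is hypothetical: the variant names, the sample cpuinfo text, and the selection functions. It also assumes that the feature list an app sees under translation honestly reflects what can be executed, which this test does not establish. A native app would more likely do the check in C, for example with the NDK’s cpufeatures library.

```python
from typing import FrozenSet, List, Optional, Tuple

# Build variants in order of preference: (name, required CPU features).
# The variant names are hypothetical; the feature tokens use the
# /proc/cpuinfo spellings.
VARIANTS: List[Tuple[str, FrozenSet[str]]] = [
    ("neon_vfpv4", frozenset({"neon", "vfpv4"})),
    ("vfpv3", frozenset({"vfpv3"})),
    ("generic", frozenset()),
]


def parse_features(cpuinfo: str) -> FrozenSet[str]:
    """Collect the tokens from the 'Features' line(s) of /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return frozenset(features)


def choose_variant(features: FrozenSet[str]) -> Optional[str]:
    """Return the first variant whose required features are all present."""
    for name, required in VARIANTS:
        if required <= features:
            return name
    return None


if __name__ == "__main__":
    # A made-up cpuinfo excerpt for a CPU without NEON or VFPv4.
    sample = (
        "Processor\t: ARMv7 Processor rev 1 (v7l)\n"
        "Features\t: swp half thumb fastmult vfp edsp vfpv3 tls\n"
    )
    print(choose_variant(parse_features(sample)))
```

Listing the variants from most to least demanding means the first match is the fastest build the reported features allow. The empty `generic` requirement guarantees that some build is always chosen.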

Emboldened, I wondered how the x86-64 Android AVD would perform running on HAXM (Intel’s Hardware Accelerated Execution Manager). Traditionally, dealing with the Android emulator has been an absolutely brutal experience, but with native x86 and x86-64 OS builds and a thin hardware-accelerated virtualization layer, the situation is far more tenable. In this case it ran on a Pentium G3258 (a very low-end desktop processor).

[Chart: benchmark results in the x86-64 Android emulator image under HAXM on a Pentium G3258]

Obviously a processor plugged into the wall, with several fans and a fat heatsink, has far more headroom to perform, but it’s interesting that this image, running in an Android emulator, offers 6x the per-core performance of a very well-respected mobile chipset. Just a humorous aside.

But seriously, the question then is how HAXM compares to the world we used to live in, one where we had to deal with the QEMU-emulated ARM image.

[Chart: HAXM x86 emulator image compared with the QEMU-emulated ARM image]

It really was that bad, and judging by discussions about Android development, many still think it is. Since the differences are hard to make out on the chart: the HAXM x86 image beat the emulated ARM image by a factor of 49x to 221x.

In any case, the tooling is there. The chips are becoming increasingly competitive (as an aside about GPUs: Intel, like Apple, has licensed Imagination Technologies PowerVR parts, as in the Z2560, and GPU performance is largely a function of the power headroom the maker is willing to allow and the price point of the chip). The market is adapting, and x86 is increasingly gaining equal support. Unity 4.6, for instance, effortlessly brought first-class x86 mobile support to an enormous number of applications. The NDK itself supports x86 and x86-64 by default, and for most developers targeting them is literally nothing more than a flag.

It isn’t an ARM world but a wonderfully heterogeneous one, where competition reigns supreme!

[1] The so-called “big three” North American automakers had their lunch eaten because of the same flawed strategy. While makers like Honda and Toyota made excellent low-priced cars, the domestic makers felt they needed to make their lower-priced options particularly terrible, not only in features, functionality and quality but even in branding, essentially setting them up as “the failure vehicle.” The end result was millions of families becoming advocates of those alternative brands.