At this year’s Worldwide Developers Conference (WWDC; 5-9 June), Apple announced two new frameworks – an augmented reality (AR) developer kit named ARKit, and a machine learning API called Core ML. The frameworks will be among the key features of iOS 11 (the third beta was released earlier this month) – the latest version of the company’s mobile platform. In today’s discussion, we will give you a brief idea about ARKit first, and move on to Core ML next:
“Augmented reality is going to help us mix the digital and physical in new ways.”
– Mark Zuckerberg, Facebook F8 Conference
Over the years, there has hardly been any activity from Apple in the virtual reality (VR) and augmented reality (AR) realms. As major rivals like Amazon (with Alexa), Microsoft (with HoloLens) and Google (with Project Tango) have upped their respective games, all that we got from Apple by way of comparable ‘smart’ tools was Siri and iOS 10’s ‘intelligent’ QuickType. The scene has changed with the arrival of ARKit, which Apple has billed as the ‘largest AR platform in the world’.
1. More power to developers and apps
For third-party mobile app developers working on the iOS platform, ARKit brings unprecedented capabilities for blending AR experiences into their applications. With the help of the framework resources, the motion sensors and, of course, the camera of the iPhone/iPad, devs will be able to make their software interact seamlessly with the actual environment (read: digital tools will enrich the real world). The role of AR in Pokemon Go was only the tip of the iceberg (many users even reported that the gameplay improved when AR was turned off) – and ARKit will help developers go all in on integrating augmented reality in their apps, to make the latter unique, useful and more popular than ever before.
2. The fight with Google and Facebook
Apple is late to the AR game; there are no two ways about that. For ARKit to be able to make a mark, it has to offer something more than the AR-based solutions of Facebook and Google, which are both established players in this domain. Interestingly, Apple’s new framework DOES seem to have a key advantage: it is compatible with all existing iDevices running on the A9 or A10 chip, whereas integrating Project Tango requires Android OEMs to create separate, customized hardware. Also, Facebook’s AR activity is, at least till now, confined to its own Camera app. ARKit, on the other hand, will be available to all iPhone/iPad applications. In terms of existing user base, Apple certainly has a stronghold.
3. How does ARKit work?
The ARKit framework does not form three-dimensional models to deliver high-end AR experiences to app users. Instead, it uses a technique called Visual Inertial Odometry (or, VIO), which combines motion-sensor data from Core Motion with information from the device camera to track the movement of the smartphone/tablet in a room. Put another way, ARKit picks out a set of points in the environment, and these points are tracked as the device is moved. This functionality is expected to help developers create customized virtual experiences over the real environment with their new apps (the superior processing speeds of the A9/A10 chips are also an important factor). ARKit does not need any external calibration either, and should typically generate highly accurate data.
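The idea of combining motion-sensor data with camera tracking can be illustrated with a toy, one-dimensional example. This is not ARKit's actual algorithm (which Apple has not published in detail); it is a minimal sketch that assumes a simple complementary filter, and the names `fuse_position` and `camera_weight` are hypothetical.

```python
def fuse_position(accel_samples, camera_fixes, dt=0.01, camera_weight=0.2):
    """Toy 1-D fusion of inertial dead reckoning with camera position fixes.

    accel_samples: acceleration readings (m/s^2), one per time step.
    camera_fixes:  camera-derived positions (m), or None when no fix exists.
    Returns the fused position estimate after every time step.
    """
    position, velocity = 0.0, 0.0
    track = []
    for accel, fix in zip(accel_samples, camera_fixes):
        # Inertial step: integrate acceleration into velocity, then position.
        velocity += accel * dt
        position += velocity * dt
        # Visual step: pull the (drifting) inertial estimate toward the fix.
        if fix is not None:
            position += camera_weight * (fix - position)
        track.append(position)
    return track


if __name__ == "__main__":
    # Device at rest, camera repeatedly reports a position of 1.0 m.
    print(fuse_position([0.0] * 50, [1.0] * 50)[-1])
```

The inertial step reacts quickly but drifts over time, while the camera fixes are slower but anchor the estimate to the scene; blending the two is the general intuition behind visual-inertial tracking.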
Note: The process in which ARKit integrates virtual elements into the real world with the help of projected geometry is known as ‘world tracking’.
4. The role of dual cameras
Apple’s decision to do away with the headphone jack in the iPhone 7 raised quite a few eyebrows. There has also been considerable curiosity about the dual cameras on the iPhone 7 Plus. The announcement of ARKit goes a long way towards justifying the latter decision. With the help of the dual cameras, correctly gauging the distance between two viewpoints (from the device’s current location) becomes easier, and triangulation of this distance is also possible. The two cameras, working together, offer improved depth sensing and, of course, better zooming features as well. This, in turn, helps the handset create accurate depth maps and differentiate between background objects and foreground objects.
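The triangulation mentioned above can be sketched with the standard stereo relation: depth = focal length × baseline / disparity. The numbers below are made up for illustration and are not the specifications of any iPhone camera; the function names are hypothetical.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Estimate the distance (m) to a point seen by two side-by-side cameras.

    focal_length_px: focal length expressed in pixels.
    baseline_m:      distance between the two camera centres, in metres.
    disparity_px:    horizontal shift of the point between the two images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


def is_foreground(depth_m, threshold_m=1.0):
    """Label a point as foreground if it is closer than the threshold."""
    return depth_m < threshold_m


if __name__ == "__main__":
    # Hypothetical values: 1000 px focal length, 1 cm baseline.
    for disparity in (20.0, 5.0):
        depth = depth_from_disparity(1000.0, 0.01, disparity)
        print(disparity, depth, is_foreground(depth))
```

A larger disparity means the point is closer to the device, which is how a depth map can separate foreground objects from the background.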
5. Finding planes and estimating lights
Floors, tables and other basic horizontal planes can be detected by the ARKit framework. After detection, the device (iPhone/iPad) can be used to put virtual objects on the tracked surface/plane. Plane detection (scene understanding) is done on the device, using the ‘scenes’ captured by the built-in camera. What’s more, the framework can also estimate the amount of light available in different scenes, and ensure that the virtual objects have just the right amount of lighting to appear natural in any particular scene. From tracking the perspective and scale of viewpoints, to shadow correction and performing hit-tests of digitized objects – the ‘world tracking’ functionality of ARKit can do them all.
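To make the idea of a hit-test concrete, here is a minimal geometric sketch: a ray is cast from the camera and intersected with a detected horizontal plane, and the intersection is where a virtual object would be placed. It only illustrates the underlying maths; the function name and the coordinate convention (y pointing up) are assumptions, and this is not the ARKit API.

```python
def hit_test_horizontal_plane(origin, direction, plane_y):
    """Intersect a ray with the horizontal plane y = plane_y.

    origin, direction: (x, y, z) tuples describing the ray.
    Returns the (x, y, z) hit point, or None if the ray misses the plane.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dy) < 1e-9:
        return None  # Ray runs parallel to the plane.
    t = (plane_y - oy) / dy
    if t < 0:
        return None  # Plane lies behind the ray origin.
    return (ox + t * dx, plane_y, oz + t * dz)


if __name__ == "__main__":
    # Camera held 1.5 m above the floor, looking forward and down.
    print(hit_test_horizontal_plane((0.0, 1.5, 0.0), (0.0, -1.0, -1.0), 0.0))
```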
Note: Once the scene understanding and estimation of lighting are done, virtual elements can actually be rendered into the real environment.
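How a light estimate might feed into rendering can be sketched as a simple brightness scale applied to a virtual object's colour. The sketch assumes the estimate arrives as a single ambient intensity value with 1000 treated as neutral lighting; that constant and the function name are assumptions made for illustration.

```python
NEUTRAL_INTENSITY = 1000.0  # Assumption: intensity treated as "neutral" light.


def apply_light_estimate(base_color, ambient_intensity,
                         neutral=NEUTRAL_INTENSITY):
    """Scale an (r, g, b) colour, channels in 0..1, by the ambient light."""
    scale = max(ambient_intensity, 0.0) / neutral
    # Clamp each channel so bright scenes never push a value above 1.0.
    return tuple(min(1.0, channel * scale) for channel in base_color)


if __name__ == "__main__":
    # The same virtual object rendered in a dim room and in a bright one.
    print(apply_light_estimate((0.8, 0.4, 0.2), 500.0))
    print(apply_light_estimate((0.8, 0.4, 0.2), 2000.0))
```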
6. The limitations of ARKit
ARKit is Apple’s first native foray into the world of VR/AR, and the tech giant is clearly planning to take small steps at a time. As things stand at present, the framework lacks many of the convenient features of Google’s Project Tango – from capturing wide-angle scenes with the help of additional cameras, to scanning full rooms and creating 3D room models without external peripherals (which are needed on iOS). The framework is not likely to have built-in web search capabilities (as Facebook’s and Google’s AR solutions have) either. What ARKit is expected to do (and do well) is motivate developers to come up with new app ideas with AR as their USP. It does not place any extra pressure on the device CPU, and also offers high-end object scalability. The Apple App Store has more than 2.2 million apps – and if a significant percentage of them get AR features (e.g., the option of activating an AR mode), that will be instrumental in helping the technology take off in a big way.
In 2013, Apple coughed up around $345 million to acquire PrimeSense, the 3D sensor company that worked on Microsoft’s Kinect sensor. A couple of years later, the Cupertino company swooped in once again, acquiring Linx (a smart camera module manufacturer) for $20 million, and Metaio (an AR startup). ARKit might be the first significant augmented reality tool from Apple, but the company has been clearing the way for it for a long time. The arrival of this framework is big news, and it can revolutionize the interactions of iDevice owners with their mobile apps.
The global artificial intelligence (AI) market has been estimated to touch $48 billion by the end of this decade, growing at a CAGR (2015-2020) of more than 52%. Once again, Apple has been relatively quiet on the AI and machine learning (ML) front (apart from the regular improvements in Siri). Rivals like IBM, Google, Facebook and Amazon are already firmly entrenched in this sector, and it will be interesting to see whether Core ML on iOS 11 can put Apple in a position of strength here.
1. What exactly is Core ML?
Core ML has been created as a foundational framework for building optimized machine learning services across Apple products. The implications of its arrival are immense for app-makers, who can now blend superior AI and ML modules into their software. The manual coding required for using Core ML is minimal, and the framework supports deep learning with more than 30 different layer types. With the help of Core ML, devs will be able to add custom machine learning capabilities to their upcoming apps for the iOS, tvOS, watchOS and macOS platforms.
2. From NLP to machine learning
Way back in 2011, natural language processing (NLP) debuted on iOS 5 through NSLinguisticTagger. iOS 8 brought in Metal, a framework that gives apps low-level access to the graphics processing unit (GPU) to deliver enhanced, immersive gaming experiences. In 2016, the Accelerate framework (for processing signals and images) received something new – the Basic Neural Network Subroutines (or, BNNS). Since the Core ML framework is built on top of both Accelerate and Metal, it can run models on the device’s own hardware, which removes the need to transfer data to a centralized server. The framework can function entirely within a device, boosting the security of user data.
Note: The iPhone 8 might well have a new AI chip. If that happens, it would be perfectly in line with Apple’s attempts to create a space for itself in the machine learning market.
3. How does Core ML work?
The operations of the Core ML framework can broadly be divided into two stages. In the first stage, machine learning algorithms are applied to available sets of training data (for better results, the training dataset has to be large) to create a ‘trained model’. The next stage involves converting this ‘trained model’ to a file in the .mlmodel format (i.e., a Core ML Model). High-level AI and ML features can be integrated in iOS applications with the help of this Core ML Model file. The function flow of the new machine learning API can be summarized as: creating ‘trained models’ → transforming them into Core ML models → using them to make ‘intelligent’ predictions.
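The train → convert → predict flow can be mimicked end to end in a few lines of plain Python. This is only an analogy: the ‘trained model’ is a one-variable linear regression, and a JSON file stands in for the .mlmodel file. Real Core ML models are produced with Apple’s conversion tools, which are not shown here, and all function names below are hypothetical.

```python
import json
import os
import tempfile


def train_linear_model(xs, ys):
    """Stage 1: fit y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def export_model(model, path):
    """Stage 2: write the trained parameters to a file (a stand-in
    for converting the trained model to the .mlmodel format)."""
    with open(path, "w") as handle:
        json.dump(model, handle)


def load_model(path):
    """Read the exported parameters back, as an app would at launch."""
    with open(path) as handle:
        return json.load(handle)


def predict(model, x):
    """Use the loaded model to make a prediction, entirely locally."""
    return model["slope"] * x + model["intercept"]


if __name__ == "__main__":
    trained = train_linear_model([0, 1, 2, 3], [1, 3, 5, 7])
    model_path = os.path.join(tempfile.mkdtemp(), "model.json")
    export_model(trained, model_path)
    print(predict(load_model(model_path), 10))
```

The point of the split is that training (which needs lots of data) happens ahead of time, while the app only ships the exported file and runs predictions against it.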
The Core ML Model contains class labels and all inputs/outputs, and describes the layers used in the model. The Xcode IDE can generate Objective-C or Swift wrapper classes (as the case might be) as soon as the model is included in an app project.
4. Understanding Vision
While ARKit and Core ML were the frameworks that grabbed most of the headlines at WWDC 2017, the arrival of a new computer vision and image analysis framework – appropriately named Vision – has been equally important. Vision works alongside Core ML, and offers a wide range of detection, classification and identification features – right from ML-backed picture analysis and face detection, to text and horizon detection, image alignment, object tracking and barcode detection. Vision also provides its own wrapper for Core ML models, so that they can be used in image analysis requests. Developers have to, however, keep in mind that Vision will be useful only for models that are image-based.
Note: Just like the other two frameworks, Vision also works with the SDKs of iOS 11, tvOS 11 and macOS 10.13 beta.
The Core ML Model, as should be evident from our discussion so far, is THE key element of the Core ML framework. Apple offers as many as 5 different, ready-made Core ML models for third-party developers to use in their apps. These models are Places205-GoogLeNet, Inception V3, ResNet50, SqueezeNet and VGG16. Since Core ML works within the devices (and not on cloud servers), the overall memory footprints of these models are fairly low. Apart from the default models, the new API supports models from quite a few other ML tools (libSVM, XGBoost, Caffe and Keras).
Note: Whether a model is to be run on the GPU or the CPU of the device is decided by the Core ML framework itself. Also, since everything is on-device, the performance of machine learning-based apps is not affected by poor or unavailable network connectivity.
The limitations of Core ML
There is no doubt about the potential of Core ML as an immensely powerful tool in the hands of developers, for seamlessly adding efficient machine intelligence to apps that would be usable on all Apple hardware devices. However, much like ARKit, this framework too seems slightly undercooked on a couple of points. For starters (and this is a big point), Core ML is not open-source – and hence, app-makers have no option to tweak the API for their precise development requirements (most other ML toolkits are open-source). Also, in the absence of ‘federated learning’ and ‘model retraining’ in Core ML, the training data has to be provided manually. The final release of iOS 11 is still some way away, and it remains to be seen whether Apple adds any other capabilities to the framework.
Tree ensembles, neural networks, SVM (support vector machines) and regression (linear/logistic) are some of the models that are supported by Core ML. It is a framework that will make it possible for iOS developers to consider making apps with machine learning as one of their most important features. Core ML has been hailed by Apple as ‘machine learning for everyone’ – and it certainly can bring in machine learning (ML) and deep learning (DL) as an integral part of iOS app development in future.