Apple has created a new file format for machine learning models. These files make prediction easy regardless of how the model was created, which is why “Apple Introduces Core ML” draws an analogy between these files and PDFs. Predictions can be generated with this file alone, without any of the libraries used to create the model.

Generating predictions is a pain point for data scientists today and often involves working with the underlying math directly. At best, it involves training the model in Python and then calling the underlying C library from the production app.

This file format will only become widely used if easy conversion from popular machine learning libraries is possible and predictions are simple to generate. Apple made both claims during their WWDC 2017 keynote, and I want to investigate them.

Specifically, Apple claimed easy integration between their .mlmodel file format and various Python libraries, and that it’s easy to integrate these files into an app (literally via drag-and-drop) or another Python program.

File creation

Apple’s coremltools Python package makes generating this .mlmodel file straightforward:

  1. Train a model via scikit-learn, Keras, Caffe or XGBoost (see docs for conversion support for different library versions)
  2. Generate a coreml_model with converters.[library].convert(model)
  3. (optional) Add metadata (e.g., feature names, author, short description)
  4. Save the model with coreml_model.save

In my (brief) experience, coremltools prints helpful error messages. When I used converters.sklearn.convert, the error message indicated that class labels should be of type int or str (not float, which is what I was using).
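One way to avoid the float-label error is to cast the labels before training. This is a minimal sketch in plain Python; the helper name `to_int_labels` is made up for illustration and is not part of coremltools or scikit-learn.

```python
def to_int_labels(labels):
    """Cast integer-valued float labels (e.g. 0.0, 1.0) to int.

    Non-float labels (int, str) pass through unchanged. A float such as
    0.5 raises ValueError because it has no exact integer equivalent.
    """
    converted = []
    for label in labels:
        if isinstance(label, float):
            if not label.is_integer():
                raise ValueError("label %r is not integer-valued" % label)
            label = int(label)
        converted.append(label)
    return converted


y = [0.0, 1.0, 1.0, 0.0]   # float class labels, the case that triggered the error
y_int = to_int_labels(y)   # int labels, to be used when fitting the model
```

Casting before training (rather than after conversion) keeps the labels stored in the .mlmodel file consistent with what the model was fit on.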

Here’s the complete script for the .mlmodel file generation:

import coremltools
from sklearn.svm import LinearSVC

def train_model():
    model = LinearSVC()
    # ...
    return model

model = train_model()

coreml_model = coremltools.converters.sklearn.convert(model)
coreml_model.author = 'Scott Sievert'  # other attributes can be added
coreml_model.save('sklearn.mlmodel')

Yup, creation of these .mlmodel files is as easy as Apple claims. Even better, it appears this file format has integration with named features and Pandas.
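As a rough sketch of the optional metadata step: the helper below sets `author`, `short_description` and per-feature entries in `input_description`, which are the attribute names I understand coremltools' MLModel to expose. The helper `describe_model`, the feature names and the descriptions are hypothetical, and the conversion call is left as a comment because it assumes a model trained on two named columns.

```python
def describe_model(coreml_model, author, summary, input_docs):
    """Attach metadata to a converted model and return the same object.

    `coreml_model` is assumed to expose `author`, `short_description` and a
    dict-like `input_description`; `input_docs` maps feature name to text.
    """
    coreml_model.author = author
    coreml_model.short_description = summary
    for name, doc in input_docs.items():
        coreml_model.input_description[name] = doc
    return coreml_model


# Hypothetical usage (feature names are invented for illustration):
# coreml_model = coremltools.converters.sklearn.convert(
#     model, input_features=['height', 'weight'], output_feature_names='label')
# describe_model(coreml_model, 'Scott Sievert', 'Linear SVM demo',
#                {'height': 'Height in cm', 'weight': 'Weight in kg'})
```

Keeping the metadata in one function makes it easy to apply the same descriptions every time the model is retrained and re-exported.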

The generation of this file is easy. Now, where can these files be used?

These .mlmodel files can be included on any device that supports CoreML. The format will not be tied to iOS/macOS apps, though it will certainly be used there; it also allows general and easy use from Python for both saving and prediction. Given Apple’s expansion of Swift to other operating systems, I don’t believe it will be tied to a particular operating system.

Prediction

Prediction is as easy as saving:

coremlmodel = coremltools.models.MLModel('sklearn.mlmodel')
coremlmodel.predict(example)  # `example` format should mirror training examples

However, I can’t test it as macOS 10.13 (currently in beta) is needed.
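For a sense of what `example` might look like: as far as I can tell, `predict` takes a dictionary keyed by input feature name. The sketch below assumes the model was converted with a list of named input features; the helper `row_to_example` and the feature names are invented for illustration, and the `predict` call stays commented out since it could not be run here.

```python
def row_to_example(feature_names, row):
    """Pair each input feature name with the matching value from one row."""
    if len(feature_names) != len(row):
        raise ValueError("expected %d values, got %d"
                         % (len(feature_names), len(row)))
    return dict(zip(feature_names, row))


# Hypothetical feature names and values, mirroring one training example.
example = row_to_example(['height', 'weight'], [172.0, 68.5])
# coremlmodel.predict(example)  # untested: requires macOS 10.13
```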

Difficulties

These difficulties were resolved quickly. Here’s what I ran into while generating this post:

  • coremltools depends on Python 2.7
  • Limited version support when converting (e.g., Keras 2 is not supported but 1.2 is).
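Both issues can be caught before conversion with a small environment check. This is only a sketch: `SUPPORTED_KERAS` encodes nothing more than what this post reports (1.2 works, 2 does not), and the helper names are hypothetical. The docs remain the authority on supported versions.

```python
import sys

# Assumption: only the Keras series this post reports as working.
SUPPORTED_KERAS = {(1, 2)}


def major_minor(version):
    """Return (major, minor) from a plain numeric version string like '1.2.2'."""
    major, minor = version.split('.')[:2]
    return int(major), int(minor)


def environment_problems(python_version, keras_version):
    """List reasons the given setup may not work for conversion."""
    problems = []
    if tuple(python_version[:2]) != (2, 7):
        problems.append("Python 2.7 is required")
    if major_minor(keras_version) not in SUPPORTED_KERAS:
        problems.append("Keras %s is not in the supported list" % keras_version)
    return problems


# Check the running interpreter against a hypothetical installed Keras version.
print(environment_problems(sys.version_info, '2.0.4'))
```

An empty list means neither check failed; it does not guarantee that a particular model will convert.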

The largest potential difficulty I see is the limited scope of coremltools. There could be issues with the versions of different libraries, and not all classifiers in sklearn are supported (see the list of supported sklearn models).