Monday, December 2, 2024
No menu items!
HomeCloud ComputingHow to run inference workloads from a Dataflow Java pipeline

How to run inference workloads from a Dataflow Java pipeline

Many data engineers desire the ability to more easily run their python-based machine learning workloads from Java or Go pipelines. There are a number of reasons why users want to do this: they might already have a pipeline written in a language that they want to enrich with ML, they might have access to special sources or transforms written in that language, or they might just prefer working in Java or Go. Whatever the reason, with Dataflow‘s multi-language capabilities, running Python ML workloads is now easy.

There are 2 main ways of running multi-language inference in Dataflow:

1) Run using the Java or Go RunInference transforms. If you have a Python model developed with frameworks like TensorFlow, PyTorch, or Sklearn and you want to run inference with that model while keeping most of your pre/post-processing in Java or Go, you can use the Java or Go RunInference transforms. Under the covers, Dataflow will automatically spin up a Python worker to run your inference.

code_block[StructValue([(u’code’, u’private String getModelLoaderScript() {rn String s = “from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy\n”;rn s = s + “from apache_beam.ml.inference.base import KeyedModelHandler\n”;rn s = s + “def get_model_handler(model_uri):\n”;rn s = s + ” return KeyedModelHandler(SklearnModelHandlerNumpy(model_uri))\n”;rnrn return s;rn }rnrnpCollection.apply(rn RunInference.ofKVs(getModelLoaderScript(), schema, VarLongCoder.of())rn .withKwarg(“model_uri”, options.getModelPath()))’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e96f1115d10>)])]

The RunInference transform will automatically handle most of your boilerplate code like batching your inputs, efficiently sharing the model, and loading your model from a remote filesystem or model hub. As of Beam 2.47, RunInference has full support for TensorFlow, PyTorch, Sklearn, XGBoost, ONNX, and TensorRT models. You can also build your own custom model handler to support other frameworks. For a full example pipeline using the Java RunInference transform, see https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/multilanguage/SklearnMnistClassification.java. 

2) Create a custom Python transform to do all of your pre/postprocessing and inference (this can still leverage the RunInference transform). This can be invoked using an External Transform and allows you to use your favorite Python libraries for pre/post-processing.

code_block[StructValue([(u’code’, u”# Python transformrnclass InferenceTransform(ptransform.PTransform):rn def expand(self, pcoll):rn return (rn pcollrn | ‘Preprocess’ >> beam.ParDo(<myPreprocessingFunction>)rn | ‘Inference’ >> RunInference(PytorchModelHandlerKeyedTensor(<config>))rn | ‘Postprocess’ >> beam.ParDo(<myPostProcessingFunction>)rn )”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e96f1115c50>)])]
code_block[StructValue([(u’code’, u’// Applied on Java pipelinerninput.apply(“Predict”, PythonExternalTransform.<PCollection<String>, PCollection<String>>from(rn “multi_language_custom_transform.composite_transform.InferenceTransform”)rn .withKwarg(<args to pass transform>)rn .withExtraPackages(<dependencies to pass transform>)rn );’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e96f11157d0>)])]

For a full example walkthrough using an External Transform, see https://beam.apache.org/documentation/ml/multi-language-inference/.

Running inference from Java or Go at scale is easy with Dataflow. Pipelines that make use of this can unlock the full potential of python machine learning frameworks without overhauling existing logic or losing access to Java APIs.

Learn More

If you’re interested in learning more, you can try walking through an end to end example of performing inference from a Java pipeline in the apache/beam repository.

Cloud BlogRead More

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments