
Monday, October 10, 2016

Why Java in Big Data? What about Scala?

Why #Java? Why not? What about #Scala? Or #Python? I use all three for various parts of #BigData projects. Use the best tool for the job. A lot of things can be orchestrated and managed without any coding through Apache #NiFi 1.0. Some things like #TensorFlow are best done in Python, while #Spark and #Flink jobs can be written in Scala, Python, or Java. #ApacheBeam is Java only (Spotify added a Scala interface, but it's not official yet). If you are a really strong Java 8 developer and write clean code, you can build #Hadoop #MapReduce, #Kafka, Spark, Flink, and #Apex jobs.

Apache NiFi is written in Java, and so is most of Hadoop, so Java clearly runs at Big Data scale. Spark and others are written mostly in Scala. On the ecosystem side, Scala and Java share a ton of libraries, since they both run on the JVM. Python has its own huge ecosystem, but for many Hadoop tasks the JVM languages have a bit of an advantage. You can run Jython on the JVM, but I really haven't seen it used for Big Data, Spark, or Machine Learning. Is anyone doing this? Please comment here.

Python has TensorFlow and some nice Deep Learning and Machine Learning libraries. More universities are also starting to teach Python instead of Java, and not many universities teach Scala at all.
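To make the Java 8 point concrete, here is a minimal Spark word count sketch in Java 8. This is just an illustration, not code from the linked article: it assumes the Spark 2.x Java API, and the input.txt and counts-output paths plus the local[*] master are placeholders for the sketch.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class JavaWordCount {
    public static void main(String[] args) {
        // Local master is only for trying the sketch; a real job is submitted to a cluster.
        SparkConf conf = new SparkConf().setAppName("JavaWordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // "input.txt" is a placeholder path, not a real dataset.
            JavaRDD<String> lines = sc.textFile("input.txt");

            // Java 8 lambdas keep this close to the equivalent Scala code:
            // split lines into words, pair each word with 1, then sum the counts.
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);

            counts.saveAsTextFile("counts-output");
        }
    }
}

With lambdas and method references, the Java version is nearly as compact as the Scala one, which is part of the argument for sticking with Java if that is where your team is strongest.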

https://dzone.com/articles/why-java-in-big-data
