I have recently been exploring the world of big data and started to use Spark, a platform for cluster computing. In other words, it lets you spread data and computations over a cluster of multiple nodes (think of each node as a separate computer).
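To make the idea of spreading data and computations over nodes a bit more concrete, here is a minimal sketch in plain Python. It is not Spark and does not use Spark's API: worker threads on one machine stand in for the nodes, and the function names (split_into_partitions, sum_of_squares, distributed_sum_of_squares) are made up purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into roughly equal chunks, one per simulated node."""
    if not data:
        return []
    chunk_size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def sum_of_squares(partition):
    """The work each simulated node performs on its own chunk of data."""
    return sum(x * x for x in partition)


def distributed_sum_of_squares(data, num_nodes=4):
    """Spread the data over simulated nodes, then combine their results."""
    partitions = split_into_partitions(data, num_nodes)
    # Each worker thread stands in for one node of the cluster.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_results = list(pool.map(sum_of_squares, partitions))
    # Combine the per-node partial results into the final answer.
    return sum(partial_results)


numbers = list(range(1, 101))
total = distributed_sum_of_squares(numbers)
print(total)  # 338350
```

The pattern is the part worth noticing: split the data, let each node work on its own piece, then combine the partial results. A real cluster framework also has to deal with things this sketch ignores, such as moving data between machines and recovering from failures.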
Spark can be used from three main languages: Scala, Python and Java. If you are curious as to which language to use, check out this great article by DataCamp: https://www.datacamp.com/community/tutorials/apache-spark-python
We will be downloading PySpark, the Python API for Spark.