#bigdata #hadoop #apachespark #pyspark #machinelearning #windows
WhatsApp - +91 81065 31928
Email: thedatashastra@gmail.com
PySpark is the Python API for Apache Spark, an open-source distributed computing framework and set of libraries for real-time, large-scale data processing. If you're already familiar with Python and libraries such as Pandas, PySpark is a good next tool to learn for building more scalable analyses and pipelines.
In this free PySpark tutorial series, we will learn PySpark step by step, using examples.
In this first episode, we will get an introduction to Spark and learn how to install it on a Windows 10 machine.
Download links:
1. Winutils.exe: https://github.com/steveloughran/winutils/blob/master/hadoop-3.0.0/bin/winutils.exe
2. Python: https://www.python.org/downloads/
3. JDK: https://www.oracle.com/in/java/technologies/javase/jdk11-archive-downloads.html
4. Apache Spark: https://spark.apache.org/downloads.html
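
Once the four downloads above are installed, a quick check of the environment can save some debugging time. The snippet below is only a minimal sketch, not the exact procedure from the video. It assumes the commonly used environment variable names JAVA_HOME, SPARK_HOME and HADOOP_HOME, and assumes winutils.exe was placed in a bin folder under HADOOP_HOME. The function name check_spark_env is made up for illustration.

```python
import os
from pathlib import Path

# Assumed variable names (a common convention for Spark on Windows);
# the description above does not list them explicitly.
REQUIRED_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def check_spark_env(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []

    # Each variable must be set and must point to an existing folder.
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing folder: {value}")

    # Assumption: winutils.exe lives in <HADOOP_HOME>/bin.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and Path(hadoop_home).is_dir():
        winutils = Path(hadoop_home) / "bin" / "winutils.exe"
        if not winutils.is_file():
            problems.append(f"winutils.exe not found at {winutils}")

    return problems


if __name__ == "__main__":
    issues = check_spark_env(os.environ)
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("Environment looks ready for Spark.")
```

Run it with the same Python you installed from the link above. An empty problem list only means the folders and winutils.exe are where this sketch expects them; it does not prove that Spark itself starts correctly.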