Step-by-Step Guide to Creating a Multi-Node Hadoop Cluster in Google Cloud, and Reading and Writing CSV Files from a Jupyter Notebook Hosted on Google Cloud

Prabhat Goel
3 min read · Apr 10, 2022

Please find below step-by-step instructions to set up a multi-node Hadoop cluster on Google Cloud and use the PySpark framework in a Jupyter notebook hosted in (and backed by) a Google Cloud storage bucket.

  1. Create your account on Google Cloud by providing your credit card details; Google gives you credit worth $300.
  2. After setting up your first project in Google Cloud, go to Dataproc -> Clusters.

3. Once you click on Clusters, you will see a screen like the one below.

4. Click on Create Cluster (or "Try this API") and configure the master and worker nodes.

5. The image below shows how to configure the master node.

(5.1) Give a name to the "Master Node".

(5.2) Select the cluster type. For our assignment we chose the "Standard" cluster type, i.e. one master and N worker nodes.

(5.3) Now configure the master and worker nodes.

Here we gave 32 GB of disk space to both the master and worker nodes. For the master node we chose 2 CPUs with 7.5 GB of memory, and for each worker node we chose 1 CPU with 3.75 GB of memory. You can choose sizes based on your own requirements.
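The sizing above can also be captured programmatically. Below is a minimal sketch, assuming an illustrative cluster name and the `n1-standard-2` / `n1-standard-1` machine types as approximations of 2 CPU / 7.5 GB and 1 CPU / 3.75 GB (these names are my assumptions, not from the original setup):

```python
# Sketch: the same master/worker sizing as the UI steps above,
# expressed as a Dataproc-style cluster config dict. The cluster
# name and machine types are illustrative assumptions.
cluster_config = {
    "cluster_name": "my-hadoop-cluster",          # hypothetical name
    "config": {
        "master_config": {
            "num_instances": 1,
            "machine_type_uri": "n1-standard-2",  # ~2 vCPU, 7.5 GB RAM
            "disk_config": {"boot_disk_size_gb": 32},
        },
        "worker_config": {
            "num_instances": 2,
            "machine_type_uri": "n1-standard-1",  # ~1 vCPU, 3.75 GB RAM
            "disk_config": {"boot_disk_size_gb": 32},
        },
    },
}

# With the google-cloud-dataproc client library and project credentials,
# a dict like this could be passed to ClusterControllerClient.create_cluster();
# that call is omitted here because it needs a real GCP project.
print(cluster_config["config"]["worker_config"]["num_instances"])
```

Creating the cluster through the console, as in the steps above, produces the same result; the dict is just a way to see all the sizing choices in one place.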

6. Select the Jupyter Notebook option while configuring the cluster. This option will create a web interface for Jupyter Notebook on the cluster, as shown in the image below.

7. After configuring the nodes, click on Create Cluster. The master and worker nodes will be created.

8. After creating the cluster, we will see VM instances like below:

We have successfully created a Hadoop cluster in Google Cloud. A storage bucket is also created for the cluster; on this storage we will keep the data given in the assignment.
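As a sketch of getting a CSV file into that bucket: the snippet below writes a tiny CSV locally with Python's `csv` module, and the comments show one way to upload it. The bucket name is a hypothetical placeholder, and the upload call needs the `google-cloud-storage` library plus credentials, so it is shown only in comments:

```python
import csv
import os
import tempfile

# Write a tiny example CSV locally.
rows = [["id", "name"], ["1", "alice"], ["2", "bob"]]
path = os.path.join(tempfile.gettempdir(), "sample.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# To copy it into the cluster's bucket (hypothetical bucket name),
# one option is the google-cloud-storage client library:
#   from google.cloud import storage
#   storage.Client().bucket("my-dataproc-bucket") \
#          .blob("data/sample.csv").upload_from_filename(path)
print(path)
```

Dragging and dropping the file in the Cloud Storage console works just as well for a one-off upload.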

9. Click on Web Interfaces -> Jupyter Notebook.

Thanks for reading. If you have any doubts or questions, feel free to reach out to me on my LinkedIn profile.


Prabhat Goel

Convert ideas into reality. I always have time to discuss ideas; wake me up at midnight to discuss ideas or to implement new things.