Sunday, January 29, 2017

Configuring Eclipse for Hadoop MapReduce

SETTING UP THE ENVIRONMENT :

Eclipse is an IDE used mainly for Java programming. MapReduce is a software framework, associated with Java programming, with which we can write applications to batch-process huge amounts of data (terabytes or petabytes) stored in Apache Hadoop. MapReduce is a core component of Hadoop, so we need to install and configure Eclipse for MapReduce. To do that, Hadoop must be installed on our machine and HDFS must be working fine. In this document we will install and configure Eclipse for MapReduce and also run a program ("WordCount.java") on Hadoop through Eclipse.

Required software : Hadoop must be installed and working fine; in this document Hadoop version 2.7.3 is used. Eclipse must be installed.

INSTALLING HADOOP :

We will assume Apache Hadoop is already installed and the Hadoop Distributed File System (HDFS) is working fine. If not, install Hadoop first. Check whether HDFS is working by storing some data in HDFS, retrieving data from HDFS, deleting data, and restarting the machine. Try all kinds of operations on HDFS, and if anything goes wrong, resolve it before continuing.

INSTALLING & CONFIGURING ECLIPSE :

DOWNLOADING ECLIPSE :

We need Eclipse to develop Java programs, so we want the Eclipse package for Java developers. To download it, go to this page: https://eclipse.org/downloads/eclipse-packages/ and download Eclipse for a 64-bit Linux OS, selecting "Eclipse IDE for Java Developers".

EXTRACTING THE TAR FILE :

In a terminal, go to the directory where the downloaded file resides. Suppose that directory is /home/suto/Downloads/downloads; extract the downloaded file (say 'eclipse-java-neon-2-linux-gtk-x86_64.tar.gz') with the following commands:

$ cd /home/suto/Downloads/downloads
$ tar -zxvf eclipse-java-neon-2-linux-gtk-x86_64.tar.gz

After successful extraction we will get a new folder named 'eclipse'.

DOWNLOADING HADOOP ECLIPSE PLUG-IN :

The Hadoop Eclipse plug-in provides tools to ease the experience of Map/Reduce development on Hadoop. The plug-in associates the Hadoop accessories with Eclipse: it supports creating Map, Reduce, and Driver classes, and it also lets Eclipse browse and interact with HDFS, submit jobs, and monitor their execution. To download it, go to this web page: https://github.com/Ravi-Shekhar/hadoop-eclipse-plugin/blob/master/release/hadoop-eclipse-plugin-2.6.0.jar

After downloading the plug-in .jar file, copy it into the 'eclipse/plugins' folder.

CREATE DESKTOP ICON FOR ECLIPSE :

Open a terminal and edit the 'eclipse.desktop' file, which resides in the '/usr/share/applications/' directory, changing the entries under [Desktop Entry] as follows. Assign the path of 'icon.xpm' to 'Icon'; here icon.xpm resides in '/home/suto/Downloads/downloads/eclipse', so we use 'Icon=/home/suto/Downloads/downloads/eclipse/icon.xpm'. Also change the 'Exec' path to 'Exec=/home/suto/Downloads/downloads/eclipse/eclipse':

$ sudo gedit /usr/share/applications/eclipse.desktop

[Desktop Entry]
Type=Application
Name=Eclipse
Comment=Eclipse Integrated Development Environment
Icon=/home/suto/Downloads/downloads/eclipse/icon.xpm
Exec=/home/suto/Downloads/downloads/eclipse/eclipse
Terminal=false
Categories=Development;IDE;Java;
StartupWMClass=Eclipse

Save the file. Then search for Eclipse in the finder; when the menu comes up, just drag it to the launcher and it will start working. Now run it from the launcher.
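Before wiring up Eclipse, a quick round-trip test from the shell is enough to confirm HDFS is healthy. This is a minimal sketch, assuming Hadoop is installed at /usr/local/hadoop and the daemons are already running; the /tmp/hdfstest directory is just an example name:

$ /usr/local/hadoop/bin/hadoop fs -mkdir -p /tmp/hdfstest          # create a scratch directory in HDFS
$ /usr/local/hadoop/bin/hadoop fs -put /etc/hosts /tmp/hdfstest/   # store a local file in HDFS
$ /usr/local/hadoop/bin/hadoop fs -cat /tmp/hdfstest/hosts         # read the file back
$ /usr/local/hadoop/bin/hadoop fs -rm -r /tmp/hdfstest             # delete the scratch directory

If all four commands succeed, HDFS can store, retrieve, and delete data, and we can move on to Eclipse.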
CREATING PROJECT FOR HADOOP :

Choose the Map/Reduce perspective from the top-right corner of the Eclipse IDE. Now go to the top-left corner and open File --> New --> Map/Reduce Project. Give it a name; in this document the project name is 'hadoop'. Select the option 'Specify Hadoop library location' and browse to the folder where Hadoop is installed; in this document we set it to /usr/local/hadoop. Click Next and then Finish, and the 'hadoop' project will be created.

SETTING DFS LOCATIONS :

Select the Map/Reduce Locations tab at the bottom of the screen. Right-click on the blank space and choose 'New Hadoop location'. Give a location name, e.g. "master", and a host name, e.g. "localhost". Give the port numbers: Map/Reduce master = 9001 and DFS master = 9000. Click Finish. This connects the Hadoop server to Eclipse, so we can access, execute, and modify data files through Eclipse. But it will still show an error connecting to HDFS, so close Eclipse for now.

CREATING DIRECTORIES IN HDFS :

First start all the Hadoop daemons, if they are not started yet, with the command start-all.sh, and use jps to check whether all the daemons are running:

$ start-all.sh
$ jps

Open a terminal, go to the directory /usr/local/hadoop/bin, and run this command to make a directory: hadoop fs -mkdir -p /user/suto/input. It is laborious to go to this directory each time to execute Hadoop commands, so it is better to add the directory to the PATH variable; then we can execute Hadoop commands without changing the directory:

$ export PATH=$PATH:/usr/local/hadoop/bin
$ hadoop fs -mkdir -p /user/suto/input

Now you can create more directories, say input1, input2, etc., by the same procedure and check them with the command hadoop fs -ls /user/suto. You can also check from Eclipse: re-open Eclipse and see that the new directories appear under the DFS location "master". Now everything is set, and you can run any program through Eclipse.
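EXAMPLE PROGRAM (WordCount.java) :

As promised above, here is the program we can run through the new project. This is the classic WordCount example from the Apache Hadoop documentation, written against the Hadoop 2.x MapReduce API; it assumes the project's build path already includes the Hadoop libraries configured when the project was created. Create a class named WordCount in the 'hadoop' project and paste the following:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map class: emits (word, 1) for every token in the input line
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce class: sums the counts emitted for each word
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver class: wires the job together; input and output paths come from the arguments
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Put one or more text files into /user/suto/input (e.g. hadoop fs -put somefile.txt /user/suto/input). Then, in Eclipse, choose Run As --> Run Configurations and pass the HDFS input and output paths as program arguments, for example /user/suto/input /user/suto/output. Note that the output directory must not already exist, or Hadoop will refuse to run the job; the word counts appear in the part-r-00000 file under the output directory.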

Source: https://www.ibm.com/developerworks/community/blogs/d9a07ec3-11e2-467d-b758-6861c4cb1d44/entry/Configuring_Eclipse_for_hadoop_MapReduce?lang=en
