Saturday, November 26, 2011

Hadoop Lookouts

Some important checks for Hadoop.

Logs are written to the log directory ($HADOOP_HOME/logs by default).
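For example, assuming a default install where HADOOP_LOG_DIR has not been overridden, the daemon logs can be inspected like this (the exact file names include your user name and host name):

$ ls $HADOOP_HOME/logs
$ tail -f $HADOOP_HOME/logs/hadoop-*-namenode-*.log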

It's a good idea to start and stop all the nodes once. While stopping, look for the following output, which means all the Hadoop components were running and have now been stopped:

stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
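These lines come from stop-all.sh. A full restart, assuming you run the scripts from the Hadoop bin directory, looks like:

$ ./stop-all.sh
$ ./start-all.sh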


If you see the following message instead, the namenode was never started, and all jobs will fail:

stopping jobtracker
localhost: stopping tasktracker
no namenode to stop
localhost: stopping datanode
localhost: stopping secondarynamenode
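A quick way to confirm which daemons are actually running is the JDK's jps tool (assuming the JDK bin directory is on your PATH). On a healthy single-node setup it should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker:

$ jps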

For a minimal configuration, refer to http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ or http://hadoop.apache.org/common/docs/current/cluster_setup.html.
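As a rough sketch of what those tutorials arrive at for a single-node setup (Hadoop 0.20/1.x property names; the host names and ports here are placeholders, so adjust them to your environment), the three conf files end up containing something like:

<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>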


Format the namenode (note that this erases all existing HDFS metadata, so only do it on a fresh setup):
$ ./hadoop namenode -format

Start DFS:
$ ./start-dfs.sh

Stop DFS:
$ ./stop-dfs.sh

Start MapReduce:
$ ./start-mapred.sh

Stop MapReduce:
$ ./stop-mapred.sh

Copy data from the local file system to HDFS:
$ ./hadoop fs -copyFromLocal /home/input /input
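To verify the copy went through (using the same paths as above):

$ ./hadoop fs -ls /input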

Create a directory in HDFS:
$ ./hadoop fs -mkdir input
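Note that a relative path such as input is created under your HDFS home directory, typically /user/<username>. Using an absolute path avoids the ambiguity:

$ ./hadoop fs -mkdir /input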

Copy data from HDFS to the local file system:
$ ./hadoop fs -get /output /home/output
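-get and -copyToLocal are equivalent here, just as -put is the counterpart of -copyFromLocal:

$ ./hadoop fs -copyToLocal /output /home/output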

Check whether the whole DFS file system is OK:
$ ./bin/hadoop fsck / -files -blocks -locations > dfs-v-old-fsck-1.log
Note that here we are checking "/", i.e., the DFS root. You can replace the root directory with any other directory you want to check.
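The overall verdict appears near the end of the fsck report as a Status line (HEALTHY or CORRUPT), so a quick check on the log written above is:

$ grep Status dfs-v-old-fsck-1.log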

List all the DFS files and directories of the system:
$ ./bin/hadoop dfs -lsr / > dfs-v-old-lsr-1.log

List all the nodes participating in the cluster:
$ ./bin/hadoop dfsadmin -report > dfs-v-old-report-1.log
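A related check is whether the namenode is still in safe mode; it leaves safe mode automatically once enough blocks have been reported by the datanodes:

$ ./bin/hadoop dfsadmin -safemode get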




