Question

I have been using Cloudera's hadoop (0.20.2). With this version, if I put a file into the file system, but the directory structure did not exist, it automatically created the parent directories:

So for example, if I had no directories in hdfs and typed:

hadoop fs -put myfile.txt /some/non/existing/path/myfile.txt

It would create all of the directories: some, non, existing and path and put the file in there.

Now, with a newer version of hadoop (2.2.0), this automatic creation of directories no longer happens. The same command yields:

put: ` /some/non/existing/path/': No such file or directory

I have a workaround to just do hadoop fs -mkdir first, for every put, but this is not going to perform well.

Is this configurable? Any advice?

Answer:

 

In newer versions, create the missing directories first with hadoop fs -mkdir -p <path>, then run the put.
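For the path in the question, the two steps could be wrapped roughly like this. This is a minimal sketch: the function name put_with_parents is hypothetical, and it assumes the destination is a full file path whose parent directory can be derived with dirname.

```shell
#!/bin/bash
# Hypothetical helper (not part of hadoop):
#   put_with_parents <local-file> <hdfs-destination-path>
put_with_parents() {
  local src=$1 dst=$2
  # Create the destination's parent directory and any missing ancestors,
  # then upload only if the mkdir succeeded.
  hadoop fs -mkdir -p "$(dirname "$dst")" &&
    hadoop fs -put "$src" "$dst"
}
```

Usage: put_with_parents myfile.txt /some/non/existing/path/myfile.txt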

 

Placing a file into a nonexistent directory in hdfs is a two-step process. As @rt-vybor stated, use the '-p' option to mkdir to create all missing path elements. Since the OP asked how to place the file into hdfs, the following also performs the hdfs put. You can also (optionally) check that the put succeeded and conditionally remove the local copy.

First create the relevant directory path in hdfs, then put the file into hdfs. You should check that the local file exists before placing it into hdfs, and you may want to log/show that the file has been successfully placed. The following combines all the steps.

fn=myfile.txt
if [ -f "$fn" ] ; then
  bfn=$(basename "$fn") #trim path from filename
  hdfs dfs -mkdir -p /here/is/some/non/existant/path/in/hdfs/
  hdfs dfs -put "$fn" "/here/is/some/non/existant/path/in/hdfs/$bfn"
  hdfs dfs -ls "/here/is/some/non/existant/path/in/hdfs/$bfn"
  success=$? #exit status of ls: 0 means the file landed in hdfs
  if [ "$success" -eq 0 ] ; then #compare numerically; [ $success ] is always true
    echo "remove local copy of file $fn"
    #rm -f "$fn" #uncomment if you want to remove file
  fi
fi

You can turn this into a shell script that takes an hdfs path followed by a list of files, and creates the path only once:

#!/bin/bash
hdfsp=${1} #first argument: target directory in hdfs
shift
hdfs dfs -mkdir -p "$hdfsp" #create the path once, including missing parents
for fn in "$@"; do
  if [ -f "$fn" ] ; then
    bfn=$(basename "$fn") #trim path from filename
    hdfs dfs -put "$fn" "$hdfsp/$bfn"
    hdfs dfs -ls "$hdfsp/$bfn" >/dev/null
    success=$? #exit status of ls: 0 means the file landed in hdfs
    if [ "$success" -eq 0 ] ; then
      echo "remove local copy of file $fn"
      #rm -f "$fn" #uncomment if you want to remove file
    fi
  fi
done
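Since the OP was worried about performance, and each hdfs dfs invocation starts its own JVM, the number of put calls can be reduced further. The sketch below relies on -put accepting several local sources when the destination is a directory, and uses -test -e instead of -ls for the existence check. The function name put_all is hypothetical, and this is one possible arrangement rather than the only one.

```shell
#!/bin/bash
# Hypothetical helper (not part of hadoop):
#   put_all <hdfs-dir> <local-file>...
put_all() {
  local dest=$1
  shift
  local files=()
  local fn
  # Keep only arguments that are existing regular files.
  for fn in "$@"; do
    if [ -f "$fn" ]; then
      files+=("$fn")
    else
      echo "skipping $fn: not a regular file" >&2
    fi
  done
  # Nothing to upload.
  [ "${#files[@]}" -gt 0 ] || return 1
  # Create the directory once; -p also creates missing parents.
  hdfs dfs -mkdir -p "$dest" || return 1
  # A single put uploads every file into the directory.
  hdfs dfs -put "${files[@]}" "$dest" || return 1
  # Confirm each file exists in hdfs; -test -e exits 0 if the path exists.
  for fn in "${files[@]}"; do
    hdfs dfs -test -e "${dest%/}/$(basename "$fn")" || return 1
  done
}
```

Usage: put_all /here/is/some/path file1.txt file2.txt. The function returns non-zero if any step fails, so the caller can decide whether to remove the local copies.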

 

 

