Monday, March 28, 2016

Key properties to set when writing code that accesses Hadoop concurrently

conf.setBoolean("fs.hdfs.impl.disable.cache", true);
conf.setBoolean("fs.maprfs.impl.disable.cache", true);
conf.setBoolean("fs.s3.impl.disable.cache", true);
conf.setBoolean("fs.s3n.impl.disable.cache", true);
conf.setBoolean("fs.s3a.impl.disable.cache", true);

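These settings disable the FileSystem cache for the listed schemes, so each call to FileSystem.get(conf) returns a new instance instead of a shared cached one. Otherwise, every thread receives the same cached FileSystem instance, and as soon as the first thread closes it, the other threads get a "Filesystem closed" error.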
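The failure mode can be reproduced without a cluster. The sketch below is a toy model in plain Java, not Hadoop's actual implementation: ToyFileSystem, its get(scheme, disableCache) method, and the FsCacheDemo class are hypothetical names invented for illustration, and the boolean flag stands in for the fs.<scheme>.impl.disable.cache properties.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the caching behaviour. ToyFileSystem is hypothetical,
// not a Hadoop class; it only mimics "get returns a cached instance".
public class FsCacheDemo {

    static class ToyFileSystem {
        // One shared instance per scheme, loosely like a FileSystem cache.
        private static final Map<String, ToyFileSystem> CACHE = new ConcurrentHashMap<>();
        private volatile boolean open = true;

        // With disableCache=true every caller gets a fresh instance;
        // otherwise all callers share the cached one.
        static ToyFileSystem get(String scheme, boolean disableCache) {
            if (disableCache) {
                return new ToyFileSystem();
            }
            return CACHE.computeIfAbsent(scheme, s -> new ToyFileSystem());
        }

        void close() {
            open = false;
        }

        String read() throws IOException {
            if (!open) {
                throw new IOException("Filesystem closed");
            }
            return "data";
        }
    }

    // The calling thread obtains an instance, a second thread obtains its
    // own via get() and closes it, then the calling thread tries to read.
    static String run(boolean disableCache) {
        ToyFileSystem mine = ToyFileSystem.get("hdfs", disableCache);
        Thread other = new Thread(() -> ToyFileSystem.get("hdfs", disableCache).close());
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        try {
            return mine.read();
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("cache enabled:  " + run(false));
        System.out.println("cache disabled: " + run(true));
    }
}
```

With the cache enabled, the read fails with "Filesystem closed" because both threads hold the same object. With the cache disabled, the second thread closes only its own instance and the read returns normally.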