From /etc/hive/conf/hive-site.xml, looking for the following property so you will know where is the root location for all hive tables:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
Then, your hive file should look like:
hdfs://cdh1.host:8020/user/hive/warehouse/table_name/2018-01-01/.../file_000000
Do a substring search for "/user/hive/warehouse" and the table name is just right after it (of course, need to filter out the /).
Above algorithm only works if a user uses default location and default database.
If a table is under non-default database, the full path will be something like:
If a user specifies the 'Location' when defining Hive table (as in the following example), we have to use HiveMetaStoreClient to get the table name. This is the external table case.Above algorithm only works if a user uses default location and default database.
If a table is under non-default database, the full path will be something like:
hdfs://cdh1.host:8020/user/hive/warehouse/database_name.db/table_name/2018-01-01/.../file_000000
CREATE EXTERNAL TABLE weatherext ( wban INT, date STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
LOCATION ‘ /hive/data/weatherext’;
BTW, the way to get file path for a Hive table is to issue the following command from hive command line:
DESCRIBE FORMATTED table_name
Trie is the right data structure to hold locations so we can use do substring match for location given a full path of an HDFS file. We can build a location to table name hash map, so we can get the table name once we get the location from the full path.
Caching all locations in trie can cause high memory usage if you have millions of Hive tables. Also, need to update such trie when a new external table is created.
Trie is the right data structure to hold locations so we can use do substring match for location given a full path of an HDFS file. We can build a location to table name hash map, so we can get the table name once we get the location from the full path.
Caching all locations in trie can cause high memory usage if you have millions of Hive tables. Also, need to update such trie when a new external table is created.