Thursday, August 28, 2014

Accessing Hive collection-type data in MapReduce for ORC files

// Reflection handle for OrcStruct.getFieldValue(int), which is not public at the moment.
// It must be assigned before map() runs (for example in setup()), using
// getDeclaredMethod followed by setAccessible(true).
private Method getFieldValue;

public void map(Object key, Writable value,
        Context context) throws IOException, InterruptedException {

    OrcStruct orcStruct = (OrcStruct) value;
    int numberOfFields = orcStruct.getNumFields();

    for (int i = 0; i < numberOfFields; i++) {

        // getFieldValue is not public, so call it through reflection.
        Object field = null;
        try {
            field = getFieldValue.invoke(orcStruct, i);
        } catch (Exception e) {
            e.printStackTrace();
        }
        if (field == null) continue;

        // Process the Hive collection types: array, map or struct.
        if (field instanceof List) {
            // Hive array
            List<?> list = (List<?>) field;
            for (Object element : list) {
                System.out.println(element);
            }
        }
        else if (field instanceof Map) {
            // Hive map
            Map<?, ?> map = (Map<?, ?>) field;
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                System.out.println("key=" + entry.getKey() + ",value=" + entry.getValue());
            }
        }
        else if (field instanceof OrcStruct) {
            // Hive struct: read its fields the same way as the top-level row
            OrcStruct struct = (OrcStruct) field;
            int numberOfField = struct.getNumFields();
            for (int j = 0; j < numberOfField; j++) {
                try {
                    System.out.println("field" + j + "=" + getFieldValue.invoke(struct, j));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
        else {
            // Not one of the three collection types (for example a primitive column)
            System.out.println("Unknown type for field: " + field);
        }
    }
}
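
The map method above uses a getFieldValue variable that it never assigns. As a rough sketch of how such a reflection handle can be obtained, the example below uses a hypothetical stand-in class, FakeStruct, in place of OrcStruct so that it runs without any Hive or Hadoop jars. The names ReflectionAccessDemo, FakeStruct, lookupAccessor and publicLookupFails are made up for illustration. Only Class.getMethod, Class.getDeclaredMethod, setAccessible and Method.invoke are real JDK calls.

```java
import java.lang.reflect.Method;

public class ReflectionAccessDemo {

    // Hypothetical stand-in for OrcStruct: the accessor is deliberately non-public.
    static class FakeStruct {
        private final Object[] fields;

        FakeStruct(Object... fields) {
            this.fields = fields;
        }

        int getNumFields() {
            return fields.length;
        }

        private Object getFieldValue(int index) {
            return fields[index];
        }
    }

    // getMethod only sees public methods, so this lookup is expected to fail.
    static boolean publicLookupFails() {
        try {
            FakeStruct.class.getMethod("getFieldValue", int.class);
            return false;
        } catch (NoSuchMethodException e) {
            return true;
        }
    }

    // getDeclaredMethod also finds non-public methods declared on the class;
    // setAccessible(true) then lifts the access check for later invoke() calls.
    static Method lookupAccessor() throws NoSuchMethodException {
        Method method = FakeStruct.class.getDeclaredMethod("getFieldValue", int.class);
        method.setAccessible(true);
        return method;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("public lookup fails: " + publicLookupFails());

        // Look the method up once and reuse it, as a mapper would do in setup().
        Method getFieldValue = lookupAccessor();
        FakeStruct struct = new FakeStruct("a", 42);
        for (int i = 0; i < struct.getNumFields(); i++) {
            System.out.println("field" + i + "=" + getFieldValue.invoke(struct, i));
        }
    }
}
```

In a real mapper, the same two calls would presumably be made against OrcStruct.class once, in setup(), rather than for every record, since reflective lookups are comparatively expensive.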


4 comments:

  1. Could you provide the driver code for this MapReduce program?

  2. You left out the tricky part: assigning a value to getFieldValue. I found that the following simple statement didn't work: getFieldValueMethod = OrcStruct.class.getMethod("getFieldValue", new Class[] { int.class });

  3. OK, cool: you need to use a different reflection method, getDeclaredMethod, and then make the method accessible afterwards. How weird is this?

    getFieldValue = OrcStruct.class.getDeclaredMethod("getFieldValue", int.class);
    getFieldValue.setAccessible(true);
