Monday, July 21, 2014

Assigning a static IP to a VMware Workstation VM

http://bytealmanac.wordpress.com/2012/07/02/assigning-a-static-ip-to-a-vmware-workstation-vm/

Assumption: the VM is running a DHCP client and is assigned a dynamic IP by the DHCP service running on the host. My host machine runs Windows 7 Ultimate x64 with VMware Workstation 8.
Open C:\ProgramData\VMware\vmnetdhcp.conf as Administrator. This file follows the syntax of dhcpd.conf. Add the following lines (change the host name, MAC address, and IP address appropriately) under the correct section (for me it was the NAT-based network, VMnet8). The MAC address can be found in the VM's properties.
host ubuntu {
    hardware ethernet 00:0C:29:16:2A:D6;
    fixed-address 192.168.84.132;
}
Restart the VMware DHCP service. Use the following commands from an elevated prompt:
net stop vmnetdhcp
net start vmnetdhcp
On the VM, acquire a new lease using the commands below (if the VM runs Linux):

ifconfig eth0 down
ifconfig eth0 up

Thursday, July 17, 2014

Cloudera CDH5 source code download

https://repository.cloudera.com/artifactory/public/org/apache/hadoop/hadoop-core/

Wednesday, July 2, 2014

zookeeper-env.sh issue when setting up HBase

Followed the CDH 4.2.2 installation guide to set up HBase.

Ran "service zookeeper-server start"

No error from the command line, but zookeeper.log says "nohup: failed to run command 'java': No such file or directory".

Interesting! JAVA_HOME looks correct when checked with "echo $JAVA_HOME".

After about an hour of troubleshooting and script checking, it turned out that:

zookeeper-env.sh is needed under the /etc/zookeeper/conf directory (as with other Hadoop components), but for some reason the ZooKeeper installer does not create this file by default.

I had to manually create the file and put the following line into it:

export JAVA_HOME=/opt/jdk1.6.0_45/

After that, starting ZooKeeper works. I can see it in 'jps':

[root@centos conf]# jps
2732 TaskTracker
4964 Jps
4776 QuorumPeerMain
3133 NameNode
2548 JobTracker
2922 DataNode
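
Beyond 'jps', a quick way to confirm the server actually answers requests is the ZooKeeper Java client. Below is a minimal sketch (a hypothetical ZkCheck class; it assumes the zookeeper client jar is on the classpath and the server listens on the default client port 2181):

import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkCheck {
    public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);
        // connect to the local server; the watcher releases the latch once the session is up
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, new Watcher() {
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();
        // a healthy server returns at least the "zookeeper" system znode
        List<String> children = zk.getChildren("/", false);
        System.out.println("root znodes: " + children);
        zk.close();
    }
}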



Wednesday, June 25, 2014

Eclipse debug buttons (Step Into, Step Over, etc.) disappeared. How to bring them back?



http://stackoverflow.com/questions/12912896/eclipse-buttons-like-step-in-step-out-resume-etc-not-working

Monday, June 9, 2014

Write an RCFile

private static void testWrite() throws IOException {
    Configuration conf = new Configuration();
    conf.addResource(new Path("C:\\etc\\Hadoop\\conf\\core-site.xml"));
    conf.addResource(new Path("C:\\etc\\Hadoop\\conf\\hdfs-site.xml"));
    conf.addResource(new Path("C:\\etc\\Hadoop\\conf\\mapred-site.xml"));

    FileSystem fs = null;
    try {
        fs = FileSystem.get(conf);
    } catch (IOException e1) {
        e1.printStackTrace();
    }

    // has to set column number manually
    conf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 4);

    RCFile.Writer rcWriter = new RCFile.Writer(fs, conf, new Path("/user/abc/output_rcwriter/output1"));

    String[] values =
        {"111222333,1200,999999.99,abc@yahoo.com",
         "1112226666,1201,999999.99,abcdefg@yahoo.com"};

    for (String value : values) {
        String[] columns = value.split(",");

        if (columns.length > 0) {
            BytesRefArrayWritable outputRow = new BytesRefArrayWritable(columns.length);
            for (int i = 0; i < columns.length; i++) {
                BytesRefWritable column = new BytesRefWritable(columns[i].getBytes("UTF-8"));
                outputRow.set(i, column);
            }
            rcWriter.append(outputRow);
        }
    }
    rcWriter.close();
}
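
As a quick sanity check, the file just written can be reopened and the column count read back from the RCFile metadata; this is the same "hive.io.rcfile.column.number" key that the RCFile read examples further down this page rely on. A sketch (verifyWrite is a hypothetical helper, reusing the conf and fs from testWrite):

// hypothetical helper, not part of the original code
private static void verifyWrite(FileSystem fs, Configuration conf) throws IOException {
    RCFile.Reader reader = new RCFile.Reader(fs, new Path("/user/abc/output_rcwriter/output1"), conf);
    Text columnCount = reader.getMetadata().get(new Text("hive.io.rcfile.column.number"));
    System.out.println("columns written = " + columnCount);
    reader.close();
}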

Friday, May 30, 2014

Eclipse: search files outside workspace


I used to use Visual Studio and IntelliJ IDEA for coding. When I switched to Eclipse, I found that its search is very workspace-oriented: I could not search for anything outside of my project.

Finally, I found this useful link that solves the problem:

http://eclipse.dzone.com/articles/5-best-eclipse-plugins-system

Friday, May 23, 2014

Row-wise read vs. column-wise read for RCFile

Row Wise Read:

private static void readRowWise(RCFile.Reader rcReader) {
    int rowcounter = 0;
    Text len = rcReader.getMetadata().get(new Text("hive.io.rcfile.column.number"));
    int numberOfColumns = Integer.valueOf(len.toString());

    try {
        while (rcReader.next(new LongWritable(rowcounter))) {
            BytesRefArrayWritable cols = new BytesRefArrayWritable();

            /**
             * 'resetValid' has to be called for every row to declare how many columns each row holds.
             * This looks ugly, but it is the way to make row-wise reading work.
             */
            cols.resetValid(numberOfColumns);

            /**
             * The name of getCurrentRow is somewhat misleading. It actually reads all rows in the current
             * row group, column by column (due to the columnar nature of the RCFile format), and stores
             * them internally, so the next call to getCurrentRow returns the same data buffer. By default
             * it sets the 'valid' variable to the number of columns, so only the columns of the first row
             * can be obtained by calling cols.get(i).
             *
             * Once the first row has been read, a call to 'resetValid' allows the next row to be read. The
             * value passed to 'resetValid' has to be the number of columns so that all columns of the next
             * row can be read.
             */
            rcReader.getCurrentRow(cols);

            int size = cols.size();  // this actually returns the number of columns in the current row

            for (int i = 0; i < size; i++) {
                BytesRefWritable currentColumn = cols.get(i);

                byte[] currentColumnBytes = currentColumn.getBytesCopy();  // current column's data for the current row
                Text text = new Text(currentColumnBytes);
                System.out.println("columnText=" + text.toString());
            }
            rowcounter++;
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}


Column Wise Read:

private static void readColumnWise(RCFile.Reader rcReader) {
    Text len = rcReader.getMetadata().get(new Text("hive.io.rcfile.column.number"));
    int numberOfColumns = Integer.valueOf(len.toString());
    String[][] firstNRows = null;
    int numberOfRowsNeeded = 10;  // only looking at the first 10 rows
    try {
        // go through each row group
        while (rcReader.nextColumnsBatch()) {
            // go through each column in the current row group
            for (int i = 0; i < numberOfColumns; i++) {
                BytesRefArrayWritable columnData = rcReader.getColumn(i, null);
                if (firstNRows == null)
                    firstNRows = new String[Math.min(numberOfRowsNeeded, columnData.size())][numberOfColumns];
                // for a given column, go through each row in the current row group
                // (bounded by firstNRows.length to avoid going past the captured rows)
                for (int j = 0; j < columnData.size() && j < firstNRows.length; j++) {
                    BytesRefWritable cellData = columnData.get(j);
                    byte[] currentCell = Arrays.copyOfRange(cellData.getData(), cellData.getStart(), cellData.getStart() + cellData.getLength());
                    Text currentCellStr = new Text(currentCell);
                    System.out.println("columnText=" + currentCellStr);
                    firstNRows[j][i] = currentCellStr.toString();
                }
            }
        }
    } catch (IOException e1) {
        e1.printStackTrace();
    }
    // print the captured cells in row order (they were collected column by column above)
    for (int i = 0; firstNRows != null && i < firstNRows.length; i++) {
        for (int j = 0; j < numberOfColumns; j++) {
            if (j > 0) System.out.print(",");
            System.out.print(firstNRows[i][j]);
        }
        System.out.println();
    }
}

A Test Driver:

private static void testDirectRead(boolean rowWise) {
    Configuration conf = new Configuration();
    conf.addResource(new Path("C:\\etc\\Hadoop\\conf\\core-site.xml"));
    conf.addResource(new Path("C:\\etc\\Hadoop\\conf\\hdfs-site.xml"));
    conf.addResource(new Path("C:\\etc\\Hadoop\\conf\\mapred-site.xml"));

    FileSystem fs = null;
    try {
        fs = FileSystem.get(conf);
    } catch (IOException e1) {
        e1.printStackTrace();
    }

    RCFile.Reader rcReader = null;
    try {
        rcReader = new RCFile.Reader(fs, new Path("/user/hive/warehouse/rc_userdatatest2/000000_0"), conf);
    } catch (IOException e) {
        e.printStackTrace();
    }

    if (rowWise)
        readRowWise(rcReader);
    else
        readColumnWise(rcReader);

    rcReader.close();
}
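
To tie it together, a minimal driver (a sketch; it assumes the three methods above live in the same class and that the hard-coded input path /user/hive/warehouse/rc_userdatatest2/000000_0 already exists on HDFS):

public static void main(String[] args) {
    testDirectRead(true);   // dump the file row by row
    testDirectRead(false);  // dump it column by column, then print the first rows in row order
}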