Monday, October 20, 2014

The best explanation of Kerberos yet:

http://windowsitpro.com/security/kerberos-active-directory

Saturday, October 18, 2014

Find a duplicate in array

  1. Given an array of N elements in which each element is an integer between 1 and N-1, write an algorithm to find any duplicate. Your algorithm should run in linear time, use O(n) extra space, and may not modify the original array.
  2. Given an array of N elements in which each element is an integer between 1 and N-1 with one element duplicated, write an algorithm to find such duplicate. Your algorithm should run in linear time, use O(1) extra space, and may not modify the original array.
  3. Given an array of N elements in which each element is an integer between 1 and N-1, write an algorithm to find any duplicate. Your algorithm should run in linear time and use O(1) extra space, and may modify the original array. 
  4. Given an array of N elements in which each element is an integer between 1 and N-1, write an algorithm to find any duplicate. Your algorithm should run in linear time, use O(1) extra space, and may not modify the original array. 
More info: http://aperiodic.net/phil/archives/Geekery/find-duplicate-elements.html

Solutions:

Solution to #1 (mark seen values in an auxiliary array):

let s ← array [1..n]
initialize s to all zeroes
for 1 <= i <= n:
  if s[A[i]] > 0: return A[i]
  set s[A[i]] ← 1

Solution to #2 (sum the elements):

let s ← 0
for 1 <= i <= n:
  s ← s + A[i]
return s - n*(n-1)/2
The above solution to question #2 is not ideal, since the sum may overflow an integer. We can do better with XOR:
temp = 0;
for (i = 0; i < n; i++) 
    temp = temp ^ A[i] ^ i;
return temp;

Solution to #3 (may modify the array; swap elements into place):

for 1 <= i <= n:
  while A[i] ≠ i:
    if A[A[i]] = A[i]: return A[i]
    swap(A[A[i]], A[i])

Solution to #4 (read-only, O(1) space: cycle detection):

let i ← n, j ← n
do: i ← A[i], j ← A[A[j]]; until i = j
set j ← n
do: i ← A[i], j ← A[j]; until i = j
return i
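
For reference, here is a minimal Java sketch of the last solution (cycle detection for question #4). It assumes a 0-based int[] a of length n whose values all lie between 1 and n-1, mirroring the 1-based pseudocode above; the method name findDuplicate is just for illustration.

    // Treat i -> a[i-1] as a function on 1..n. Starting from n, the sequence must
    // enter a cycle, and the entry point of that cycle is a duplicated value.
    static int findDuplicate(int[] a) {
        int n = a.length;
        int slow = n, fast = n;
        do {                        // phase 1: find a meeting point inside the cycle
            slow = a[slow - 1];
            fast = a[a[fast - 1] - 1];
        } while (slow != fast);
        fast = n;                   // phase 2: restart one pointer from n; stepping both
        while (slow != fast) {      // one at a time, they meet at the cycle entry
            slow = a[slow - 1];
            fast = a[fast - 1];
        }
        return slow;
    }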

Wednesday, October 15, 2014

Add Kerberos to Hadoop

Environment: CentOS 6.5; HW 1.3;

Install Kerberos:

yum install krb5-server krb5-libs krb5-auth-dialog krb5-workstation

Initialize the KDC:

kdb5_util create -s

Edit the Access Control List (/var/kerberos/krb5kdc/kadm5.acl in RHEL or CentOS and /var/lib/kerberos/krb5kdc/kadm5.acl in SLES) to define the principals that have admin (modifying) access to the database. A simple example would be a single entry:

*/admin@EXAMPLE.COM *

Create the first admin principal (this has to be done as the root user, via kadmin.local):

[root@centos johnz]# kadmin.local -q "addprinc adminuser/admin"
Authenticating as principal root/admin@EXAMPLE.COM with password.
WARNING: no policy specified for adminuser/admin@EXAMPLE.COM; defaulting to no policy
Enter password for principal "adminuser/admin@EXAMPLE.COM":
Re-enter password for principal "adminuser/admin@EXAMPLE.COM":
Principal "adminuser/admin@EXAMPLE.COM" created.

Start Kerberos:

[root@centos johnz]# service krb5kdc start
Starting Kerberos 5 KDC:                                   [  OK  ]
[root@centos johnz]# service kadmin start
Starting Kerberos 5 Admin Server:                          [  OK  ]

Create a directory to store keytabs:

[root@centos johnz]# mkdir -p /etc/security/keytabs/
[root@centos johnz]# chown root:hadoop /etc/security/keytabs/
[root@centos johnz]# chmod 750 /etc/security/keytabs/

Add service principals for Hadoop:

[root@centos johnz]# kadmin -p adminuser/admin
Authenticating as principal adminuser/admin with password.
Password for adminuser/admin@EXAMPLE.COM:
kadmin:  addprinc -randkey nn/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey dn/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey HTTP/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey jt/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey tt/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hbase/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey zookeeper/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hcat/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey oozie/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hdfs/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hive/centos.hw13@EXAMPLE.COM

Export these principals to keytab files:

kadmin:  xst -k /etc/security/keytabs/spnego.service.keytab HTTP/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/nn.service.keytab nn/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/dn.service.keytab dn/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/jt.service.keytab jt/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/tt.service.keytab tt/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/hive.service.keytab hive/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/oozie.service.keytab oozie/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/hbase.service.keytab hbase/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/zk.service.keytab zookeeper/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/hdfs.headless.keytab hdfs/centos.hw13@EXAMPLE.COM
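
To sanity-check an exported keytab, you can list its entries with klist (shown for the NameNode keytab created above; the other keytabs can be checked the same way):

[root@centos johnz]# klist -kt /etc/security/keytabs/nn.service.keytab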

Configure Hadoop:

1. Add the following to core-site.xml:

  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>RULE:[2:$1@$0](jt@.*EXAMPLE.COM)s/.*/mapred/
RULE:[2:$1@$0](tt@.*EXAMPLE.COM)s/.*/mapred/
RULE:[2:$1@$0](nn@.*EXAMPLE.COM)s/.*/hdfs/
RULE:[2:$1@$0](dn@.*EXAMPLE.COM)s/.*/hdfs/
RULE:[2:$1@$0](hbase@.*EXAMPLE.COM)s/.*/hbase/
RULE:[2:$1@$0](oozie@.*EXAMPLE.COM)s/.*/oozie/
DEFAULT</value>
  </property>
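
To verify that these auth_to_local rules map each service principal to the intended local user, you can run Hadoop's HadoopKerberosName helper from the command line (assuming your Hadoop build includes it and the configuration above is on the classpath):

[root@centos johnz]# hadoop org.apache.hadoop.security.HadoopKerberosName nn/centos.hw13@EXAMPLE.COM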

2. Add the following to hdfs-site.xml:

<property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>
<property>
        <name>dfs.namenode.kerberos.principal</name>
        <value>nn/centos.hw13@EXAMPLE.COM</value>
        <description> Kerberos principal name for the
        NameNode </description>
</property>
<property>
        <name>dfs.secondary.namenode.kerberos.principal</name>
        <value>nn/centos.hw13@EXAMPLE.COM</value>
        <description>Kerberos principal name for the secondary NameNode.
        </description>
</property>
<property>
        <name>dfs.web.authentication.kerberos.principal</name>
        <value>HTTP/centos.hw13@EXAMPLE.COM</value>
        <description> The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.
The HTTP Kerberos principal MUST start with 'HTTP/' per Kerberos HTTP SPNEGO specification.
        </description>
</property>  
<property>
        <name>dfs.datanode.kerberos.principal</name>
        <value>dn/_HOST@EXAMPLE.COM</value>
        <description>The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real host name.
        </description>
</property>  

<property>
        <name>dfs.web.authentication.kerberos.keytab</name>
        <value>/etc/security/keytabs/spnego.service.keytab</value>
        <description>The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.
        </description>
</property>  
<property>
        <name>dfs.namenode.keytab.file</name>
        <value>/etc/security/keytabs/nn.service.keytab</value>
        <description>Combined keytab file containing the NameNode service and host principals.
        </description>
</property>
<property>
        <name>dfs.secondary.namenode.keytab.file</name>
        <value>/etc/security/keytabs/nn.service.keytab</value>
        <description>Combined keytab file containing the NameNode service and host principals.
        </description>
</property>
<property>
        <name>dfs.datanode.keytab.file</name>
        <value>/etc/security/keytabs/dn.service.keytab</value>
        <description>The filename of the keytab file for the DataNode.
        </description>
</property>

3. Add the following to mapred-site.xml:

<property>
        <name>mapreduce.jobtracker.kerberos.principal</name>
        <value>jt/centos.hw13@EXAMPLE.COM</value>
        <description>Kerberos principal name for the JobTracker   </description>
</property>
<property>
        <name>mapreduce.tasktracker.kerberos.principal</name>  
        <value>tt/centos.hw13@EXAMPLE.COM</value>
        <description>Kerberos principal name for the TaskTracker. "_HOST" is replaced by the host name of the TaskTracker.
        </description>
</property>
<property>
        <name>mapreduce.jobtracker.keytab.file</name>
        <value>/etc/security/keytabs/jt.service.keytab</value>
        <description>The keytab for the JobTracker principal.
        </description>
</property>
<property>
        <name>mapreduce.tasktracker.keytab.file</name>
        <value>/etc/security/keytabs/tt.service.keytab</value>
        <description>The filename of the keytab for the TaskTracker</description>
</property>
<property>
        <name>mapreduce.jobhistory.kerberos.principal</name>
        <!--cluster variant -->
        <value>jt/centos.hw13@EXAMPLE.COM</value>
        <description> Kerberos principal name for JobHistory. This must map to the same user as the JobTracker user (mapred).
        </description>
</property>
<property>  
        <name>mapreduce.jobhistory.keytab.file</name>
        <!--cluster variant -->
        <value>/etc/security/keytabs/jt.service.keytab</value>
        <description>The keytab for the JobHistory principal.
        </description>
</property>  

4. Important: you must replace the existing local_policy.jar and US_export_policy.jar files (in the $JAVA_HOME/jre/lib/security/ directory) with the JCE Unlimited Strength Jurisdiction Policy files from the following link, otherwise stronger Kerberos encryption types such as AES-256 will not work:

 http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html

5. Very important: set the following environment variable before starting the DataNode (a secure DataNode starts as root and then switches to this user):

export HADOOP_SECURE_DN_USER=hdfs

6. Very important: check the permissions of the keytab files. They should look like this:

[root@centos ~]# ls -l /etc/security/keytabs/
total 40
-rw-------. 1 hdfs   hadoop 400 Oct 13 14:56 dn.service.keytab
-rw-------. 1 hdfs   hadoop 412 Oct 13 14:59 hdfs.headless.keytab
-rw-------. 1 mapred hadoop 400 Oct 13 14:56 jt.service.keytab
-rw-------. 1 hdfs   hadoop 400 Oct 13 14:56 nn.service.keytab
-rw-------. 1 mapred hadoop 400 Oct 13 14:56 tt.service.keytab

7. Important: all services have to be started as the root user.

[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start secondarynamenode
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start jobtracker
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start tasktracker

Access Hadoop:

[root@centos ~]# kinit -k -t /etc/security/keytabs/hdfs.headless.keytab -p hdfs/centos.hw13@EXAMPLE.COM
[root@centos ~]# hadoop fs -ls /user/hive
Found 1 items
drwxr-xr-x   - hdfs supergroup          0 2014-10-03 17:13 /user/hive/warehouse

Sunday, October 5, 2014

Insert a node into a red-black tree: the LLRB (left-leaning red-black) algorithm does the best job!

    private Node put(Node h, Key key, Value val) {
        if (h == null) return new Node(key, val, RED, 1);

        int cmp = key.compareTo(h.key);
        if      (cmp < 0) h.left  = put(h.left,  key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else              h.val   = val;

        // fix-up any right-leaning links
        if (isRed(h.right) && !isRed(h.left))      h = rotateLeft(h);
        if (isRed(h.left)  &&  isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left)  &&  isRed(h.right))     flipColors(h);
        h.N = size(h.left) + size(h.right) + 1;

        return h;
    }
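
For completeness, here is a minimal sketch of the helpers that put() relies on, following the standard LLRB formulation; Node is assumed to carry key, val, left, right, color and a subtree count N, with RED/BLACK boolean constants.

    private boolean isRed(Node x) {
        return x != null && x.color == RED;        // null links are black
    }

    private Node rotateLeft(Node h) {              // turn a right-leaning red link to the left
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        x.N = h.N;
        h.N = size(h.left) + size(h.right) + 1;
        return x;
    }

    private Node rotateRight(Node h) {             // turn a left-leaning red link to the right
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        x.N = h.N;
        h.N = size(h.left) + size(h.right) + 1;
        return x;
    }

    private void flipColors(Node h) {              // split a temporary 4-node (insertion-only variant)
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    private int size(Node x) {
        return x == null ? 0 : x.N;
    }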

Saturday, September 27, 2014

Magic of the array

1. Union find:

    public void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;

        // make smaller root point to larger one
        if   (sz[rootP] < sz[rootQ]) { id[rootP] = rootQ; sz[rootQ] += sz[rootP]; }
        else                         { id[rootQ] = rootP; sz[rootP] += sz[rootQ]; }
        count--;
    }
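
A minimal sketch of the find() helper that union() calls, assuming the usual parent-link array id[] of weighted quick-union:

    // chase parent links until reaching a root (a node that is its own parent)
    private int find(int p) {
        while (p != id[p])
            p = id[p];
        return p;
    }

Adding path compression (for example, id[p] = id[id[p]] inside the loop) flattens the trees further and keeps both operations nearly constant in practice.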

2. Priority queue implemented by binary heap:

    public Key delMin() {
        if (isEmpty()) throw new NoSuchElementException("Priority queue underflow");
        exch(1, N);
        Key min = pq[N--];
        sink(1);
        pq[N+1] = null;         // avoid loitering and help with garbage collection
        if ((N > 0) && (N == (pq.length - 1) / 4)) resize(pq.length  / 2);
        assert isMinHeap();
        return min;
    }

3. Heap sort:

    public static void sort(Comparable[] pq) {
        int N = pq.length;
        for (int k = N/2; k >= 1; k--)
            sink(pq, k, N);
        while (N > 1) {
            exch(pq, 1, N--);
            sink(pq, 1, N);
        }
    }
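
Both delMin() and sort() lean on sink(); here is a minimal sketch of the static max-heap variant used by heap sort above, with the usual 1-based heap indices mapped onto a 0-based array (index k lives at pq[k-1]):

    private static void sink(Comparable[] pq, int k, int n) {
        while (2 * k <= n) {
            int j = 2 * k;                          // left child
            if (j < n && less(pq, j, j + 1)) j++;   // pick the larger child
            if (!less(pq, k, j)) break;             // heap order restored
            exch(pq, k, j);
            k = j;
        }
    }

    private static boolean less(Comparable[] pq, int i, int j) {
        return pq[i - 1].compareTo(pq[j - 1]) < 0;
    }

    private static void exch(Object[] pq, int i, int j) {
        Object swap = pq[i - 1];
        pq[i - 1] = pq[j - 1];
        pq[j - 1] = swap;
    }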

4. Randomized queue:

    public Item dequeue() {
        if (size <= 0) {
            throw new NoSuchElementException("Tried to dequeue an empty queue.");
        }

        int index = StdRandom.uniform(size);
        Item item = items[index];
        items[index] = items[--size];
        items[size] = null;   // clear the vacated last slot to avoid loitering

        if (size > ORIG_CAPACITY && size < capacity / 4) {
            resize(capacity / 2);
        }

        return item;
    }
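
And a minimal sketch of the resize() helper that dequeue() assumes, using the items, size and capacity fields implied by the snippet above:

    @SuppressWarnings("unchecked")
    private void resize(int newCapacity) {
        Item[] copy = (Item[]) new Object[newCapacity];   // generic array creation needs a cast
        for (int i = 0; i < size; i++)
            copy[i] = items[i];
        items = copy;
        capacity = newCapacity;
    }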

5. 8-puzzle:

The n-puzzle is a classical problem for modelling algorithms involving heuristics. Commonly used heuristics for this problem include counting the number of misplaced tiles and computing the sum of the taxicab (or Manhattan) distances between each block and its position in the goal configuration.

   // return sum of Manhattan distances between blocks and goal
    public int manhattan() {
        int manhattan = 0;
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                int current = blocks[i][j];
                if (current == 0) continue;  // skip blank square.
                if (current != (i * N + j + 1)) {
                    manhattan += Math.abs((current -1 ) / N - i) + Math.abs((current - 1) % N - j);                    
                }
            }
        }
        return manhattan;
    }
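
The other heuristic mentioned above, counting misplaced tiles, is just as short; it uses the same N and blocks fields as manhattan():

    // return number of blocks out of place (the Hamming priority)
    public int hamming() {
        int hamming = 0;
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                int current = blocks[i][j];
                if (current == 0) continue;                // skip the blank square
                if (current != i * N + j + 1) hamming++;   // tile is not at its goal position
            }
        }
        return hamming;
    }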

Sunday, August 31, 2014

Write ORC file

First of all, there is no way to write an ORC file with hive-exec-0.12 or earlier, since OrcStruct is package-private in those old versions.

For hive-exec-0.13 or later:

Configuration conf = new Configuration();
conf.addResource(new Path("C:\\etc\\Hadoop\\conf_cdh5\\core-site.xml"));
conf.addResource(new Path("C:\\etc\\Hadoop\\conf_cdh5\\hdfs-site.xml"));
conf.addResource(new Path("C:\\etc\\Hadoop\\conf_cdh5\\mapred-site.xml"));

ObjectInspector inspector = ObjectInspectorFactory.getReflectionObjectInspector(String.class,
             ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
   
Writer writer = OrcFile.createWriter(new Path("/user/output_orcwriter"),
           OrcFile.writerOptions(conf).inspector(inspector).stripeSize(1048576/2).bufferSize(1048577)
               .version(OrcFile.Version.V_0_12));

// Even though OrcStruct is public, its constructor and setFieldValue method are not.
// Use reflection to get access to them.
Class<?> c = Class.forName("org.apache.hadoop.hive.ql.io.orc.OrcStruct");
Constructor<?> ctor = c.getDeclaredConstructor(int.class);
ctor.setAccessible(true);
Method setFieldValue = c.getDeclaredMethod("setFieldValue", int.class, Object.class);
setFieldValue.setAccessible(true);

for (int j = 0; j < 5; j++) {
    OrcStruct orcRow = (OrcStruct) ctor.newInstance(8);
    for (int i = 0; i < 8; i++)
        setFieldValue.invoke(orcRow, i, "AAA" + j + "BBB" + i);
    writer.addRow(orcRow);
}
writer.close();
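
One way to check the result is the ORC file dump utility that ships with the Hive CLI (available in Hive 0.11 and later, as far as I know):

hive --orcfiledump /user/output_orcwriter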

Thursday, August 28, 2014

Access Hive collection type data in MapReduce for ORC files

public void map(Object key, Writable value,
                Context context) throws IOException, InterruptedException {

    OrcStruct orcStruct = (OrcStruct) value;
    int numberOfFields = orcStruct.getNumFields();

    for (int i = 0; i < numberOfFields; i++) {

        // getFieldValue is private at this moment. Use reflection to access it.
        Object field = null;
        try {
            field = getFieldValue.invoke(orcStruct, i);
        } catch (Exception e) {
            e.printStackTrace();
        }
        if (field == null) continue;

        // process Hive collection types: array, struct or map
        if (field instanceof List) {
            List list = (List) field;
            for (int j = 0; j < list.size(); j++)
                System.out.println(list.get(j));
        }
        else if (field instanceof Map) {
            Map map = (Map) field;
            for (Iterator entries = map.entrySet().iterator(); entries.hasNext();) {
                Map.Entry entry = (Entry) entries.next();
                System.out.println("key=" + entry.getKey() + ",value=" + entry.getValue());
            }
        }
        else if (field instanceof OrcStruct) {
            OrcStruct struct = (OrcStruct) field;
            int numberOfField = struct.getNumFields();
            for (int j = 0; j < numberOfField; j++) {
                try {
                    System.out.println("field" + j + "=" + getFieldValue.invoke(struct, j));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
        else {
            System.out.println("Unknown type for field " + field);
        }
    }
}
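
The getFieldValue handle used above is not shown being initialized. Here is a minimal sketch of how it could be obtained via reflection, mirroring the setFieldValue trick from the previous post; the method name and signature are assumed to match OrcStruct in hive-exec-0.13:

private static Method getFieldValue;

static {
    try {
        Class<?> c = Class.forName("org.apache.hadoop.hive.ql.io.orc.OrcStruct");
        getFieldValue = c.getDeclaredMethod("getFieldValue", int.class);
        getFieldValue.setAccessible(true);
    } catch (Exception e) {
        throw new ExceptionInInitializerError(e);
    }
}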