Monday, December 15, 2014

Simple Java Book

http://www.programcreek.com/wp-content/uploads/2013/01/SimpleJava1.pdf

Saturday, November 8, 2014

IndexMinPQ

An indexed min-priority queue combines the beauty of a priority queue with the direct access of an array.

It is a priority queue sorted by Key (Key extends Comparable<Key>):

int delMin();  <-- delete the item with the lowest key and return its index

It supports indexed insert, update and delete:

void insert(int i, Key key);
void increaseKey(int i, Key key);
void decreaseKey(int i, Key key);
void delete(int i);  <-- delete the item with index i regardless of its priority

All operations run in logarithmic time or better.

It is used by:

1. Eager Prim MST (minimum spanning tree)

2. Dijkstra's shortest path algorithm

Invariants to keep in mind when dealing with IndexMinPQ:

1. Priorities are stored in keys[]
2. Vertex ids are stored in pq[] -- this is the real, heap-ordered priority queue
3. qp[] is the reverse lookup from a vertex id to its position in pq[], i.e. pq[qp[i]] == i and qp[pq[k]] == k
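To make these invariants concrete, here is a minimal sketch of how insert keeps the three arrays in sync (the class name, double-valued priorities, and the fixed-capacity constructor are my simplifications for illustration, not the algs4 class):

```java
// keys[] holds priorities, pq[] is the heap of vertex ids,
// qp[] maps a vertex id back to its heap position.
public class IndexMinPQSketch {
    private int N = 0;
    private final int[] pq;        // heap-ordered: pq[1..N] are vertex ids
    private final int[] qp;        // inverse: qp[pq[k]] == k, pq[qp[i]] == i
    private final double[] keys;   // keys[i] is the priority of vertex i

    public IndexMinPQSketch(int maxN) {
        pq = new int[maxN + 1];
        qp = new int[maxN + 1];
        keys = new double[maxN + 1];
        for (int i = 0; i <= maxN; i++) qp[i] = -1;   // -1 means "not in queue"
    }

    public void insert(int i, double key) {
        N++;
        qp[i] = N;        // vertex i sits at heap position N ...
        pq[N] = i;        // ... and heap position N holds vertex i
        keys[i] = key;    // priority is indexed by vertex id, not heap position
        swim(N);
    }

    public int minIndex() { return pq[1]; }   // vertex id with the smallest key

    private void swim(int k) {
        while (k > 1 && keys[pq[k/2]] > keys[pq[k]]) {
            int t = pq[k]; pq[k] = pq[k/2]; pq[k/2] = t;   // exchange in pq[]
            qp[pq[k]] = k; qp[pq[k/2]] = k/2;              // keep qp[] in sync
            k = k/2;
        }
    }
}
```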

We can use 'delete' to see their relationships:

    /**
     * Remove vertex i (stored in pq[]) and its priority (stored in keys[]).
     * @param i the vertex id
     */
    public void delete(int i) {
        if (i < 0 || i >= NMAX) throw new IndexOutOfBoundsException();
        if (!contains(i)) throw new NoSuchElementException("index is not in the priority queue");
        int index = qp[i];
        exch(index, N--);
        swim(index);
        sink(index);
        keys[i] = null;
        qp[i] = -1;
    }

Both sink and swim are modified to use the priority values stored in keys[] when reordering the priority queue:

private void swim(int k) {
    while (k > 1 && greater(k/2, k)) {
        exch(k, k/2);
        k = k/2;
    }
}
private boolean greater(int i, int j) {
        return keys[pq[i]].compareTo(keys[pq[j]]) > 0;
}

Monday, October 20, 2014

The best explanation of Kerberos yet

http://windowsitpro.com/security/kerberos-active-directory

Saturday, October 18, 2014

Find a duplicate in array

  1. Given an array of N elements in which each element is an integer between 1 and N-1, write an algorithm to find any duplicate. Your algorithm should run in linear time, use O(n) extra space, and may not modify the original array.
  2. Given an array of N elements in which each element is an integer between 1 and N-1 with one element duplicated, write an algorithm to find such duplicate. Your algorithm should run in linear time, use O(1) extra space, and may not modify the original array.
  3. Given an array of N elements in which each element is an integer between 1 and N-1, write an algorithm to find any duplicate. Your algorithm should run in linear time and use O(1) extra space, and may modify the original array. 
  4. Given an array of N elements in which each element is an integer between 1 and N-1, write an algorithm to find any duplicate. Your algorithm should run in linear time, use O(1) extra space, and may not modify the original array. 
More info: http://aperiodic.net/phil/archives/Geekery/find-duplicate-elements.html

Solutions:

Solution to #1 (mark seen values in an auxiliary array):

let s ← array [1..n]
initialize s to all zeroes
for 1 <= i <= n:
  if s[A[i]] > 0: return A[i]
  set s[A[i]] ← 1
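Rendered as Java (0-indexed; class and method names are mine, chosen for illustration):

```java
// Solution to #1: mark values in an auxiliary "seen" array -- O(n) time,
// O(n) extra space, and the input array is left untouched.
public class FindDupSeen {
    public static int findDuplicate(int[] a) {
        boolean[] seen = new boolean[a.length];   // values lie in 1..n-1
        for (int v : a) {
            if (seen[v]) return v;   // second occurrence of v: a duplicate
            seen[v] = true;
        }
        return -1;   // unreachable if the input meets the problem contract
    }
}
```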

Solution to #2 (sum; the duplicate is the surplus over 1+2+...+(n-1) = n(n-1)/2):

let s ← 0
for 1 <= i <= n:
  s ← s + A[i]
return s - n*(n-1)/2
The sum-based solution to #2 above is not ideal, since the running sum may overflow an integer. We can do better with XOR:
temp = 0;
for (i = 0; i < n; i++) 
    temp = temp ^ A[i] ^ i;
return temp;

Solution to #3 (swap each value toward its own position; 1-indexed, modifies the array):

for 1 <= i <= n:
  while A[i] ≠ i:
    if A[A[i]] = A[i]: return A[i]
    swap(A[A[i]], A[i])
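A 0-indexed Java version of the same idea (names hypothetical). The loop at index 0 is guaranteed to return before a[0] could ever equal 0, because no value is 0 and every swap sends one more value to its home index:

```java
// Solution to #3: repeatedly send the value at position i to index
// "value" until a collision reveals the duplicate. Modifies the array.
public class FindDupSwap {
    public static int findDuplicate(int[] a) {
        for (int i = 0; i < a.length; i++) {
            while (a[i] != i) {
                int v = a[i];
                if (a[v] == v) return v;   // v already sits at index v: duplicate
                a[i] = a[v];               // swap a[i] and a[v] ...
                a[v] = v;                  // ... so v is now home
            }
        }
        return -1;   // unreachable when a duplicate really exists
    }
}
```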

Solution to #4 (Floyd's cycle detection on the sequence i → A[i]):

let i ← n, j ← n
do: i ← A[i], j ← A[A[j]]; until i = j
set j ← n
do: i ← A[i], j ← A[j]; until i = j
return i
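The pseudocode is 1-indexed and starts the chase at position n. A 0-indexed Java adaptation (class name is mine) can start at index 0 instead: since no value is 0, index 0 lies outside every cycle of the sequence i → A[i], which is exactly what Floyd's algorithm needs:

```java
// Solution to #4: Floyd's tortoise-and-hare cycle detection.
// O(n) time, O(1) extra space, and the array is never modified.
public class FindDupFloyd {
    public static int findDuplicate(int[] a) {
        int slow = 0, fast = 0;
        do {                       // phase 1: find a meeting point inside the cycle
            slow = a[slow];
            fast = a[a[fast]];
        } while (slow != fast);
        slow = 0;                  // phase 2: advance both one step at a time;
        while (slow != fast) {     // they meet at the cycle entry, the duplicate
            slow = a[slow];
            fast = a[fast];
        }
        return slow;
    }
}
```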

Wednesday, October 15, 2014

Add Kerberos to Hadoop

Environment: CentOS 6.5; HW 1.3;

Install Kerberos:

yum install krb5-server krb5-libs krb5-auth-dialog krb5-workstation

Initialize the KDC:

kdb5_util create -s

Edit the Access Control List (/var/kerberos/krb5kdc/kadm5.acl in RHEL or CentOS and /var/lib/kerberos/krb5kdc/kadm5.acl in SLES ) to define the principals that have admin (modifying) access to the database. A simple example would be a single entry:

*/admin@EXAMPLE.COM *

Create the first user principal (as the root user only):

[root@centos johnz]# kadmin.local -q "addprinc adminuser/admin"
Authenticating as principal root/admin@EXAMPLE.COM with password.
WARNING: no policy specified for adminuser/admin@EXAMPLE.COM; defaulting to no policy
Enter password for principal "adminuser/admin@EXAMPLE.COM":
Re-enter password for principal "adminuser/admin@EXAMPLE.COM":
Principal "adminuser/admin@EXAMPLE.COM" created.

Start Kerberos:

[root@centos johnz]# service krb5kdc start
Starting Kerberos 5 KDC:                                   [  OK  ]
[root@centos johnz]# service kadmin start
Starting Kerberos 5 Admin Server:                          [  OK  ]

Create a directory to store keytabs:

[root@centos johnz]# mkdir -p /etc/security/keytabs/
[root@centos johnz]# chown root:hadoop /etc/security/keytabs/
[root@centos johnz]# chmod 750 /etc/security/keytabs/

Add service principals for Hadoop:

[root@centos johnz]# kadmin -p adminuser/admin
Authenticating as principal adminuser/admin with password.
Password for adminuser/admin@EXAMPLE.COM:
kadmin:  addprinc -randkey nn/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey dn/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey HTTP/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey jt/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey tt/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hbase/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey zookeeper/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hcat/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey oozie/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hdfs/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hive/centos.hw13@EXAMPLE.COM

Export these principals to keytab files:

kadmin:  xst -k /etc/security/keytabs/spnego.service.keytab HTTP/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/nn.service.keytab nn/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/dn.service.keytab dn/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/jt.service.keytab jt/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/tt.service.keytab tt/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/hive.service.keytab hive/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/oozie.service.keytab oozie/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/hbase.service.keytab hbase/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/zk.service.keytab zookeeper/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/hdfs.headless.keytab hdfs/centos.hw13@EXAMPLE.COM

Configure Hadoop:

1. Add the following to core-site.xml:

  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>RULE:[2:$1@$0](jt@.*EXAMPLE.COM)s/.*/mapred/
RULE:[2:$1@$0](tt@.*EXAMPLE.COM)s/.*/mapred/
RULE:[2:$1@$0](nn@.*EXAMPLE.COM)s/.*/hdfs/
RULE:[2:$1@$0](dn@.*EXAMPLE.COM)s/.*/hdfs/
RULE:[2:$1@$0](hbase@.*EXAMPLE.COM)s/.*/hbase/
RULE:[2:$1@$0](oozie@.*EXAMPLE.COM)s/.*/oozie/
DEFAULT</value>
  </property>

2. Add the following to hdfs-site.xml:

<property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>
<property>
        <name>dfs.namenode.kerberos.principal</name>
        <value>nn/centos.hw13@EXAMPLE.COM</value>
        <description> Kerberos principal name for the
        NameNode </description>
</property>
<property>
        <name>dfs.secondary.namenode.kerberos.principal</name>
        <value>nn/centos.hw13@EXAMPLE.COM</value>
        <description>Kerberos principal name for the secondary NameNode.
        </description>
</property>
<property>
        <name>dfs.web.authentication.kerberos.principal</name>
        <value>HTTP/centos.hw13@EXAMPLE.COM</value>
        <description> The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.
The HTTP Kerberos principal MUST start with 'HTTP/' per Kerberos HTTP SPNEGO specification.
        </description>
</property>  
<property>
        <name>dfs.datanode.kerberos.principal</name>
        <value>dn/_HOST@EXAMPLE.COM</value>
        <description>The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real host name.
        </description>
</property>  

<property>
        <name>dfs.web.authentication.kerberos.keytab</name>
        <value>/etc/security/keytabs/spnego.service.keytab</value>
        <description>The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP

endpoint.
        </description>
</property>  
<property>
        <name>dfs.namenode.keytab.file</name>
        <value>/etc/security/keytabs/nn.service.keytab</value>
        <description>Combined keytab file containing the NameNode service and host principals.
        </description>
</property>
<property>
        <name>dfs.secondary.namenode.keytab.file</name>
        <value>/etc/security/keytabs/nn.service.keytab</value>
        <description>Combined keytab file containing the NameNode service and host principals.
        </description>
</property>
<property>
        <name>dfs.datanode.keytab.file</name>
        <value>/etc/security/keytabs/dn.service.keytab</value>
        <description>The filename of the keytab file for the DataNode.
        </description>
</property>

3. Add the following to mapred-site.xml:

<property>
        <name>mapreduce.jobtracker.kerberos.principal</name>
        <value>jt/centos.hw13@EXAMPLE.COM</value>
        <description>Kerberos principal name for the JobTracker   </description>
</property>
<property>
        <name>mapreduce.tasktracker.kerberos.principal</name>  
        <value>tt/centos.hw13@EXAMPLE.COM</value>
        <description>Kerberos principal name for the TaskTracker. "_HOST" is replaced by the host name of the TaskTracker.
        </description>
</property>
<property>
        <name>mapreduce.jobtracker.keytab.file</name>
        <value>/etc/security/keytabs/jt.service.keytab</value>
        <description>The keytab for the JobTracker principal.
        </description>
</property>
<property>
        <name>mapreduce.tasktracker.keytab.file</name>
        <value>/etc/security/keytabs/tt.service.keytab</value>
        <description>The filename of the keytab for the TaskTracker</description>
</property>
<property>
        <name>mapreduce.jobhistory.kerberos.principal</name>
        <!--cluster variant -->
        <value>jt/centos.hw13@EXAMPLE.COM</value>
        <description> Kerberos principal name for JobHistory. This must map to the same user as the JobTracker user (mapred).
        </description>
</property>
<property>  
        <name>mapreduce.jobhistory.keytab.file</name>
        <!--cluster variant -->
        <value>/etc/security/keytabs/jt.service.keytab</value>
        <description>The keytab for the JobHistory principal.
        </description>
</property>  

4. Important: you have to replace the existing local_policy.jar and US_export_policy.jar files (in the $JAVA_HOME/jre/lib/security/ directory) with the ones from the following link:

 http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html

5. Very important: do the following before starting the DataNode.

export HADOOP_SECURE_DN_USER=hdfs

6. Very important: check the permissions of the keytab files.  They should be:

[root@centos ~]# ls -l /etc/security/keytabs/
total 40
-rw-------. 1 hdfs   hadoop 400 Oct 13 14:56 dn.service.keytab
-rw-------. 1 hdfs   hadoop 412 Oct 13 14:59 hdfs.headless.keytab
-rw-------. 1 mapred hadoop 400 Oct 13 14:56 jt.service.keytab
-rw-------. 1 hdfs   hadoop 400 Oct 13 14:56 nn.service.keytab
-rw-------. 1 mapred hadoop 400 Oct 13 14:56 tt.service.keytab

7. Important: all services have to be started as the root user.

[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start secondarynamenode
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start jobtracker
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start tasktracker

Access Hadoop:

[root@centos ~]# kinit -k -t /etc/security/keytabs/hdfs.headless.keytab -p hdfs/centos.hw13@EXAMPLE.COM
[root@centos ~]# hadoop fs -ls /user/hive
Found 1 items
drwxr-xr-x   - hdfs supergroup          0 2014-10-03 17:13 /user/hive/warehouse

Sunday, October 5, 2014

Insert a node into a red-black tree: the LLRB algorithm does the best job!

    private Node put(Node h, Key key, Value val) {
        if (h == null) return new Node(key, val, RED, 1);

        int cmp = key.compareTo(h.key);
        if      (cmp < 0) h.left  = put(h.left,  key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else              h.val   = val;

        // fix-up any right-leaning links
        if (isRed(h.right) && !isRed(h.left))      h = rotateLeft(h);
        if (isRed(h.left)  &&  isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left)  &&  isRed(h.right))     flipColors(h);
        h.N = size(h.left) + size(h.right) + 1;

        return h;
    }
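put() leans on three helpers that are not shown above. For reference, here they are as sketches in the style of the standard LLRB implementation (Node is assumed to carry left, right, color, and subtree size N, with RED/BLACK boolean constants):

```java
private Node rotateLeft(Node h) {
    Node x = h.right;
    h.right = x.left;
    x.left = h;
    x.color = h.color;        // x takes over h's link color
    h.color = RED;            // the rotated link becomes a red left link
    x.N = h.N;
    h.N = size(h.left) + size(h.right) + 1;
    return x;
}

private Node rotateRight(Node h) {
    Node x = h.left;
    h.left = x.right;
    x.right = h;
    x.color = h.color;
    h.color = RED;
    x.N = h.N;
    h.N = size(h.left) + size(h.right) + 1;
    return x;
}

private void flipColors(Node h) {
    h.color = RED;            // split a temporary 4-node: pass the red link up
    h.left.color = BLACK;
    h.right.color = BLACK;
}
```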

Saturday, September 27, 2014

Magic of the array

1. Union find:

    public void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;

        // make smaller root point to larger one
        if   (sz[rootP] < sz[rootQ]) { id[rootP] = rootQ; sz[rootQ] += sz[rootP]; }
        else                         { id[rootQ] = rootP; sz[rootP] += sz[rootQ]; }
        count--;
    }
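union() depends on find(), which is not shown. One common variant is quick-union with path halving; here is a minimal, self-contained sketch (class name and the link() helper are mine for demonstration -- the real class also carries sz[] and count):

```java
// find() for union-find: chase parent links to the root,
// halving the path along the way to keep trees flat.
public class UFFindSketch {
    private final int[] id;   // id[p] is the parent of p; a root has id[p] == p

    public UFFindSketch(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;   // every site starts as its own root
    }

    public int find(int p) {
        while (p != id[p]) {
            id[p] = id[id[p]];   // path halving: point p to its grandparent
            p = id[p];
        }
        return p;
    }

    public void link(int p, int q) { id[find(p)] = find(q); }   // demo-only union
}
```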

2. Priority queue implemented by binary heap:

    public Key delMin() {
        if (isEmpty()) throw new NoSuchElementException("Priority queue underflow");
        exch(1, N);
        Key min = pq[N--];
        sink(1);
        pq[N+1] = null;         // avoid loitering and help with garbage collection
        if ((N > 0) && (N == (pq.length - 1) / 4)) resize(pq.length  / 2);
        assert isMinHeap();
        return min;
    }
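delMin() calls sink(), the top-down counterpart of swim. A sketch of the standard version, assuming the same pq[], N, greater and exch members of the class:

```java
private void sink(int k) {
    while (2*k <= N) {
        int j = 2*k;
        if (j < N && greater(j, j+1)) j++;   // pick the smaller child
        if (!greater(k, j)) break;           // heap order restored
        exch(k, j);
        k = j;
    }
}
```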

3. Heap sort:

    public static void sort(Comparable[] pq) {
        int N = pq.length;
        for (int k = N/2; k >= 1; k--)
            sink(pq, k, N);
        while (N > 1) {
            exch(pq, 1, N--);
            sink(pq, 1, N);
        }
    }

4. Randomized queue:

    public Item dequeue() {
        if (size <= 0) {
            throw new NoSuchElementException("Tried to dequeue an empty queue.");
        }

        int index = StdRandom.uniform(size);
        Item item = items[index];
        items[index] = items[--size];
        items[size] = null;         // clear the vacated slot to avoid loitering

        if (size > ORIG_CAPACITY && size < capacity / 4) {
            resize(capacity / 2);
        }

        return item;
    }

5. 8-puzzle:

The n-puzzle is a classical problem for modelling algorithms involving heuristics. Commonly used heuristics for this problem include counting the number of misplaced tiles and finding the sum of the taxicab (or Manhattan) distances between each block and its position in the goal configuration.

   // return sum of Manhattan distances between blocks and goal
    public int manhattan() {
        int manhattan = 0;
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                int current = blocks[i][j];
                if (current == 0) continue;  // skip blank square.
                if (current != (i * N + j + 1)) {
                    manhattan += Math.abs((current - 1) / N - i) + Math.abs((current - 1) % N - j);
                }
            }
        }
        return manhattan;
    }
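The other heuristic mentioned above, counting misplaced tiles, is usually called the Hamming distance. A sketch in the same style as manhattan(), assuming the same blocks[][] and N fields:

```java
    // return number of blocks out of place (the blank square is not counted)
    public int hamming() {
        int hamming = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (blocks[i][j] != 0 && blocks[i][j] != i * N + j + 1)
                    hamming++;   // tile is not at its goal position
        return hamming;
    }
```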