Monday, October 20, 2014

The best explanation of Kerberos yet:

http://windowsitpro.com/security/kerberos-active-directory

Saturday, October 18, 2014

Find a duplicate in array

  1. Given an array of N elements in which each element is an integer between 1 and N-1, write an algorithm to find any duplicate. Your algorithm should run in linear time, use O(n) extra space, and may not modify the original array.
  2. Given an array of N elements in which each element is an integer between 1 and N-1 with exactly one element duplicated, write an algorithm to find that duplicate. Your algorithm should run in linear time, use O(1) extra space, and may not modify the original array.
  3. Given an array of N elements in which each element is an integer between 1 and N-1, write an algorithm to find any duplicate. Your algorithm should run in linear time and use O(1) extra space, and may modify the original array. 
  4. Given an array of N elements in which each element is an integer between 1 and N-1, write an algorithm to find any duplicate. Your algorithm should run in linear time, use O(1) extra space, and may not modify the original array. 
More info: http://aperiodic.net/phil/archives/Geekery/find-duplicate-elements.html

Solutions:

Solution to #1 (auxiliary marker array, O(n) extra space):

let s ← array [1..n]
initialize s to all zeroes
for 1 <= i <= n:
  if s[A[i]] > 0: return A[i]
  set s[A[i]] ← 1
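As a sanity check, the first solution can be sketched in Java as follows (class and method names are my own; a 0-indexed array whose values lie in 1..n-1, as the problem states, is assumed):

```java
// Sketch of solution #1: mark each value in an auxiliary boolean array.
// O(n) time, O(n) extra space; the input array is not modified.
public class SeenArray {
    public static int findDuplicate(int[] a) {
        boolean[] seen = new boolean[a.length]; // slots 0..n-1 cover values 1..n-1
        for (int v : a) {
            if (seen[v]) return v;              // v was marked before: duplicate
            seen[v] = true;
        }
        return -1;                              // unreachable when input meets the constraints
    }
}
```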

Solution to #2 (sum formula; the array holds 1..n-1 once each plus the duplicate, so the sum exceeds n(n-1)/2 by exactly the duplicate):

let s ← 0
for 1 <= i <= n:
  s ← s + A[i]
return s - n*(n-1)/2
The solution above for question #2 is not ideal, since the running sum may overflow an integer. We can do better with XOR (0-indexed):
temp = 0;
for (i = 0; i < n; i++) 
    temp = temp ^ A[i] ^ i;
return temp;
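The same XOR trick as a compilable Java sketch (names are mine; as in the C-style fragment above, the indices 0..n-1 cancel the values 1..n-1, leaving only the duplicate):

```java
// Sketch of solution #2 via XOR: no overflow risk, O(1) extra space, read-only.
public class XorDup {
    public static int findDuplicate(int[] a) {
        int t = 0;
        for (int i = 0; i < a.length; i++) {
            t ^= a[i] ^ i;  // every value pairs off with an equal index except the duplicate
        }
        return t;
    }
}
```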

Solution to #3 (swap each value into its own slot; modifies the array):

for 1 <= i <= n:
  while A[i] ≠ i:
    if A[A[i]] = A[i]: return A[i]
    swap(A[A[i]], A[i])
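A 0-indexed Java sketch of the swap-based approach (names are mine; since every value is in 1..n-1, each value is also a valid index, and index 0 can never satisfy a[0] == 0, which forces the inner loop to hit a duplicate):

```java
// Sketch of solution #3: move each value v toward slot v; when a value finds
// its slot already occupied by an equal value, that value is a duplicate.
// O(n) time, O(1) extra space, but the array is modified.
public class SwapDup {
    public static int findDuplicate(int[] a) {
        for (int i = 0; i < a.length; i++) {
            while (a[i] != i) {
                int v = a[i];
                if (a[v] == v) return v; // slot v already holds v: duplicate found
                a[i] = a[v];             // swap a[i] and a[v], placing v at index v
                a[v] = v;
            }
        }
        return -1;                       // unreachable when input meets the constraints
    }
}
```

Each swap places one more value in its home slot, so the total work stays linear despite the nested loops.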

Solution to #4 (Floyd's cycle detection, treating the array as the function i → A[i]; O(1) space, read-only):

let i ← n, j ← n
do: i ← A[i], j ← A[A[j]]; until i = j
set j ← n
do: i ← A[i], j ← A[j]; until i = j
return i
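The cycle-finding solution in Java (0-indexed sketch, names mine): since no value is 0, index 0 has no incoming edge under i → A[i] and plays the role that index n plays in the 1-indexed pseudocode above.

```java
// Sketch of solution #4: Floyd's tortoise-and-hare on the function i -> a[i].
// O(n) time, O(1) extra space, read-only; the cycle entry is the duplicate,
// because only a duplicated value has two incoming edges.
public class FloydDup {
    public static int findDuplicate(int[] a) {
        int slow = 0, fast = 0;
        do {                       // phase 1: find a meeting point inside the cycle
            slow = a[slow];
            fast = a[a[fast]];
        } while (slow != fast);
        fast = 0;                  // phase 2: restart one pointer at the start node
        while (slow != fast) {
            slow = a[slow];
            fast = a[fast];
        }
        return slow;               // both pointers meet at the cycle entry
    }
}
```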

Wednesday, October 15, 2014

Add Kerberos to Hadoop

Environment: CentOS 6.5; HW 1.3;

Install Kerberos:

yum install krb5-server krb5-libs krb5-auth-dialog krb5-workstation

Initialize the KDC:

kdb5_util create -s

Edit the Access Control List (/var/kerberos/krb5kdc/kadm5.acl in RHEL or CentOS and /var/lib/kerberos/krb5kdc/kadm5.acl in SLES ) to define the principals that have admin (modifying) access to the database. A simple example would be a single entry:

*/admin@EXAMPLE.COM *

Create the first admin principal (as the root user):

[root@centos johnz]# kadmin.local -q "addprinc adminuser/admin"
Authenticating as principal root/admin@EXAMPLE.COM with password.
WARNING: no policy specified for adminuser/admin@EXAMPLE.COM; defaulting to no policy
Enter password for principal "adminuser/admin@EXAMPLE.COM":
Re-enter password for principal "adminuser/admin@EXAMPLE.COM":
Principal "adminuser/admin@EXAMPLE.COM" created.

Start Kerberos:

[root@centos johnz]# service krb5kdc start
Starting Kerberos 5 KDC:                                   [  OK  ]
[root@centos johnz]# service kadmin start
Starting Kerberos 5 Admin Server:                          [  OK  ]

Create a directory to store keytabs:

[root@centos johnz]# mkdir -p /etc/security/keytabs/
[root@centos johnz]# chown root:hadoop /etc/security/keytabs/
[root@centos johnz]# chmod 750 /etc/security/keytabs/

Add service principals for Hadoop:

[root@centos johnz]# kadmin -p adminuser/admin
Authenticating as principal adminuser/admin with password.
Password for adminuser/admin@EXAMPLE.COM:
kadmin:  addprinc -randkey nn/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey dn/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey HTTP/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey jt/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey tt/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hbase/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey zookeeper/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hcat/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey oozie/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hdfs/centos.hw13@EXAMPLE.COM
kadmin:  addprinc -randkey hive/centos.hw13@EXAMPLE.COM

Export these principals to keytab files:

kadmin:  xst -k /etc/security/keytabs/spnego.service.keytab HTTP/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/nn.service.keytab nn/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/dn.service.keytab dn/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/jt.service.keytab jt/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/tt.service.keytab tt/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/hive.service.keytab hive/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/oozie.service.keytab oozie/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/hbase.service.keytab hbase/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/zk.service.keytab zookeeper/centos.hw13@EXAMPLE.COM
kadmin:  xst -k /etc/security/keytabs/hdfs.headless.keytab hdfs/centos.hw13@EXAMPLE.COM

Configure Hadoop:

1. Add the following to core-site.xml:

  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>RULE:[2:$1@$0](jt@.*EXAMPLE.COM)s/.*/mapred/
RULE:[2:$1@$0](tt@.*EXAMPLE.COM)s/.*/mapred/
RULE:[2:$1@$0](nn@.*EXAMPLE.COM)s/.*/hdfs/
RULE:[2:$1@$0](dn@.*EXAMPLE.COM)s/.*/hdfs/
RULE:[2:$1@$0](hbase@.*EXAMPLE.COM)s/.*/hbase/
RULE:[2:$1@$0](oozie@.*EXAMPLE.COM)s/.*/oozie/
DEFAULT</value>
  </property>

2. Add the following to hdfs-site.xml:

<property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>
<property>
        <name>dfs.namenode.kerberos.principal</name>
        <value>nn/centos.hw13@EXAMPLE.COM</value>
        <description> Kerberos principal name for the
        NameNode </description>
</property>
<property>
        <name>dfs.secondary.namenode.kerberos.principal</name>
        <value>nn/centos.hw13@EXAMPLE.COM</value>
        <description>Kerberos principal name for the secondary NameNode.
        </description>
</property>
<property>
        <name>dfs.web.authentication.kerberos.principal</name>
        <value>HTTP/centos.hw13@EXAMPLE.COM</value>
        <description> The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.
The HTTP Kerberos principal MUST start with 'HTTP/' per Kerberos HTTP SPNEGO specification.
        </description>
</property>  
<property>
        <name>dfs.datanode.kerberos.principal</name>
        <value>dn/_HOST@EXAMPLE.COM</value>
        <description>The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real host name.
        </description>
</property>  

<property>
        <name>dfs.web.authentication.kerberos.keytab</name>
        <value>/etc/security/keytabs/spnego.service.keytab</value>
        <description>The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.
        </description>
</property>  
<property>
        <name>dfs.namenode.keytab.file</name>
        <value>/etc/security/keytabs/nn.service.keytab</value>
        <description>Combined keytab file containing the NameNode service and host principals.
        </description>
</property>
<property>
        <name>dfs.secondary.namenode.keytab.file</name>
        <value>/etc/security/keytabs/nn.service.keytab</value>
        <description>Combined keytab file containing the NameNode service and host principals.
        </description>
</property>
<property>
        <name>dfs.datanode.keytab.file</name>
        <value>/etc/security/keytabs/dn.service.keytab</value>
        <description>The filename of the keytab file for the DataNode.
        </description>
</property>

3. Add the following to mapred-site.xml:

<property>
        <name>mapreduce.jobtracker.kerberos.principal</name>
        <value>jt/centos.hw13@EXAMPLE.COM</value>
        <description>Kerberos principal name for the JobTracker   </description>
</property>
<property>
        <name>mapreduce.tasktracker.kerberos.principal</name>  
        <value>tt/centos.hw13@EXAMPLE.COM</value>
        <description>Kerberos principal name for the TaskTracker. "_HOST" is replaced by the host name of the TaskTracker.
        </description>
</property>
<property>
        <name>mapreduce.jobtracker.keytab.file</name>
        <value>/etc/security/keytabs/jt.service.keytab</value>
        <description>The keytab for the JobTracker principal.
        </description>
</property>
<property>
        <name>mapreduce.tasktracker.keytab.file</name>
        <value>/etc/security/keytabs/tt.service.keytab</value>
        <description>The filename of the keytab for the TaskTracker</description>
</property>
<property>
        <name>mapreduce.jobhistory.kerberos.principal</name>
        <!--cluster variant -->
        <value>jt/centos.hw13@EXAMPLE.COM</value>
        <description> Kerberos principal name for JobHistory. This must map to the same user as the JobTracker user (mapred).
        </description>
</property>
<property>  
        <name>mapreduce.jobhistory.keytab.file</name>
        <!--cluster variant -->
        <value>/etc/security/keytabs/jt.service.keytab</value>
        <description>The keytab for the JobHistory principal.
        </description>
</property>  

4. Important: replace the existing local_policy.jar and US_export_policy.jar files (in the $JAVA_HOME/jre/lib/security/ directory) with the JCE Unlimited Strength Jurisdiction Policy files from the following link:

 http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html

5. Very important: do the following before starting the DataNode:

export HADOOP_SECURE_DN_USER=hdfs

6. Very important: check the permissions of the keytab files. They should be:

[root@centos ~]# ls -l /etc/security/keytabs/
total 40
-rw-------. 1 hdfs   hadoop 400 Oct 13 14:56 dn.service.keytab
-rw-------. 1 hdfs   hadoop 412 Oct 13 14:59 hdfs.headless.keytab
-rw-------. 1 mapred hadoop 400 Oct 13 14:56 jt.service.keytab
-rw-------. 1 hdfs   hadoop 400 Oct 13 14:56 nn.service.keytab
-rw-------. 1 mapred hadoop 400 Oct 13 14:56 tt.service.keytab

7. Important: all services must be started as the root user.

[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start secondarynamenode
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start jobtracker
[root@centos security]# /usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start tasktracker

Access Hadoop:

[root@centos ~]# kinit -k -t /etc/security/keytabs/hdfs.headless.keytab -p hdfs/centos.hw13@EXAMPLE.COM
[root@centos ~]# hadoop fs -ls /user/hive
Found 1 items
drwxr-xr-x   - hdfs supergroup          0 2014-10-03 17:13 /user/hive/warehouse

Sunday, October 5, 2014

Inserting a node into a red-black tree: the LLRB algorithm does the best job!

    private Node put(Node h, Key key, Value val) {
        if (h == null) return new Node(key, val, RED, 1);

        int cmp = key.compareTo(h.key);
        if      (cmp < 0) h.left  = put(h.left,  key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else              h.val   = val;

        // fix-up any right-leaning links
        if (isRed(h.right) && !isRed(h.left))      h = rotateLeft(h);
        if (isRed(h.left)  &&  isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left)  &&  isRed(h.right))     flipColors(h);
        h.N = size(h.left) + size(h.right) + 1;

        return h;
    }
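For completeness, here is a hedged sketch of the helper routines the snippet relies on (rotateLeft, rotateRight, flipColors, isRed, size), wrapped in a minimal self-contained class. The names and conventions follow Sedgewick's LLRB code, but the details here are my reconstruction:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal left-leaning red-black BST completing the put() snippet above.
// The helper implementations are a sketch in Sedgewick's style, not the
// original source.
public class LLRB<Key extends Comparable<Key>, Value> {
    private static final boolean RED = true, BLACK = false;
    private Node root;

    private class Node {
        Key key; Value val; Node left, right; boolean color; int N;
        Node(Key key, Value val, boolean color, int N) {
            this.key = key; this.val = val; this.color = color; this.N = N;
        }
    }

    public void put(Key key, Value val) {
        root = put(root, key, val);
        root.color = BLACK;                     // the root link is always black
    }

    private Node put(Node h, Key key, Value val) {
        if (h == null) return new Node(key, val, RED, 1);
        int cmp = key.compareTo(h.key);
        if      (cmp < 0) h.left  = put(h.left,  key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else              h.val   = val;
        // fix-up any right-leaning links
        if (isRed(h.right) && !isRed(h.left))      h = rotateLeft(h);
        if (isRed(h.left)  &&  isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left)  &&  isRed(h.right))     flipColors(h);
        h.N = size(h.left) + size(h.right) + 1;
        return h;
    }

    private boolean isRed(Node x) { return x != null && x.color == RED; }
    private int size(Node x)      { return x == null ? 0 : x.N; }
    public int size()             { return size(root); }

    private Node rotateLeft(Node h) {           // rotate a right-leaning red link left
        Node x = h.right;
        h.right = x.left; x.left = h;
        x.color = h.color; h.color = RED;
        x.N = h.N; h.N = size(h.left) + size(h.right) + 1;
        return x;
    }

    private Node rotateRight(Node h) {          // rotate a left-leaning red link right
        Node x = h.left;
        h.left = x.right; x.right = h;
        x.color = h.color; h.color = RED;
        x.N = h.N; h.N = size(h.left) + size(h.right) + 1;
        return x;
    }

    private void flipColors(Node h) {           // split a temporary 4-node
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    public List<Key> keysInOrder() {            // in-order traversal for sanity checks
        List<Key> out = new ArrayList<>();
        collect(root, out);
        return out;
    }
    private void collect(Node x, List<Key> out) {
        if (x == null) return;
        collect(x.left, out); out.add(x.key); collect(x.right, out);
    }
}
```

Note how both rotations copy the subtree count into the new subtree root and recompute it for the demoted node, keeping h.N consistent with what put() maintains.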