library(igraph)
source("http://michael.hahsler.net/SMU/ScientificCompR/code/map.R")
# 1->2; 1->3, 2->1, 3->4, 4->3
g <- graph.formula(1 -+ 2, 1 -+ 3, 2 -+ 1, 3 -+ 4, 4 -+ 3)
plot(g)
# Normal PageRank
page.rank(g, damping=0.70)$vector
# Topic-specific PageRank (assume node '1' and '2' are interesting topic)
page.rank(g, damping=0.70, personalized=c(1/2, 1/2, 0, 0))$vector
# Topic-specific PageRank with weight (node '1' weight 2 when node '2' weight 1)
page.rank(g, damping=0.70, personalized=c(2/3, 1/3, 0, 0))$vector
More info:
http://michael.hahsler.net/SMU/LearnROnYourOwn/code/igraph.html
http://stats.stackexchange.com/questions/175517/calculating-personalized-pagerank-in-r
Your project can be on or off. Your project's priority can be changed. Your job can be changed. But the technology is always heading north!
Wednesday, November 4, 2015
Tuesday, October 27, 2015
Good articles for shingle, minhash and LSH
http://matthewcasperson.blogspot.com/2013/11/minhash-for-dummies.html
http://infolab.stanford.edu/~ullman/mmds/bookL.pdf (Chapter 3, round page 83)
Doc --> Shingle --> hash --> minhash (may lose accuracy by using a much smaller signature matrix) --> LSH (not compute the similarity for every pair; possible false positive or false negative)
50,000 bytes --> 400,000 bytes --> 200,000 bytes --> 1,000 bytes (based on 250 signatures) --> additional b (band) x r (row) hash matrix
http://infolab.stanford.edu/~ullman/mmds/bookL.pdf (Chapter 3, round page 83)
Doc --> Shingle --> hash --> minhash (may lose accuracy by using a much smaller signature matrix) --> LSH (not compute the similarity for every pair; possible false positive or false negative)
50,000 bytes --> 400,000 bytes --> 200,000 bytes --> 1,000 bytes (based on 250 signatures) --> additional b (band) x r (row) hash matrix
Friday, September 25, 2015
How to override cluster jar files
If you need to use your own version of a jar file in your map reduce job, you can do so through following steps:
1. Bundle the jar file into your map reduce jar or pass the jar file through distributed cache.
2. From your map reduce client, update mapred-site.xml file to put following section into it:
mapreduce.job.user.classpath.first=true
(of course, you need to call conf.addResource() to add mapred-site.xml on your client code)
3. Restart your client.
1. Bundle the jar file into your map reduce jar or pass the jar file through distributed cache.
2. From your map reduce client, update mapred-site.xml file to put following section into it:
mapreduce.job.user.classpath.first=true
(of course, you need to call conf.addResource() to add mapred-site.xml on your client code)
3. Restart your client.
Saturday, July 25, 2015
Dynamically add jar files to classpath
/**
* Adds jar files to classpath.
* @param file the folder for jar files.
* @param classLoader the URLClassLoader
* @throws IOException
*/
public void addCustomJars(File file, URLClassLoader classLoader) {
File[] jarFiles = file.listFiles();
if (jarFiles != null) {
if (logger.isDebugEnabled()) {
URL[] urls = classLoader.getURLs();
for (URL url : urls) {
logger.debug("URL before custom jars are added:" + url.toString());
}
}
Class<?> sysclass = URLClassLoader.class;
Method method = null;
try {
method = sysclass.getDeclaredMethod("addURL",parameters);
} catch (NoSuchMethodException e1) {
logger.error("Unable to find addURL method", e1);
return;
} catch (SecurityException e1) {
logger.error("Unable to get addURL method", e1);
return;
}
method.setAccessible(true);
// add each jar file under such folder
for (File jarFile : jarFiles) {
if (jarFile.isFile() && jarFile.getName().endsWith("jar")) {
try {
method.invoke(classLoader,new Object[]{ jarFile.toURI().toURL() });
} catch (Exception e) {
logger.error("Failed to add classpath for " + jarFile.getName(), e);;
}
}
}
if (logger.isDebugEnabled()) {
URL[] urls = classLoader.getURLs();
for (URL url : urls) {
logger.debug("URL after custom jars are added:" + url.toString());
}
}
}
In your main:
URLClassLoader classLoader = (URLClassLoader)Thread.currentThread().getContextClassLoader();
addCustomJars(new File("path_to_jar_files"), classLoader);
* Adds jar files to classpath.
* @param file the folder for jar files.
* @param classLoader the URLClassLoader
* @throws IOException
*/
public void addCustomJars(File file, URLClassLoader classLoader) {
File[] jarFiles = file.listFiles();
if (jarFiles != null) {
if (logger.isDebugEnabled()) {
URL[] urls = classLoader.getURLs();
for (URL url : urls) {
logger.debug("URL before custom jars are added:" + url.toString());
}
}
Class<?> sysclass = URLClassLoader.class;
Method method = null;
try {
method = sysclass.getDeclaredMethod("addURL",parameters);
} catch (NoSuchMethodException e1) {
logger.error("Unable to find addURL method", e1);
return;
} catch (SecurityException e1) {
logger.error("Unable to get addURL method", e1);
return;
}
method.setAccessible(true);
// add each jar file under such folder
for (File jarFile : jarFiles) {
if (jarFile.isFile() && jarFile.getName().endsWith("jar")) {
try {
method.invoke(classLoader,new Object[]{ jarFile.toURI().toURL() });
} catch (Exception e) {
logger.error("Failed to add classpath for " + jarFile.getName(), e);;
}
}
}
if (logger.isDebugEnabled()) {
URL[] urls = classLoader.getURLs();
for (URL url : urls) {
logger.debug("URL after custom jars are added:" + url.toString());
}
}
}
In your main:
URLClassLoader classLoader = (URLClassLoader)Thread.currentThread().getContextClassLoader();
addCustomJars(new File("path_to_jar_files"), classLoader);
Wednesday, June 10, 2015
Skip Hadoop mapper when condition is satisfied
If conf is an instance of JobConf:
conf.setMapRunnerClass(MyMapRunner.class);
conf.setMapRunnerClass(MyMapRunner.class);
A sample of your MyMapRunner,
public class FindAndExitMapRunner<K1, V1, K2, V2> extends MapRunner<K1, V1, K2, V2> {
private Mapper<K1, V1, K2, V2> mapper;
private boolean incrProcCount;
public static final String FIND_INDICATE = "Exit after finding at least one data";
public static final String REACH_MAX = "Exit after reaching max lines";
@SuppressWarnings("unchecked")
public void configure(JobConf job) {
this.mapper = ReflectionUtils.newInstance(job.getMapperClass(), job);
//increment processed counter only if skipping feature is enabled
this.incrProcCount = SkipBadRecords.getMapperMaxSkipRecords(job)>0 &&
SkipBadRecords.getAutoIncrMapperProcCount(job);
}
@Override
public void run(RecordReader<K1, V1> input, OutputCollector<K2, V2> output,
Reporter reporter) throws IOException {
try {
// allocate key & value instances that are re-used for all entries
K1 key = input.createKey();
V1 value = input.createValue();
while (input.next(key, value)) {
// map pair to output
mapper.map(key, value, output, reporter);
if(incrProcCount) {
reporter.incrCounter(SkipBadRecords.COUNTER_GROUP,
SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS, 1);
}
}
} catch (IOException e) {
// break if exception is thrown
if (!FIND_INDICATE.equalsIgnoreCase(e.getMessage()) &&
!REACH_MAX.equalsIgnoreCase(e.getMessage())) {
// re-throw except if it is not what we expect.
throw e;
}
} finally {
mapper.close();
}
}
protected Mapper<K1, V1, K2, V2> getMapper() {
return mapper;
}
}
Then, you throw exception in your mapper when your condition is satisfied:
throw new Exception(MyMapRunner.FIND_INDICATE);
Tuesday, February 3, 2015
Awesome solutions for some super good problems in Cryptograph
http://webstersprodigy.net/tag/aes/
Just some quotations from above website:
One of the biggest reasons I think the class was so good was its focus on offense. I don’t really understand how defensive security people can try to defend stuff without understanding offense… yet the crypto classes I’d taken before tried to do exactly that. How was I supposed to understand why things needed to be done a certain way if I don’t know how it can break? Crypto books have been the same way – every crypto book I’ve read before (e.g. Bruce Schneier books) don’t seem to give much page space to offense. Dan brings the attacker’s perspective into every lecture, and I have a much better understanding of practical cryptography because of it.
Just some quotations from above website:
One of the biggest reasons I think the class was so good was its focus on offense. I don’t really understand how defensive security people can try to defend stuff without understanding offense… yet the crypto classes I’d taken before tried to do exactly that. How was I supposed to understand why things needed to be done a certain way if I don’t know how it can break? Crypto books have been the same way – every crypto book I’ve read before (e.g. Bruce Schneier books) don’t seem to give much page space to offense. Dan brings the attacker’s perspective into every lecture, and I have a much better understanding of practical cryptography because of it.
Week 4 – CBC with IV
Problem:
An attacker intercepts the following ciphertext (hex encoded):
1
| 20814804c1767293b99f1d9cab3bc3e7 ac1e37bfb15599e5f40eef805488281d |
He knows that the plaintext is the ASCII encoding of the message “Pay Bob 100$” (excluding the quotes). He also knows that the cipher used is CBC encryption with a random IV using AES as the underlying block cipher. Show that the attacker can change the ciphertext so that it will decrypt to “Pay Bob 500$”. What is the resulting ciphertext (hex encoded)? This shows that CBC provides no integrity.
Solution:
This is insecure because the first message block is xored with the random IV
20814804c1767293b99f1d9cab3bc3e7 ac1e37bfb15599e5f40eef805488281d
P a y B o b 1 0 0 $
P a y B o b 1 0 0 $
9th char
0xb9 decrypts to 1
0xb9 xor ascii (1 xor 5)
0xb9 xor 0x31 xor 0x35
= 0xbd
0xb9 decrypts to 1
0xb9 xor ascii (1 xor 5)
0xb9 xor 0x31 xor 0x35
= 0xbd
20814804c1767293bd9f1d9cab3bc3e7 ac1e37bfb15599e5f40eef805488281d
Monday, December 15, 2014
Simple Java Book
http://www.programcreek.com/wp-content/uploads/2013/01/SimpleJava1.pdf
Subscribe to:
Posts (Atom)