## [Algorithm 101] Unbounded Knapsack Problem

We started this problem with Project Euler #31 Coin Sums:

In England the currency is made up of pound, £, and pence, p, and there are eight coins in general circulation:

1p, 2p, 5p, 10p, 20p, 50p, £1 (100p) and £2 (200p).

It is possible to make £2 in the following way:

1×£1 + 1×50p + 2×20p + 1×5p + 1×2p + 3×1p

How many different ways can £2 be made using any number of coins?

Unlike the 0-1 knapsack problem we discussed before, this time we get unlimited coins of each to sum up a certain number. Can we transform this problem into 0-1 knapsack problem?

The first thought may come up with: considering we can pick the ith item at most for $\left \lfloor W/value[i] \right \rfloor$ times, this problem is equivalent to 0-1 knapsack problem with $\left \lfloor W/value[i] \right \rfloor$ items which are the same with the original ith item to choose, and the transform formulation of f[w] will be:

$$f[w] = max\left \{ f[w], f[w - weight[i]] + value[i] \right \}$$

However, we don’t need go that far. Remember the solution of 0-1 knapsack problem? We used a reversed loop to keep the new-generated records won’t affect the calculated results(the new-generated ones are always larger), for each item we can pick once or not. But here we need to re-calculate the new-generated results, so we can just change the second reversed loop to a straight-forward loop, and here is our code for the problem above:

public class Solution {
public int coinSums(int N) {
int[] table = new int[N + 1], coins = new int[]{1, 2, 5, 10, 20, 50, 100, 200};
table[0] = 1;
for (int coin : coins) {
for (int i = coin; i < table.length; i++) {
table[i] += table[i - coin];
}
}
return table[N];
}

public static void main(String[] args) {
Solution solution = new Solution();
System.out.println(solution.coinSums(5));
System.out.println(solution.coinSums(10));
System.out.println(solution.coinSums(20));
System.out.println(solution.coinSums(200));
}
}


## [Algorithm 101] 0-1 Knapsack Problem

We start this problem with leetcode #494: Target Sum:

You are given a list of non-negative integers, a1, a2, …, an, and a target, S. Now you have 2 symbols + and -. For each integer, you should choose one from + and - as its new symbol.

Find out how many ways to assign symbols to make sum of integers equal to target S.

Assume positive subset P and negative subset N are of one possible way, hence P and N meet the requirement:

$$sum\left ( P \right ) - sum\left ( N \right ) = S$$

As the problem implies, the union of subset P and subset N is nums, and the intersection of subset P and subset N is empty(for each element of nums is either positive or negative), so:

$$sum\left ( P \right ) + sum\left ( N \right ) = sum\left ( nums \right )$$

So the problem is equivalent to: find subset P from set nums that $sum(P) = (S + sum(nums)) / 2$, which is a variation of 0-1 knapsack problem.

## 0-1 Knapsack Problem

Given two integer arrays value[0, …, i – 1] and weight[0, …, i – 1] which represent values and weights associated with i items respectively, and a knapsack of capacity W, find the most valuable subset of the items that fit into the knapsack. We assume here that all the weights and the knapsack’s capacity are positive integers; the items values do not have to be integers.

To design a dynamic programming algorithm, we need to derive a recurrence relation that express a solution to a instance of the knapsack problem in terms of solutions to its smaller subinstances. Let f[i][w] to be the value of an optimal solution to this instance, i.e., the most valuable subset of the first ith items that fit into the knapsack of capacity w.

First, we need to consider some corner cases:

• f[0][w] = 0(0 <= w <= W ), for i = 0 means when we don’t choose any item, the max value must be 0.
• f[i][0] = 0(i >= 0), for w = 0 means no capacity is available, the max value must be 0.

For f[i][w](i >= 1), we have two choices:

• among the subsets that do not include the ith item, the value of an optimal subset is f[i – 1][w];
• among the subsets that do include the ith item, an optimal subset is made up of this item and an optimal subset of the first i – 1 items that fit into the knapsack of capacity w – weight[i], the value of such an optimal subset is f[i – 1][w – weight[i]] + value[i](w >= weight[i]).

Thus f[i][w] = max{f[i – 1][w] , f[i – 1][w – weight[i]] + value[i] }, the overall formulation is:

$$f\left [ i \right ]\left [ w \right ] = \left\{\begin{matrix}\\0 & if\ i = 0\ or\ w = 0\\f\left [ i - 1 \right ]\left [ w \right ] & weight\left [ i \right ] > w\\max \left \{ f\left [ i - 1 \right ]\left [ w \right ], f\left [ i - 1 \right ]\left [ w - weight\left [ i \right ] \right ] + value\left [ i \right ] \right \} & weight\left [ i \right ] \leq w\\\end{matrix}\right.$$

And here is our algorithm:

public class Knapsack {
static int knapsack(int W, int weight[], int value[]) {
int f[][] = new int[weight.length + 1][W + 1];

for (int i = 1; i <= weight.length; i++) {
for (int w = 1; w <= W; w++) {
if (weight[i - 1] <= w) {
f[i][w] = Math.max(f[i - 1][w], f[i - 1][w - weight[i - 1]] + value[i - 1]);
} else {
f[i][w] = f[i - 1][w];
}
}
}

return f[weight.length][W];
}

public static void main(String[] args) {
System.out.println(knapsack(5, new int[]{2, 3, 4, 5}, new int[]{3, 4, 5, 6}));
}
}


## Reduce Space Complexity

The method’s space complexity and time complexity are both O(W * weight.length), is there any further optimization?

State f[i][w] is associated with two previous states only: f[i – 1][w] and f[i – 1][w – weight[i]], if f[i – 1][w] and f[i – 1][w – weight[i]] can be kept while calculating f[i][w], space complexity should be reduced to O(W), and this can be achieved by reversing the inner loop:

public class Knapsack {
static int knapsack(int W, int weight[], int value[]) {
int f[] = new int[W + 1];

for (int i = 1; i <= weight.length; i++) {
for (int w = W; w >= 1; w--) {
if (weight[i - 1] <= w) {
f[w] = Math.max(f[w], f[w - weight[i - 1]] + value[i - 1]);
}
}
}

return f[W];
}

public static void main(String[] args) {
System.out.println(knapsack(5, new int[]{2, 3, 4, 5}, new int[]{3, 4, 5, 6}));
}
}


And of course, the if statement can be removed by updating the inner loop condition:

public class Knapsack {
static int knapsack(int W, int weight[], int value[]) {
int f[] = new int[W + 1];
for (int i = 1; i <= weight.length; i++) {
for (int w = W; w >= weight[i - 1]; w--) {
f[w] = Math.max(f[w], f[w - weight[i - 1]] + value[i - 1]);
}
}
return f[W];
}

public static void main(String[] args) {
System.out.println(knapsack(5, new int[]{2, 3, 4, 5}, new int[]{3, 4, 5, 6}));
}
}


## A Minor Improvement On Time Complexity

f[W] is what we care, which is affected by state f[W – weight[i – 1]], and f[W – weight[i – 1]] is affected by state f[W – weight[i – 1] – weight[i – 2]], etc. For the kth item, the inner loop end condition can be subjected to max{W – sum{weight[ki]}, weight[i-1]}:

public class Knapsack {
static int knapsack(int W, int weight[], int value[]) {
int f[] = new int[W + 1], sum = 0;
for (int w : weight) sum += w;

for (int i = 1; i <= weight.length; i++) {
int bound = Math.max(W - sum, weight[i - 1]);
for (int w = W; w >= bound; w--) {
f[w] = Math.max(f[w], f[w - weight[i - 1]] + value[i - 1]);
}
sum -= weight[i - 1];
}

return f[W];
}

public static void main(String[] args) {
System.out.println(knapsack(5, new int[]{2, 3, 4, 5}, new int[]{3, 4, 5, 6}));
}
}


## Back To Problem Target Sum

Let dp[n] to be count of the possible ways to add up to n, here is the DP solution for the original problem:

public class Solution {
public int findTargetSumWays(int[] nums, int S) {
int sum = 0;
for (int num : nums) sum += num;
if (sum < S || -sum > S || (sum + S) % 2 == 1) return 0;

int target = (sum + S) >>> 1, dp[] = new int[target + 1];
dp[0] = 1;
for (int num : nums) {
for (int i = target; i >= num; i--) {
dp[i] += dp[i - num];
}
}
return dp[target];
}
}


## Higher-order functions in Javascript

In Javascript, functions are just like the other variables(AKA first-class citizen), and here are several interesting tricks implemented by treating functions as first-class citizen.

• ## currying

Currying is for partial evaluation, a practical example is Function.prototype.bind() in Javascript:

var add = function(a, b) {
return a + b;
};



And we can implement bind()(under this circumstance only of course) by ourselves:

var bind = function(f, n) {
return function() {
return f.apply(null, [n].concat([...arguments]));
};
};

var add = function(a, b) {
return a + b;
};


• ## trampolining

Javascript(ES5) does not implement tail call optimization, so using recursion incorrectly(such as calculating Factorial) will lead to a stack-overflow exception. However, this can be avoided by returning a higher-order function, which is called trampolining.

First, we can easily get written factorial() done:

function factorial(n){
return n == 1 ? n : n * factorial(n-1);
}


But this is not a tail-call recursion, let’s update this method by adding another parameter:

function tail_factorial(n, result) {
return n == 1 ? result : tail_factorial(n - 1, n * result);
}


Now, we get a tail-call recursion function which ES5 can not optimize, and we are still facing the stack-overflow problem. So let’s look at our trampoline function:

function trampoline(fn){
return function(){
var result = fn.apply(this, arguments);
while(result instanceof Function){
result = result();
}
return result;
}
}

function tail_factorial(n, result) {
return n == 1 ? result : function() {
return tail_factorial(n - 1, n * result)
};
}

trampoline(tail_factorial)(7, 1)  // 5040


However, using trampoline here is just showing concepts of higher-order function and not recommended, for in programming languages like PHP and Javascript it may cause a great performance loss.

## [Algorithm 101] Bit Manipulation: Count Bits of Integer (Hamming weight)

Function int bitCount(int n) returns how many “1”s does the number n have expressed in the binary form. For example, 0b101 is the binary form of number 5, so bitCount(5) returns 2.

• ### the naive way

Check if each bit equals 1 and return the sum:

int bitCount(int n) {
int count = 0;

for (int i = 0; i < 32; i++) {
count += n & 1;
n >>= 1;
}

return count;
}

• ### the optimized naive way

Consider the number 0b1. After the first right shift, 31 “0”s’ left and the function can return count immediately: there is no need adding to count when n becomes “0” after one certain right shift. The optimized code is as follows:

int bitCount(int n) {
int count = 0;
while (n != 0) {
count += n & 1;
n >>= 1;
}
return count;
}

• ### Brian Kernighan’s way

n & (n – 1) will clear the last significant bit set, e.g. n = 112

     n     | 1 1 1 0 0 0 0
n - 1   | 1 1 0 1 1 1 1
n &= n - 1 | 1 1 0 0 0 0 0
---------------------------
n     | 1 1 0 0 0 0 0
n - 1   | 1 0 1 1 1 1 1
n &= n - 1 | 1 0 0 0 0 0 0
---------------------------
n     | 1 0 0 0 0 0 0
n - 1   | 0 1 1 1 1 1 1
n &= n - 1 | 0 0 0 0 0 0 0

Hence loop will go through as many iterations as there are set bits, and when n becomes zero, count is exactly the answer.

int bitCount(int n) {
int count = 0;
while (n != 0) {
n &= n - 1;
count++;
}
return count;
}

• ### Lookup Table

int bitCount(int n) {
int[] lookupTable = new int[256];
for (int i = 0; i < 256; i++) {
lookupTable[i] = lookupTable[i >> 1] + (i & 1);
}

return lookupTable[n & 0xFF]
+ lookupTable[(n >> 8) & 0xFF]
+ lookupTable[(n >> 16) & 0xFF]
+ lookupTable[n >> 24];
}


Lookup table stores count for each number, and lookupTable[i] can be calculated by lookupTable[i >> 1] + (i & 1):

lookupTable[i >> 1] | 1 1 1 0 0 0 1
......          |     ......
lookupTable[i]      | 1 1 1 0 0 0 1 0
lookupTable[i + 1]  | 1 1 1 0 0 0 1 1

However, Integer.MAX_VALUE is too large for an array, and we cannot afford the O(n) space complexity.

A divide and conquer way would be better:

1. Construct lookupTable with size 256(8-bit);
2. Calculate the lower 8 bits of n, find calculated result in lookupTable and add to count;
3. right shift n with 8 bits, and continue (2) until n equals zero.
• ### Parallel

int bitCount(int n) {
int bits[] = {1, 2, 4, 8, 16};
0b01010101010101010101010101010101,  // 0x55555555
0b00110011001100110011001100110011,  // 0x33333333
0b00001111000011110000111100001111,  // 0x0F0F0F0F
0b00000000111111110000000011111111,  // 0x00FF00FF
0b00000000000000001111111111111111,  // 0x0000FFFF
};

int count = n - ((n >> 1) & masks[0]);
count = ((count >> bits[2]) + count) & masks[2];
count = ((count >> bits[3]) + count) & masks[3];
count = ((count >> bits[4]) + count) & masks[4];
return count;
}


Shift-and-add is another divide and conquer way for this problem, the algorithm can be explained by this gragh:

1 | 0 | 1 | 1 | 1 | 0 | 0 | 1
\+/     \+/     \+/     \+/
1   |   2   |   1   |   1
\  +  /         \  +  /
3       |       2
\      +      /
5

And this gragh is for the same number(8-bit) in binary form:

        n          | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1
n >> 1        | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0
mask[0]       | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1
n & mask[0]     | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1
(n >> 1) & mask[0] | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0
add         | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1
--------------------------------------------------
n          | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1
n >> 2        | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1
mask[1]       | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1
n & mask[1]     | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
(n >> 2) & mask[1] | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1
add         | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0
--------------------------------------------------
n          | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0
n >> 4        | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
mask[2]       | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1
n & mask[2]     | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0
(n >> 4) & mask[2] | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
add         | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 (5)
--------------------------------------------------

For a 32-bit number, we have to do this 5 times (log232):

1. right shift 1 bit, & 01010101 01010101 01010101 01010101 (0x55555555 in HEX);
2. right shift 2 bits, & 00110011 00110011 00110011 00110011 (0x33333333 in HEX);
3. right shift 4 bits, & 00001111 00001111 00001111 00001111 (0x0F0F0F0F in HEX);
4. right shift 8 bits, & 00000000 11111111 00000000 11111111 (0x00FF00FF in HEX);
5. right shift 16 bits, & 00000000 00000000 11111111 11111111 (0x0000FFFF in HEX);

And this algorithm can be optimized as follows(AKA int Integer.bitCount() which implemented by JDK, as the title picture shows up):

int bitCount(int n) {
n = n - ((n >> 1) & 0x55555555);
n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
return ((n + (n >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
}

• ### MIT HAKMEM

Consider a 3-bit binary number “abc”( a, b, c ∈ {0, 1} and value of “abc” equals a * 4 + b * 2 + c):

      abc      | a * 4 + b * 2 + c
ab(abc >> 1)  | a * 2 + b
a(abc >> 2)  | a

Hence abc - (abc >> 1) - (abc >> 2) == a + b + c, which is count of 3-bit binary number “abc”. For a 32-bit number, we need one more step: mask the high bits.

        n        | 01 | 010 | 111 | 000 | 111 | 000 | 100 | 000 | 001 | 010 | 111
n >> 1      | 00 | 101 | 011 | 100 | 011 | 100 | 010 | 000 | 000 | 101 | 011
mask'      | 11 | 011 | 011 | 011 | 011 | 011 | 011 | 011 | 011 | 011 | 011
(n >> 1) & mask' | 00 | 001 | 011 | 000 | 011 | 000 | 010 | 000 | 000 | 001 | 011
---------------------------------------------------------------------------------
n >> 2      | 00 | 010 | 101 | 110 | 001 | 110 | 001 | 000 | 000 | 010 | 101
mask"      | 01 | 001 | 001 | 001 | 001 | 001 | 001 | 001 | 001 | 001 | 001
(n >> 2) & mask" | 00 | 000 | 001 | 000 | 001 | 000 | 001 | 000 | 000 | 000 | 001
---------------------------------------------------------------------------------
tmp        | 01 | 001 | 011 | 000 | 011 | 000 | 001 | 000 | 001 | 001 | 011

Next, like Parallel algorithm, we will calculate every 6-bit count:

       tmp      | 01 | 001 | 011 | 000 | 011 | 000 | 001 | 000 | 001 | 001 | 011
tmp >> 3    | 00 | 001 | 001 | 011 | 000 | 011 | 000 | 001 | 000 | 001 | 001
tmp + tmp >> 3 | 01 | 010 | 100 | 011 | 011 | 011 | 001 | 001 | 001 | 010 | 100
mask     | 11 | 000 | 111 | 000 | 111 | 000 | 111 | 000 | 111 | 000 | 111
--------------------------------------------------------------------------------
result     | 01(a) | 000100(b) | 000011(c) | 000001(d) | 000001(e) | 000100(f)

But we don’t need calculate every-12bit of result anymore, result % 63 is exactly the answer. Here is proof:

\begin{aligned} result &= a\cdot 64^{5} + b\cdot 64^{4} + c\cdot 64^{3} + d\cdot 64^{2} + e\cdot 64 + f \\ \\ result \% 63 &= \left ( a\cdot 64^{5} + b\cdot 64^{4} + c\cdot 64^{3} + d\cdot 64^{2} + e\cdot 64 + f \right ) \%63 \\ &= \left ( a\cdot 64^{5} \% 63 + b\cdot 64^{4} \% 63 + c\cdot 64^{3} \% 63 + d\cdot 64^{2} \% 63 + e\cdot 64 \% 63 + f \% 63 \right ) \% 63 \\ &= \left ( a \% 63 + b \% 63 + c \% 63 + d \% 63 + e \% 63 + f \% 63 \right ) \% 63 \\ &= \left ( a + b + c + d + e + f \right ) \% 63 \end{aligned}

∵ a + b + c + d + e + f ≤ 32
∴ (a + b + c + d + e + f) % 63 = a + b + c + d + e + f
∴ count = result % 63 = a + b + c + d + e + f


Here is our code:

int bitCount(int n) {
int tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
return ((tmp + (tmp >> 3)) & 030707070707) % 63;
}


## Solutions of Longest Palindromic Substring

• #### suffix tree

public class Solution {
public String longestPalindrome(String s) {
if (s.length() <= 1) return s;
String result = "";
Node root = new Node();

// original string
for (int i = 0; i < s.length(); i++) {
String subString = s.substring(i);
Node node = root;
for(int j = 0; j < subString.length(); j++) {
if (node.nodeList.get(subString.charAt(j)) == null) {
Node leaf = new Node();
leaf.letter = subString.charAt(j);
leaf.isOriginal = 1;
node.nodeList.put(subString.charAt(j), leaf);
}
else {
node.nodeList.get(subString.charAt(j)).isOriginal = 1;
}

node = node.nodeList.get(subString.charAt(j));
}
}

// reversed string
String rs = new StringBuilder(s).reverse().toString();
for (int i = 0; i < rs.length(); i++) {
String subReversedString = rs.substring(i);

Node node = root;
String tmpString = "";

for(int j = 0; j < subReversedString.length(); j++) {
if (node.nodeList.get(subReversedString.charAt(j)) == null) {
Node leaf = new Node();
leaf.letter = subReversedString.charAt(j);
leaf.isOriginal = 0;
node.nodeList.put(subReversedString.charAt(j), leaf);
}
else {
// target node exists, and which is generated by the original string.
if (node.nodeList.get(subReversedString.charAt(j)).isOriginal == 1) {
tmpString += subReversedString.charAt(j);
}
}
node = node.nodeList.get(subReversedString.charAt(j));
}

if (tmpString.length() > result.length()) {
result = tmpString;
}
}

return result;
}
}

class Node {
char letter;
int isOriginal;
Map<Character, Node> nodeList = new HashMap<>();
}

• #### normal

public class Solution {
public String longestPalindrome(String s) {
int start = 0, end = 0;
for (int i = 0; i < s.length(); i++) {
int len1 = expandAroundCenter(s, i, i);
int len2 = expandAroundCenter(s, i, i + 1);
int len = Math.max(len1, len2);
if (len > end - start) {
start = i - (len - 1) / 2;
end = i + len / 2;
}
}
return s.substring(start, end + 1);
}

private int expandAroundCenter(String s, int left, int right) {
int L = left, R = right;
while (L >= 0 && R < s.length() && s.charAt(L) == s.charAt(R)) {
L--; R++;
}
return R - L - 1;
}
}

• #### manacher’s algorithm

public class Solution {
public String longestPalindrome(String s) {
if (s.length() <= 2) return s;

String separator = "@", starter = "^";
StringBuilder stringBuilder = new StringBuilder(starter).append(separator);
for (int i = 0; i < s.length(); i++) {
stringBuilder.append(s.charAt(i)).append(separator);
}

String formattedString = stringBuilder.toString();
int[] result = new int[formattedString.length()];
// id stands for **the center of furthest palindrome string**
// mx stands for **the right boundary of furthest palindrome string**
int mx = 0, id = 0;

for (int i = 1; i < formattedString.length(); i++) {
if (mx > i) {
result[i] = result[2*id - i] > mx - i ? mx - i : result[2*id - i];
}
else {
result[i] = 1;
}

while ((i + result[i]) < formattedString.length() && (i - result[i]) >= 0
&& formattedString.charAt(i + result[i]) == formattedString.charAt(i - result[i])) {
result[i]++;
}

if (i + result[i] > mx) {
mx = i + result[i];
id = i;
}
}

int maxIndex = 0, maxValue = 0;
for (int k = 0; k < formattedString.length(); k++) {
if (result[k] > maxValue) {
maxIndex = k;
maxValue = result[k];
}
}

return formattedString
.substring(maxIndex - result[maxIndex] + 1, maxIndex + result[maxIndex] - 1)
.replace(separator, "");
}
}

• #### DP

public class Solution {
public String longestPalindrome(String s) {
if (s.length() <= 2) return s;

int start = 0, end = 0;
// matrix[x][y] stands for if s.substring(x, y + 1) is a palindrome string
boolean matrix[][] = new boolean[s.length()][s.length()];

for (int y = 1; y < s.length(); y++) {
int x = 0;
matrix[y][y] = true;
while (x < y) {
if (s.charAt(x) == s.charAt(y)) {
if ((y - x == 1) || matrix[x + 1][y - 1]) {
// is palindrome
matrix[x][y] = true;
// compare
if (y - x + 1 > end - start + 1) {
start = x;
end = y;
}
}
}
x++;
}
}

return s.substring(start, end + 1);
}
}


Reference:

leetcode articles

(Mandarin Chinese) Manacher’s ALGORITHM: O(n)时间求字符串的最长回文子串

## online judge sandbox 设计思路（2）

#### seccomp()

seccomp 是 Linux 内核提供的一种应用程序沙箱机制，seccomp 通过只允许应用程序调用 exit(), sigreturn(), read() 和 write() 四种系统调用来达到沙箱的效果。如果应用程序调用了除了这四种之外的系统调用， kernel 会向进程发送 SIGKILL 信号。

seccomp 很难在实际中得到推广，因为限制实在是太多了，Linus 本人也对它的应用持怀疑的态度，直到出现了 seccomp-bpf。seccomp-bpf 是 seccomp 的一个扩展，它可以通过配置来允许应用程序调用其他的系统调用。chrome 中第一个应用 seccomp-bpf 的场景是把 Flash 放到了沙箱里运行（实在是不放心），后续也把 render 的过程放到了沙箱里。

[root@localhost ~]# yum install -y libseccomp libseccomp-devel
[root@localhost ~]# cat a.c
#include <stdio.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <unistd.h>

int main() {
printf("starting...\n");
prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
printf("i just removed /etc/hosts!\n");
return 0;
}
[root@localhost ~]# gcc a.c -lseccomp -o a.out
[root@localhost ~]# ll /etc/hosts
-rw-r--r--. 1 root root 158 May  7 23:04 /etc/hosts
[root@localhost ~]# ./a.out
starting...
Killed
[root@localhost ~]# echo $? 137 [root@localhost ~]# ll /etc/hosts -rw-r--r--. 1 root root 158 May 7 23:04 /etc/hosts  删除失败了。 下面，我们写一个程序 b.out，让 seccomp 允许调用 unlink() ，但是不允许调用 fork()： #include <stdio.h> #include <unistd.h> #include <seccomp.h> int main() { printf("starting...\n"); scmp_filter_ctx ctx; ctx = seccomp_init(SCMP_ACT_KILL); seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(rt_sigreturn), 0); seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0); seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0); seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0); // 自定义白名单 seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(unlink), 0); seccomp_load(ctx); printf("we can call unlink()\n"); unlink("/etc/hosts"); printf("let's fork()\n"); fork(); printf("U cannot see me\n"); return 0; }  编译运行下： [root@localhost ~]# ll /etc/hosts -rw-r--r--. 1 root root 158 May 7 23:30 /etc/hosts [root@localhost ~]# ./b.out starting... we can call unlink() let's fork() Bad system call [root@localhost ~]# ll /etc/hosts ls: cannot access /etc/hosts: No such file or directory  我们的目的达到了。 #### seccomp-nurse seccomp-nurse 是一个基于 seccomp 的 sandbox 框架，主页是：http://chdir.org/~nico/seccomp-nurse/，这里只用一张主页上的图来描述他的架构 seccomp-nurse #### Docker 大法好 Docker 在 1.10 之后引入了 seccomp 选项，可以借助一个 json 配置文件来限定运行程序可以使用的系统调用。详情见：https://github.com/docker/docker/blob/master/docs/security/seccomp.md ## online judge sandbox 设计思路（1） 实现 OJ 的 sandbox 主要有两种思路：ptrace 和 seccomp，下面我们分别讨论。 ### ptrace() ptrace 是类 Unix 系统上一个可以观察、控制其他进程内部状态的工具。它的用途有很多，例如程序调试（gdb、dbx等）、代码覆盖率检测、甚至可以用来做运行时补丁（想没想起来IDA和OllyDbg ）。今天我们讨论的是如何通过对系统调用的限制来达到在沙箱(sandbox)中运行程序的效果。 我们先写一个程序，用来删除 /etc/hosts 文件： [root@localhost ~]# cat a.c #include <stdlib.h> int main() { unlink("/etc/hosts"); return 0; } [root@localhost ~]# gcc a.c [root@localhost ~]# ll -al /etc/hosts -rw-r--r--. 1 root root 158 Jun 7 2013 /etc/hosts [root@localhost ~]# ./a.out [root@localhost ~]# ll -al /etc/hosts ls: cannot access /etc/hosts: No such file or directory  a.out 完成了它的使命，成功的删除了 hosts 文件。 现在，我们需要设计一个程序：通过 ptrace() 来限制 unlink() 删除文件系统中的文件。 首先，我们来看下手册上是怎么介绍 ptrace() 的： #include <sys/ptrace.h> long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);  其中，request 参数决定了 ptrace() 的行为，我们选几个用得到的选项介绍： • PTRACE_TRACEME: 表明子进程(tracee)正在被父进程(tracer)所控制。注意：只有本选项适用于子进程（tracee）； • PTRACE_PEEKUSER: 可以从子进程中获取通用寄存器中的值，寄存器(orig_rax /orig_eax，取决于平台 )中保存的是最近一次的syscall number； • PTRACE_SYSCALL: 可以将子进程从停止的状态唤醒并继续； 这样，我们就有思路了：每次程序调用系统调用，我们都去看看本次系统调用是不是 unlink()，如果是，杀掉子进程并退出即可： #include <sys/ptrace.h> #include <sys/types.h> #include <sys/wait.h> #include <sys/user.h> #include <syscall.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #if __WORDSIZE == 64 #define REG(reg) reg.orig_rax #else #define REG(reg) reg.orig_eax #endif int main(int argc, char* argv[]) { pid_t child; child = fork(); if(child == 0) { ptrace(PTRACE_TRACEME, 0, NULL, NULL); execl("/root/a.out", "a.out", NULL); } else { int status; while(waitpid(child, &status, 0) && ! WIFEXITED(status)) { struct user_regs_struct regs; ptrace(PTRACE_GETREGS, child, NULL, &regs); if (REG(regs) == 87) { fprintf(stderr, "error: call syscall unlink() is not allowed!\n"); kill(child, SIGKILL); return 0; } fprintf(stdout, "Process executed systemcallID: [%ld]\n", REG(regs)); ptrace(PTRACE_SYSCALL, child, NULL, NULL); } } exit(0); }  当系统调用发生时，内核会把当前的 %eax/%rax 中的内容（即系统调用的编号）保存到子进程的用户态代码段中，我们可以传入 PTRACE_PEEKUSER 调用 ptrace() 来读取这个的值，然后通过传入 PTRACE_SYSCALL 调用 ptrace() 让子进程重新恢复运行。一旦发现子进程调用了 87 号系统调用(sys_unlink)就会终止操作，并退出。 我们编译运行下： [root@localhost ~]# ll /etc/hosts -rw-r--r--. 1 root root 158 Apr 27 03:05 /etc/hosts [root@localhost ~]# ./pt Process executed systemcallID: 59 Process executed systemcallID: 12 Process executed systemcallID: 12 Process executed systemcallID: 9 Process executed systemcallID: 9 Process executed systemcallID: 21 Process executed systemcallID: 21 Process executed systemcallID: 2 Process executed systemcallID: 2 Process executed systemcallID: 5 Process executed systemcallID: 5 Process executed systemcallID: 9 Process executed systemcallID: 9 Process executed systemcallID: 3 Process executed systemcallID: 3 Process executed systemcallID: 2 Process executed systemcallID: 2 Process executed systemcallID: 0 Process executed systemcallID: 0 Process executed systemcallID: 5 Process executed systemcallID: 5 Process executed systemcallID: 9 Process executed systemcallID: 9 Process executed systemcallID: 10 Process executed systemcallID: 10 Process executed systemcallID: 9 Process executed systemcallID: 9 Process executed systemcallID: 9 Process executed systemcallID: 9 Process executed systemcallID: 3 Process executed systemcallID: 3 Process executed systemcallID: 9 Process executed systemcallID: 9 Process executed systemcallID: 9 Process executed systemcallID: 9 Process executed systemcallID: 158 Process executed systemcallID: 158 Process executed systemcallID: 10 Process executed systemcallID: 10 Process executed systemcallID: 10 Process executed systemcallID: 10 Process executed systemcallID: 10 Process executed systemcallID: 10 Process executed systemcallID: 11 Process executed systemcallID: 11 Process executed systemcallID: 87 error: call syscall unlink() is not allowed! [root@localhost ~]# ll /etc/hosts -rw-r--r--. 1 root root 158 Apr 27 03:05 /etc/hosts  可以看到： (1) 在删除文件这个操作过程中，发生了很多次系统调用； 实际上，strace() 也是基于这个原理，我们可以用 strace 跟踪下执行 /root/a.out 是什么结果： [root@localhost ~]# strace /root/a.out execve("/root/a.out", ["/root/a.out"], [/* 27 vars */]) = 0 brk(0) = 0x19c2000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8f962c7000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=29929, ...}) = 0 mmap(NULL, 29929, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8f962bf000 close(3) = 0 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \34\2\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2107816, ...}) = 0 mmap(NULL, 3932736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8f95ce6000 mprotect(0x7f8f95e9c000, 2097152, PROT_NONE) = 0 mmap(0x7f8f9609c000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b6000) = 0x7f8f9609c000 mmap(0x7f8f960a2000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f8f960a2000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8f962be000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8f962bc000 arch_prctl(ARCH_SET_FS, 0x7f8f962bc740) = 0 mprotect(0x7f8f9609c000, 16384, PROT_READ) = 0 mprotect(0x600000, 4096, PROT_READ) = 0 mprotect(0x7f8f962c8000, 4096, PROT_READ) = 0 munmap(0x7f8f962bf000, 29929) = 0 unlink("/etc/hosts") = -1 ENOENT (No such file or directory) exit_group(0) = ? +++ exited with 0 +++  通过系统调用表对比着看，strace() 给出的分析和 ptrace() 给出的是完全相同的。 (2) hosts 文件还在，程序现在无法调用 unlink() 来删除文件了； 同样的思路：我们可以维护一个系统调用白名单，用户上传的代码调用白名单上的系统调用一律放行，否则直接报错 Runtime Error 即可。 以上是使用 ptrace() 来实现 judger 的一个思路。虽然可以达到我们的目的，但是 ptrace() 的缺点也是显而易见的： 1. 诡异的模型和侵入式的检测； 2. 每进行一次系统调用，ptrace() 就要在系统空间和用户空间进行两次切换，过于浪费资源。 针对以上两个缺点，我们下面会继续探讨 seccomp(secure computing module) 的实现。 ##### 参考： http://man7.org/linux/man-pages/man2/ptrace.2.html http://www.linuxjournal.com/article/6100 http://www.linuxjournal.com/article/6210 https://gist.github.com/willb/14488 ptrace() Tutorial ## 使用 docker 部署 PHP 环境 ### Dockerfile FROM centos:7.2.1511 MAINTAINER liuchao "thesedays@126.com" ENV PHP_VERSION 7.0.4 # add epel RUN yum install -y epel-release # update OS RUN yum update -y # install deps RUN yum install -y \ wget \ git \ ntpdate \ gcc gcc-c++ \ autoconf \ iproute \ libcurl libcurl-devel \ libxml2 libxml2-devel \ openssl openssl-devel \ pcre pcre-devel \ zlib zlib-devel \ libpng libpng-devel \ freetype freetype-devel \ libjpeg-turbo libjpeg-turbo-devel \ libvpx libvpx-devel \ fontconfig fontconfig-devel \ libXpm libXpm-devel \ libtiff libtiff-devel \ libmcrypt libmcrypt-devel \ gd gd-devel \ libicu libicu-devel # install PHP RUN mkdir -p /data/ RUN wget http://hk1.php.net/get/php-$PHP_VERSION.tar.gz/from/this/mirror \
-O /data/php-$PHP_VERSION.tar.gz RUN cd /data/ \ && tar zxf php-$PHP_VERSION.tar.gz
RUN cd /data/php-$PHP_VERSION/ \ && ./configure \ --prefix=/opt/php-$PHP_VERSION \
--enable-fpm \
--with-fpm-user=nobody \
--with-fpm-group=nobody \
--with-pear \
--with-zlib \
--with-pcre-regex \
--with-gd \
--with-mysqli=mysqlnd \
--with-pdo-mysql=mysqlnd \
--enable-xml \
--enable-bcmath \
--with-curl \
--with-mcrypt \
--enable-sockets \
--with-xmlrpc \
--enable-zip \
--enable-soap \
--enable-mbstring \
--enable-exif \
--enable-pcntl \
--enable-intl \
--with-openssl \
--with-freetype-dir=/usr/include/freetype2 \
&& make \
&& make install
RUN ln -s /opt/php-\$PHP_VERSION /opt/php

# install PHP redis
RUN cd /data/ \
&& git clone https://github.com/phpredis/phpredis.git --branch php7
RUN cd /data/phpredis \
&& /opt/php/bin/phpize \
&& ./configure --with-php-config=/opt/php/bin/php-config \
&& make \
&& make install

# init config files
COPY php-fpm.conf /opt/php/etc/php-fpm.conf
COPY php.ini /opt/php/lib/php.ini
COPY cert.pem /opt/php/etc/cert.pem


### 构建镜像

# docker build -t "php/base:v1.0" .


# docker images
REPOSITORY    TAG     IMAGE ID        CREATED          SIZE
php/base      v1.0    78f156daef65    8 seconds ago    1.472 GB


### 打包镜像

# docker save php/base:v1.0 > php_base_v1.0.tar


### 使用打包镜像

# docker load -i php_base_v1.0.tar


### 运行 docker 镜像

# docker run \
-it \
-d \
-p 10.1.1.1:12306:9000 \
-v 本地代码目录:镜像内代码目录 \
-v 本地log目录:镜像内log目录 \
php/base:v1.0 \
/opt/php/sbin/php-fpm


## 使用 docker 作为 online judge(OJ) 的 sandbox (2)

（1）测试集

（2）用户提交代码

（3）将测试集和用户提交的代码结合在一起，并运行出测试结果的预置代码（以下简称测试代码）

#### 测试代码

// var input = [["aa","ab","ac"]];
// var output = "a";
if(output != longestCommonPrefix.apply(this, input)) {
// 没有通过本测试
}


PHP – call_user_func_array

Python – getattr

Javascript’s Apply in Ruby

Object result = this.getClass().getDeclaredMethod("multiply", classes).invoke(this,ints);


Java 的反射也有其弊端：很难在运行时确定提供的参数类型，而 C / C++ 则根本没有反射，所以我们需要换一个办法来处理：将用户提交的代码和测试代码拼接到一起，并写入到独立的文件中进行测试。缺点是：每道题都需要提供一段独立的测试代码来进行拼接（而如果使用 Javascript 的 apply 方法进行测试则没有这种问题：这段测试代码可以去和任意一段用户提交的代码 + 测试数据拼接来进行测试）

// ========  用户提交代码  ========
/**
* @param {string[]} strs
* @return {string}
*/
var longestCommonPrefix = function(strs) {
if(strs.length === 0) return "";

var shortest_string_length = strs[0].length;
for(var i = 0; i < strs.length; i++) {
if(strs[i].length < shortest_string_length) {
shortest_string_length = strs[i].length;
}
}

var result = "";
loop:
for(var j = 0; j < shortest_string_length; j++) {
var mark = strs[0].charAt(j);
for(var k = 0; k < strs.length; k++) {
if(strs[k].charAt(j) != mark) {
break loop;
}
}
result = result + mark;
}

return result;
};
// ========  用户提交代码  ========

// ========  由DB数据生成的测试集  ========
var test_set = [
{"input":[[]], "output":""},
{"input":[[""]], "output":""},
{"input":[["a"]], "output":"a"},
{"input":[["","a"]], "output":""},
{"input":[["a","b"]], "output":""},
{"input":[["aa","ab","ac"]], "output":"a"},
{"input":[["","ab","ac"]], "output":""},
{"input":[["aaaaaaaaaaaaaaa","aaaaavvvvvv"]], "output":"aaaaa"},
{"input":[["a","bbbbbbbbbbbbbbbbbbbbbbb"]], "output":""},
{"input":[["","",""]], "output":""},
{"input":[["0123456789","0111"]], "output":"01"}
];
// ========  由DB数据生成的测试集  ========

// ========  测试代码  ========
for (var i = 0; i < test_set.length; i++) {
var input  = test_set[i]["input"];
var output = test_set[i]["output"];
try {
if(output != longestCommonPrefix.apply(this, input)) {
console.log(JSON.stringify({
"code": 1,
"data": {
"input": input[0],
"output": output,
"calculated": longestCommonPrefix.apply(this, input)
}
}));
process.exit();
}
}
catch (err) {
console.log(JSON.stringify({
"message": err.toString(),
"code": 2
}));
process.exit();
}
}

console.log(JSON.stringify({
"message": "AC",
"code": 0
}));
process.exit();
// ========  测试代码  ========

import java.util.ArrayList;

import com.fasterxml.jackson.databind.ObjectMapper;

public class Oj {
public static void main(String[] args) {

// ========  由DB数据生成的测试集  ========
ArrayList<TestCase> testCases = new ArrayList<>();
// ========  由DB数据生成的测试集  ========

ObjectMapper objectMapper = new ObjectMapper();

try {
Solution s = new Solution();

for (TestCase tc : testCases) {
String calculatedResult = s.longestCommonPrefix(tc.getStrs());
String result = tc.getResult();

if(!calculatedResult.equals(result)) {
ErrorResponseData errorResponseData = new ErrorResponseData(tc.getStrs(), tc.getResult(), calculatedResult);
ErrorResponse errorResponse = new ErrorResponse("Wrong Answer", 1, errorResponseData);
System.out.println(objectMapper.writeValueAsString(errorResponse));
System.exit(0);
}
}

CommonResponse commonResponse = new CommonResponse("AC", 0);
System.out.println(objectMapper.writeValueAsString(commonResponse));
System.exit(0);
} catch (Exception ex) {
ex.printStackTrace();
System.exit(1);
}
}
}

class TestCase {
private String[] strs;
private String result;

public TestCase(String[] strs, String result) {
this.strs = strs;
this.result = result;
}

public String[] getStrs() {
return strs;
}

public void setStrs(String[] strs) {
this.strs = strs;
}

public String getResult() {
return result;
}

public void setResult(String result) {
this.result = result;
}
}

class CommonResponse {
private String message;
private Integer code;

public CommonResponse(String message, Integer code) {
this.message = message;
this.code = code;
}

public String getMessage() {
return message;
}

public void setMessage(String message) {
this.message = message;
}

public Integer getCode() {
return code;
}

public void setCode(Integer code) {
this.code = code;
}
}

class ErrorResponse {
private String message;
private Integer code;
private ErrorResponseData errorResponseData;

public ErrorResponse(String message, Integer code, ErrorResponseData errorResponseData) {
this.message = message;
this.code = code;
this.errorResponseData = errorResponseData;
}

public String getMessage() {
return message;
}

public void setMessage(String message) {
this.message = message;
}

public Integer getCode() {
return code;
}

public void setCode(Integer code) {
this.code = code;
}

public ErrorResponseData getErrorResponseData() {
return errorResponseData;
}

public void setErrorResponseData(ErrorResponseData errorResponseData) {
this.errorResponseData = errorResponseData;
}
}

class ErrorResponseData {
private String[] input;
private String output;
private String calculated;

public ErrorResponseData(String[] input, String output, String calculated) {
this.input = input;
this.output = output;
this.calculated = calculated;
}

public String[] getInput() {
return input;
}

public void setInput(String[] input) {
this.input = input;
}

public String getOutput() {
return output;
}

public void setOutput(String output) {
this.output = output;
}

public String getCalculated() {
return calculated;
}

public void setCalculated(String calculated) {
this.calculated = calculated;
}
}

// ========  用户提交代码  ========
class Solution {
public String longestCommonPrefix(String[] strs) {
if(strs.length == 0) return "";
if(strs.length == 1) return strs[0];

String result = strs[0];
int index = result.length();

for (int i = 1; i < strs.length; i++) {
index = index < strs[i].length() ? index : strs[i].length();
for (int j = 0; j < index; j++) {
if (result.charAt(j) != strs[i].charAt(j)) {
index = j;
break;
}
}
}

return result.substring(0, index);
}
}
// ========  用户提交代码  ========


## RecordRTC 压缩视频和音频

RecordRTC 是一个基于WebRCT实现的一个录制视频/音频的js库。项目主页是：http://RecordRTC.org/

• 压缩视频

1. 修改图片大小

https://github.com/muaz-khan/RecordRTC/blob/master/RecordRTC.js#L2447-L2471

2. 初始化 RecordRTC 时指定 frameInterval；

RecordRTC(stream, {
type: 'video',
frameInterval: 150
});

• 压缩音频

1. 保留一个声道，设置 numberOfAudioChannels: 1，音频文件提及可以减少为原来的1/2；

2. 降低采样率，设置 sampleRate: 7350，即默认采样率44100的1/6；

3. 减小采样位数，从默认的16减小到8；