Problem B
Sublinear Algorithms: Minimum Spanning Forest
Usually a graph algorithm requires $\Omega ( |E| )$ time, because the algorithm has to read the entire input, i.e. every edge of the graph. As a result classical graph algorithms can often not be used for extremely large graphs. Interestingly, for many problems we do not need to read the entire graph in order to give an approximate solution.
The task
Write a program that estimates the weight of a minimum spanning forest, such that:
-
The program only needs to know small fraction of the input.
-
The program outputs a $(1\pm 0.1)$-approximation of the weight of a minimum spanning forest. This means $0.9 \cdot \text{opt} \le \text{output} \le 1.1 \cdot \text{opt}$.
-
Even though the optimal value is an integer, your program is allowed to return a float value as approximation. E.g. you could output $9.5$ if you don’t know if the optimal value is $9$ or $10$.
-
The program is allowed to be randomized.
You may assume the following about the input graphs:
-
The graph is undirected.
-
The input graphs will have less than $2^{31}$ nodes, so you can index them with signed 32-bit integers.
-
The input graphs will have small degree, i.e. bounded by some constant.
-
The weights $c_{ij}$ of the edges are natural numbers.
-
You may assume the optimal spanning forest has weight at least $n/2$.
-
The nodes are indexed as $0,1,...,n-1$.
Interaction
As you are supposed to only read a small fraction of the input, your program is not given the entire graph at once. At first, you are only given the number of nodes $n$, the maximum edge weight $W$, and a bound $Q$ on how many nodes of the graph your program is allowed to read. (Reading more nodes than $Q$ does not fail the submission, but you will get a worse score.) Your program is then allowed to request more data from Kattis.
Requesting data:
If your program wants to know the neighbors of node number $v$, then it should print the number $v$ to stdout. Kattis will then give the neighbor information on stdin. The format of the answer is as follows: The answer is given by a single line, starting with the number of neighbors. The rest of the line is a list of neighbors and the cost of the respective edges. For example 3 5 2 9 4 1 1 means the node $v$ has $3$ neighbors $5$, $9$, and $1$. The cost of these edges is $c_{v,5} = 2$, $c_{9,5} = 4$, and $c_{v,1} = 1$.
Reporting your result:
If your program wants to report a result, it should print end followed by the weight of the spanning forest. For example you should print end 10.312, when your program computed a weight of $10.312$.
Communication example
Your program’s output (stdout) |
Kattis’ input/answer (stdin) |
Interpretation |
10 |
The graph has $10$ nodes. |
|
4 |
The edge weights are at most $c_{i,j} \le 4$. |
|
3 |
Your program is supposed to request at most $5$ nodes. |
|
7 |
Your program asks for the neighbors of node $7$. |
|
3 1 3 6 2 3 1 |
Node $7$ has $3$ incident edges: $(7,1)$ with weight $3$, $(7, 6)$ with weight $2$ and $(7,3)$ with weight $1$. |
|
1 |
Your program asks for the neighbors of node $1$. |
|
1 7 3 |
Node $1$ has only one neighbor: $7$ and the corresponding edge has weight $3$. |
|
2 |
Your program asks for the neighbors of node $2$. |
|
0 |
Node $2$ has no neighbors |
|
end 4.12 |
Your program thinks $4.12$ is a $1.2$-approximation for the weight of the minimum spanning forest. Your program should exit now. |
Note that the empty lines in the table above are just for formatting. The actual content of stdin/stdout looks like this:
Your program’s output (stdout) |
Kattis’ input/answer (stdin) |
7 |
10 |
1 |
4 |
2 |
3 |
end 4.12 |
3 1 3 6 2 3 1 |
1 7 3 |
|
0 |
Scores and Passing
The test cases consist of two groups: The first group ($12$ test cases) checks that your program is correct. The second group ($20$ test cases) contains additional test cases used to assign your program a score. This second group of test cases is optional. It is not required to pass this group but passing it results in a better score.
Your submission passes a test group if all results are a valid $(1\pm 0.1)$-approximation of the weight of the minimum spanning forest. Additionally, your submission receives the following score for each test case:
\[ max(0, 100 \cdot (P - 10 \cdot \epsilon )), \]where $\epsilon $ is
the achieved approximation and $P$ is a penalty term for when your
submission requested too many nodes.
More accurately, if your submission returned result
$r$ and the correct value
of the minimum spanning forest was $s$, then $\epsilon = |s-r|/s$.
If your submission requested $m$ nodes from Kattis and $Q$ is the value of nodes that your
submission is allowed to read, then $P = min(1, Q/r)$. ($Q$ is given to your program, see
further above in section "Interaction".)
The total score will be the sum for each test case.
Remarks on performance
The best score is obtained by computing a good approximation while not requesting too many nodes of the graph. The actual runtime of your submission does not matter, so you do not need to write your program in C++ or other low-level languages in order to get a good score.