Problem E
RNA Secondary Structure
RNA, which stands for Ribonucleic Acid, is one of the major macromolecules that are essential for all known forms of life. It is made up of a long chain of components called nucleotides. Each component is made up of one of $4$ bases and are represented using A, C, G or U. The primary structure of RNA is a sequence of these characters. The secondary structure of RNA refers to the base pairing interactions between different components. More specifically, base A can pair up with base U and base C can pair up with base G. The stability of the RNA secondary structure depends on the total number of base pairs that can be formed. The final structure is the one that contains the maximum number of base pairs.
Let’s represent the primary structure as a string consisting of characters from the set (ACGU). The rules of secondary structure formation are as follows:
-
Any base A can form a pair with any base U
-
Any base C can form a pair with any base G
-
Each base can be part of at most one pair.
-
Let’s assume $w < x$, $y < z$ and $w < y$. If base at index $w$ forms a pair with base at index $x$ and base at index $y$ forms a pair with base at index $z$, then one of the following two conditions must be true:
\begin{align*} y& >x\\ z& <x\\ \end{align*} -
There can be at most $K$ pairs between C and G.
You will be given the primary structure of the RNA of a certain species and your job is to figure out the total number of base pairings in the final secondary structure based on the constraints mentioned above.
You will be given the primary structure in a compressed format that uses Run-length encoding. In this type of data compression, consecutive characters having the same value is replaced with a single character followed by its frequency. For example, “AAAACCGAAUUG” will be represented using “A4C2G1A2U2G1”. That means the primary structure will be given in the format $\langle c_1 f_1 c_2 f_2 c_3 f_3 \ldots c_ n f_ n \rangle $, where $c_ i$ is from the set (ACGU) and $f_ i$ is a positive integer.
The species that we are dealing with have the following properties:
-
$f_1 + f_2 + f_3 + \ldots f_ n \leq 10\, 050$
-
$f_1 \leq 5\, 000$
-
$f_ n \leq 5\, 000$
-
$f_2 + f_3 + f_4 + \ldots + f_{n - 1} \leq 50$
(note that $n\leq 52$).
Input
The first line of input is an integer $T$ ($T \leq 200$) that indicates the number of test cases.
Each case contains two lines. The first line is the primary structure given in run-length encoded format. The second line gives you the value of $K$ ($0 \leq K \leq 20$), that gives an upper limit on the number of C–G base pairs that can be in the final secondary structure.
Output
For each case, output the case number followed by the maximum number of base pairs that can be formed. Look at the samples for exact format.
Sample Input 1 | Sample Output 1 |
---|---|
3 A3C1G1C1U4A2U1 1 A3C1G1C1U4A2U1 0 A100U200 2 |
Case 1: 6 Case 2: 5 Case 3: 100 |