\documentclass[12pt]{article}
\voffset -2.8 cm \textwidth 17 cm \textheight 24.7 cm \oddsidemargin -.54 cm \evensidemargin -1.54 cm
\usepackage[latin1]{inputenc}
\usepackage{amsmath}
\usepackage[pdftex]{graphicx}
\pagestyle{plain}
\begin{document}
\centerline {\Large \sc Information and coding theory}
\bigskip \bigskip \hrule \bigskip \bigskip
\centerline {\underline{\Large Solution of exercise sheet n$^\circ$ 4}:}
\bigskip
\noindent {\bf 4-1.} \par \medskip \noindent (a) The table of the Lempel-Ziv code:
\newline \newline\newline
\begin{tabular}{| c | c | c | l |} \hline
Position & Substring & Output & Encoded output\\\hline
1. & \mbox{A} & \mbox{(0,A)} & (00,000)\\\hline
2. & \mbox{A\_} & \mbox{(1,\_)} & (01,011)\\\hline
3. & \mbox{AB} & \mbox{(1,B)} & (01,001)\\\hline
4. & \mbox{ABB} & \mbox{(3,B)} & (11,001)\\\hline
5. & \mbox{ABC} & \mbox{(3,C)} & (11,010)\\\hline
6. & \mbox{\_} & \mbox{(0,\_)} & (00,011)\\\hline
7. & \mbox{ABA} & \mbox{(3,A)} & (11,000)\\\hline
8. & \mbox{B} & \mbox{(0,B)} & (00,001)\\\hline
9. & \mbox{C} & \mbox{(0,C)} & (00,010)\\\hline
10. & \mbox{.} & \mbox{(0,.)} & (00,100)\\\hline
\end{tabular}
\newline \newline \newline
The alphabet of the source reads $\mathcal{A} = \{\mathtt{A}, \mathtt{B}, \mathtt{C}, \mathtt{\_} \}$. To encode the position one can use 2 bits (sufficient to encode 0, 1, 2, 3). Furthermore, one needs three bits to encode the character (taking into account the ``.'' added at the end). Thus, one needs 5 bits per substring, which gives $5\times 10=50$ bits in total. A naive encoding needs $3\times 18=54$ bits to send the sequence. We remark that asymptotically the number of bits per symbol tends to the entropy of the source, and is therefore smaller than for the naive encoding, except for a source with uniformly distributed probabilities. The elements A, B and C of the last ABC appear in different substrings in the first example; in the second example, ABC is part of the substring that contains the last character.
\newline
\par \medskip \noindent (b)
\newline \newline\newline
\begin{tabular}{| c | c | c | l |} \hline
Position & Substring & Output & Encoded output\\\hline
1. & \mbox{A} & \mbox{(0,A)} & \mbox{(000,00)}\\\hline
2. & \mbox{B} & \mbox{(0,B)} & \mbox{(000,01)}\\\hline
3. & \mbox{AA} & \mbox{(1,A)} & \mbox{(001,00)}\\\hline
4. & \mbox{AAA} & \mbox{(3,A)} & \mbox{(011,00)}\\\hline
5. & \mbox{AAAA} & \mbox{(4,A)} & \mbox{(100,00)}\\\hline
6. & \mbox{AAAAA} & \mbox{(5,A)} & \mbox{(101,00)}\\\hline
7. & \mbox{BB} & \mbox{(2,B)} & \mbox{(010,01)}\\\hline
8. & \mbox{.} & \mbox{(0,.)} & \mbox{(000,11)}\\\hline
\end{tabular}
\newline \newline \newline
The alphabet of the source reads $\mathcal{A} = \{\mathtt{A}, \mathtt{B} \}$.\\
Lempel-Ziv (3 bits per position, 2 bits per character): $5\times 8=40$ bits\\
Naive encoding: $2\times 19=38$ bits
\newpage
\par \medskip \noindent (c)
\newline \newline\newline
\begin{tabular}{| c | c | c | l |} \hline
Position & Substring & Output & Encoded output\\\hline
1. & \mbox{A} & \mbox{(0,A)} & \mbox{(000,00)}\\\hline
2. & \mbox{B} & \mbox{(0,B)} & \mbox{(000,01)}\\\hline
3. & \mbox{AA} & \mbox{(1,A)} & \mbox{(001,00)}\\\hline
4. & \mbox{AAA} & \mbox{(3,A)} & \mbox{(011,00)}\\\hline
5. & \mbox{AAAA} & \mbox{(4,A)} & \mbox{(100,00)}\\\hline
6. & \mbox{AAAAA} & \mbox{(5,A)} & \mbox{(101,00)}\\\hline
7. & \mbox{AAAAAA} & \mbox{(6,A)} & \mbox{(110,00)}\\\hline
8. & \mbox{.} & \mbox{(0,.)} & \mbox{(000,11)}\\\hline
\end{tabular}
\newline \newline \newline
The alphabet of the source reads $\mathcal{A} = \{\mathtt{A}, \mathtt{B} \}$.\\
Lempel-Ziv (3 bits per position, 2 bits per character): $5\times 8=40$ bits\\
Naive encoding: $2\times 23=46$ bits\\
\medskip \noindent (d) The original sequence is: AABABC\_ABBBBBBBB.
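\par \medskip \noindent
The parsing used in the tables above can be reproduced mechanically. The following short Python sketch (purely illustrative) implements the variant used here, in which each substring is output as the pair (position of the longest previously seen substring that is a prefix of what remains, next character); the input below is the sequence of part~(a), reconstructed by concatenating the substrings of its table:
\begin{verbatim}
# Minimal LZ78-style parser: each phrase is output as
# (index of the longest previously seen phrase that is a prefix, next char),
# where index 0 denotes the empty phrase.
def lz78_parse(sequence):
    index = {"": 0}              # phrase -> position in the dictionary
    phrases = []
    current = ""
    for ch in sequence:
        if current + ch in index:
            current += ch        # keep extending the current match
        else:
            phrases.append((index[current], ch))
            index[current + ch] = len(phrases)
            current = ""
    return phrases               # assumes the input ends with a fresh symbol ('.')

# Sequence of part (a), reconstructed from its table ('_' stands for the blank):
for k, (pos, ch) in enumerate(lz78_parse("AA_ABABBABC_ABABC."), start=1):
    print(k, pos, ch)            # reproduces the Position and Output columns
\end{verbatim}
\noindent Applied to the sequences of parts~(b) and~(c), the same function reproduces their tables as well.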
\par \bigskip \noindent {\bf 4-2.} \medskip \medskip \noindent (a) The optimal method of asking questions can be found by considering the Huffman code applied to the source $X_1 X_2 \ldots X_n$. In order to apply this code one needs to define a new random variable $Y$ whose alphabet contains all outcomes of the sequence $X_1 X_2 \ldots X_n$. That is,\\
\begin{equation}
\begin{split}
\mathcal{Y}=\{ &\underbrace{111\ldots111}_{\textrm{``All objects are faulty''}}, \underbrace{111\ldots110}_{\textrm{``The first $n-1$ objects are faulty, the last one is not''}},\\
&\underbrace{111\ldots100}_{\textrm{``The first $n-2$ objects are faulty, the last two are not''}},\\
&\underbrace{111\ldots101}_{\textrm{``The first $n-2$ objects and the last one are faulty, the $(n-1)$th object is not faulty''}} , \\
&\ldots, \underbrace{000\ldots000}_{\textrm{``No object is faulty.''}}\}.
\end{split}
\end{equation}
In total there are $2^n$ possible sequences, and we assign to them the following probability distribution:
\begin{equation}
\begin{split}
q(y)=\{ & p_1 p_2 p_3 \cdots p_n, \; p_1 p_2 p_3 \cdots p_{n-2}p_{n-1}(1-p_{n}),\\
& p_1 p_2 p_3 \cdots p_{n-2}(1-p_{n-1})(1-p_n),\\
& p_1 p_2 p_3 \cdots p_{n-2}(1-p_{n-1})p_n,\\
& \ldots, (1-p_1)(1-p_2) \cdots (1-p_n) \}.
\end{split}
\end{equation}
For given values of $p_1, p_2$, etc.\ we can rank all values of $Y$ according to the probabilities $q(y)$ (which are automatically normalized, being products of independent factors) and then construct the Huffman code.
%[There will be an example in a future version].
% \underline{Example:}
% We consider $n=3$ and $p_1=0.66, p_2=0.65, p_3=0.64$. We obtain the following table when applying the Huffman code:
% \begin{figure}
% \centering
% \includegraphics[width=1\textwidth]{Inf_Th_TP4_C_en-huffmanEx.pdf}
% % \caption{Codewords.}
% \end{figure}
% \begin{center}
% \begin{tabular}{| l | c | c c c c c c c|}
% \hline
% Codeword & Y & \multicolumn{7}{|c|}{Probability} \\
% \hline
% 00 & 111 & 0.2746 & 0.2746 & 0.2746 & 0.2746 & 0.3023 & 0.4232 & 0.5768\\
% 010 & 110 & 0.1544 & 0.1544 & 0.1627 & 0.2604 & 0.2746 & 0.3023 & 0.4232\\
% 011 & 101 & 0.1478 & 0.1478 & 0.1544 & 0.1627 & 0.2604 & 0.2746 & \\
% 100 & 011 & 0.1414 & 0.1414 & 0.1478 & 0.1544 & 0.1627 & & \\
% 110 & 100 & 0.0832 & 0.1190 & 0.1414 & 0.1478 & & & \\
% 111 & 010 & 0.0796 & 0.0832 & 0.1190 & & & & \\
% 1010 & 001 & 0.0762 & 0.0796 & & & & & \\
% 1011 & 000 & 0.0428 & & & & & &\\
% \hline
% \end{tabular}
% \end{center}
\newpage
(b) The longest sequence of questions corresponds to the least probable outcome. Since $p_i> \frac{1}{2}$, the least probable outcome is the one in which no object is faulty, and the second least probable is the one in which only the last object is faulty (assuming $p_n$ is the smallest of the $p_i$). These two outcomes are the two deepest leaves of the binary tree associated with the Huffman code, and the final question separates them. The last question is thus: ``Is the last object faulty?''
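\par \medskip \noindent
The construction of part (a) is easy to check numerically. The short Python sketch below is purely illustrative; it uses $n=3$ and the example values $p_1=0.66$, $p_2=0.65$, $p_3=0.64$ (any other values work in the same way), builds $q(y)$ over the $2^n$ fault patterns, and computes the Huffman codeword length of each pattern, i.e.\ the number of questions asked for that outcome:
\begin{verbatim}
import heapq
from itertools import product

def huffman_lengths(q):
    # Heap entries: (probability, tie-breaking counter, symbols in the group).
    heap = [(prob, i, [y]) for i, (y, prob) in enumerate(q.items())]
    heapq.heapify(heap)
    depth = {y: 0 for y in q}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)
        p2, _, g2 = heapq.heappop(heap)
        for y in g1 + g2:
            depth[y] += 1            # each merge adds one bit (one question)
        heapq.heappush(heap, (p1 + p2, counter, g1 + g2))
        counter += 1
    return depth

p = [0.66, 0.65, 0.64]               # illustrative fault probabilities, n = 3
q = {}
for y in product([1, 0], repeat=len(p)):   # 1 = faulty, 0 = not faulty
    prob = 1.0
    for yi, pi in zip(y, p):
        prob *= pi if yi == 1 else 1 - pi
    q[y] = prob

for y, length in sorted(huffman_lengths(q).items(), key=lambda t: -q[t[0]]):
    print("".join(map(str, y)), round(q[y], 4), length, "questions")
\end{verbatim}
\noindent For these values the resulting average number of questions lies, as it must, between $H(X_1 \ldots X_n)$ and $H(X_1 \ldots X_n)+1$.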
\bigskip \noindent {\bf 4-3.} \par \medskip \noindent
Player $B$ uses the optimal code, and we know that this player needs, on average, 35 questions (``bits'') to identify the object. This implies the inequality
\begin{eqnarray}
H(X) &\le& 35 < H(X)+1 \nonumber \\
\Rightarrow \quad 34 &<& H(X). \nonumber
\end{eqnarray}
For a fixed number of objects, the distribution of objects chosen by player $A$ that gives the highest entropy is the uniform distribution. Conversely, for a fixed value of the entropy, the uniform distribution minimizes the number of objects needed to attain that entropy. The uniform distribution therefore yields a lower bound on the number of objects $m$:
\begin{eqnarray}
34 < \log_2 m. \nonumber
\end{eqnarray}
We conclude that there are at least $2^{34}\simeq 1.7\times 10^{10}$ objects in the ensemble.
\bigskip
\noindent {\bf 4-4.} \par \medskip \noindent (a) The probability distribution that maximizes the Shannon entropy is the one in which all possible outcomes are equally likely. There are $n+1$ possible outcomes: $n$ in which one of the coins is counterfeit, and one in which no coin is counterfeit. In this case $p_i = \frac{1}{n+1}$, and the Shannon entropy is $H(X) = \log_2 (n+1)$ bits.
\medskip
\noindent (b) With the optimal code, one needs on average $k$ weighings, with $k$ such that
\begin{eqnarray}
H(X) &\le& k < H(X) + 1.
\end{eqnarray}
For the case of maximal entropy, we have
\begin{eqnarray}
\log_2(n+1) &\le& k < \log_2(n+1) + 1.
\end{eqnarray}
\noindent (c) With the help of (b) one deduces that $n+1 \le 2^k < 2(n+1)$, and thus $n\le 2^k-1$.
\medskip
\noindent (d) The bound is attained if $n = 2^{k}-1$, and thus $p_i = \frac{1}{2^k}$ (a 2-adic probability distribution).
\medskip
\begin{tabular}{| c | c |} \hline
Weighings & Max.\ number of coins \\\hline
1 & 1 \\\hline
2 & 3 \\\hline
3 & 7 \\\hline
4 & 15 \\\hline
\end{tabular}
\newline \newline
The bound is attained if one has a number of coins equal to $1, 3, 7, 15, \ldots$
\bigskip
\noindent {\bf 4-5.} \par \medskip \noindent (a) We are given $L-H_5(X)=0$:
\[
L-H_5(X)=\sum_{i=1}^m p_il_i+\sum_{i=1}^m p_i \log_5 p_i =0,
\]
but $l_i=-\log_5 5^{-l_i}$, so we can rewrite this as
\[
L-H_5(X)=-\sum_{i=1}^m p_i\log_5 5^{-l_i}+\sum_{i=1}^m p_i \log_5 p_i =0.
\]
Defining $r_i=\frac{5^{-l_i}}{\sum_{j}5^{-l_j}}$ and $R=\sum_i 5^{-l_i}$, we have
\[
L-H_5(X)=\sum_{i} p_i \log_5\frac{p_i}{r_i}-\log_5 R =0.
\]
The Kraft inequality tells us that $R\leq 1$, and we remark that the $r_i$ form a probability distribution, since $r_i \geq 0$ and $\sum_i r_i=1$; we can therefore rewrite the previous expression using the relative entropy $D(p||r)$:
\[
L-H_5(X)=D(p||r)+\log_5\frac{1}{R}=0.
\]
We know that $D(p||r) \geq 0$ and that $\log_5\frac{1}{R} \geq 0$ since $R\leq 1$. A sum of two non-negative terms can only vanish if both terms are zero, so $R=1$ and $p_i=r_i=5^{-l_i}$.
\medskip
\noindent (b) Since $L=H_5(X)$, the 5-adic code is optimal, and we can therefore construct it with the (5-ary) Huffman procedure. At each step of this construction, 5 elements are merged into one, so the number of elements decreases by 4 per step. If one starts with $m$ elements, then after the last step, say the $k$th, only one element remains: $m-4k=1$, and thus $m=4k+1$.
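\medskip
\noindent As an illustration (with probabilities chosen purely for the example), take $k=2$, i.e.\ $m=4\cdot 2+1=9$ symbols: four symbols of probability $\frac{1}{5}$ and five of probability $\frac{1}{25}$, encoded with the 5-adic codeword lengths $l_i=-\log_5 p_i$, i.e.\ four codewords of length 1 and five of length 2. Then
\[
L = 4\cdot\tfrac{1}{5}\cdot 1 + 5\cdot\tfrac{1}{25}\cdot 2 = \tfrac{6}{5},
\qquad
H_5(X) = 4\cdot\tfrac{1}{5}\log_5 5 + 5\cdot\tfrac{1}{25}\log_5 25 = \tfrac{6}{5},
\]
so $L=H_5(X)$, and the 5-ary Huffman construction terminates after exactly $k=2$ merging steps ($9 \to 5 \to 1$ elements).
\end{document}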