Group of the Week (III): James Blake

It has been a while since the last Group of the Week (I had a feeling it turn out this way…).

Anyway, if you do a bold extrapolation from the two groups featured so far, I like Metal and Electro-Pop. While this represent a minor portion of the music I listen too, I will continue with James Blake — another Electro-Pop act. His latest album is entitled overgrown, and I really like it. He also has a cool webpage.

Today, I found a show he performed at the Pitchfork Music Festival in Paris.

James Blake on Spotify:

Group of the Week (I): Woods of Ypres

I don’t listen to Metal that much but I really like Woods of Ypres and particularly their last album Woods 5: Grey Skies & Electric Light. The reason I write ‘last album’ and not ‘latest album’ is that their founder, singer and guitarist David Gold was killed in a car accident in 2011.

I intend to make Group of the Week a series, though it will certainly not be a weekly series. Let’s see 🙂

A Youtube playlist:

On Spotify:

A Python Hash Table Implementation

A simplicial implementation of a hash table in Python3.

(Some browsers display the html code &quot rather than the double quote. If this is the case for you please replace all occurences of &quot.)

#!/usr/bin/env python

import sys

class Item:
    key   = ""
    value = 0

    def __init__(self,key,value):
        self.key = key
        self.value = value

    def print(self):
        print("  '" + self.key + "' / " + str(self.value) )
    

class HashTable:
    'Common base class for a hash table'
    tableSize    = 0
    entriesCount = 0
    alphabetSize = 2*26
    hashTable    = []
    

    def __init__(self, size):
        self.tableSize = size
        self.hashTable = [[] for i in range(size)]

    def char2int(self,char):
        if char >= 'A' and char <= 'Z':
            return ord(char)-65
        elif char >= 'a' and char <= 'z':
            return ord(char)-65-7
        else:
            raise NameError('Invalid character in key! Alphabet is [a-z][A-Z]')
        
    def hashing(self,key):
        hash = 0
        for i,c in enumerate ( key ):
            hash += pow(self.alphabetSize, len(key)-i-1) * self.char2int(c)
        return hash % self.tableSize

    def insert(self,item):
        hash = self.hashing(item.key)
        # print(hash)
        for i,it in enumerate(self.hashTable[hash]):
            if it.key == item.key:
                del self.hashTable[hash][i]
                self.entriesCount -= 1
        self.hashTable[hash].append(item)
        self.entriesCount += 1            

    def get(self,key):
        print ("Getting item(s) with key '" + key + "'")
        hash = self.hashing(key)
        for i,it in enumerate(self.hashTable[hash]):
            if it.key == key:
                return self.hashTable[hash]
        print (" NOT IN TABLE!")            
        return None

    def delete(self,key):
        print ("Deleting item with key '" + key + "'")
        hash = self.hashing(key)
        for i,it in enumerate(self.hashTable[hash]):
            if it.key == key:
                del self.hashTable[hash][i]
                self.entriesCount -= 1
                return
        print (" NOT IN TABLE!")            
            
    def print(self):
        print ( ">>>>> CURRENT TABLE BEGIN >>>>" )
        print ( str(self.getNumEntries()) + " entries in table" )
        for i in range(self.tableSize):
            print ( " [" + str(i) + "]: " )
            for j in range(len(self.hashTable[i])):
                self.hashTable[i][j].print()
        print ( "<<<<< CURRENT TABLE END <<<<<" )

    def getNumEntries(self):
        return self.entriesCount


if __name__ == "__main__":
    hs = HashTable(11)

    item = Item("one",1)
    hs.insert(item)
    hs.print()
    hs.insert(item)
    hs.print()

    item = Item("two",2)
    hs.insert(item)

    item = Item("three",3)
    hs.insert(item)
    hs.print()

    item = Item("one",4);
    hs.insert(item);

    items = hs.get("one");
    if items != None:
        for j in range(len(items)):
            items[j].print()
    
    item = Item("PheewThisIsALongOne",12345)
    hs.insert(item)
    hs.print()

    items = hs.get("PheewThisIsALongOne")
    if items != None:
        for j in range(len(items)):
            items[j].print()
        
    items = hs.get("PheewThisIsOne")
    if items != None:
        for j in range(len(items)):
            items[j].print()
        
    hs.delete("PheewThisIsALongOne")
    hs.print()

    hs.delete("PheewThisIsTheOne")
    hs.print()

    hs.delete("one")
    hs.print()

    # This one leads to an exception as '!' is not part of the allowed alphabet
    # item = Item("!", 5)
    # hs.insert(item)

Skiena’s TADM Problems Chapter 8

From Skiena’s Algorithm Design Manual
(no guarantee that any solution is good or even correct)

Problem 8-1

Typists often make transposition errors exchanging neighboring characters, such as typing setve when you mean steve. This requires two substitutions to fix under the conventional definition of edit distance. Incorporate a swap operation into our edit distance function, so that such neighboring transposition errors can be fixed at the cost of one operation.

Solution
(no guarantee that the solution is good or even correct)

I kept the somewhat quirky requirement that all strings start in position 1 (instead of 0). Be sure to add a blank at the front if you test your own string literals. Note that I changed the meaning of “S” in the edit sequence to SWAP (I suppose it indicated SUBSTITUTE originally). Substitutions are now indicated using R for REPLACE.

Compile with gcc -std=c99 and please leave comments if you find a mistake or an improvement.

#include <stdio.h>
#include <string.h>

#define MATCH       0 /* enumerated type symbol for match */
#define INSERT      1 /* enumerated type symbol for insert */
#define DELETE      2 /* enumerated type symbol for delete */
#define SWAP        3 /* enumerated type symbol for swap */

typedef struct {
    int cost;     /* cost of reaching this cell */
    int parent;   /* parent cell */
} cell;

/* EXAMPLE 1
#define MAXLEN 14
cell m[MAXLEN+1][MAXLEN+1];
char s[] = " thou shalt not";
char t[] = " you should not";
*/

/* EXAMPLE 2
#define MAXLEN 5
cell m[MAXLEN+1][MAXLEN+1];
char s[] = " setev";
char t[] = " steve";
*/

/* EXAMPLE 3 */
#define MAXLEN 35
cell m[MAXLEN+1][MAXLEN+1];
char s[] = " Heigth si on of my favurite tapas! ";
char t[] = " Height is one of my favorite typos!";

/*********** initializations ***********/
void row_init (int i) {
    m[0][i].cost = i;
    if (i&gt;0)
        m[0][i].parent =  INSERT;
    else
        m[0][i].parent = -1;
}

void column_init (int i) {
    m[i][0].cost = i;
    if (i>0)
        m[i][0].parent = DELETE;
    else
        m[i][0].parent = -1;
}

/*********** edit distance ***********/
void goal_cell(char *s, char *t, int *i, int *j) {
    *i = strlen(s) - 1;
    *j = strlen(t) - 1;
}

int match ( char s, char t ) {
    if ( s==t )
        return 0;
    else
        return 1;
}

int indel ( char c ) {
    return 1;
}

int swap ( char *s, char *t, int *i, int *j ) {
    /* check for end of string */
    if ( !( (*i < MAXLEN) && (*j < MAXLEN) ))
        return 10;
    /* Swap if the next two chars of s and t
      are cross-wise identical but not if they are
      pairwise the same - in this case we want
      to match */
    if ( s[*i] == t[(*j)+1] && s[(*i)+1] == t[*j] && (s[*i] != s[(*i)+1]) && (*i == *j) ) {
        return -1; /* -1 will beat all other 1-to-1 edits */
    }
    else
        return 10;
}

int string_edit_distance (char *s, char *t)
{
    int i, j;
    int opt[4];             /* cost of the options */
    for (unsigned int i=0; i < MAXLEN; i++) {
        row_init(i);
        column_init(i);
    }

    for (i=1; i < (int) strlen(s); i++) {
        for (j=1; j < (int) strlen(t); j++) {
            opt[MATCH]  = m[i-1][j-1].cost + match(s[i],t[j]);
            opt[INSERT] = m[i][j-1].cost + indel(t[j]);
            opt[DELETE] = m[i-1][j].cost + indel(s[i]);
            opt[SWAP]   = m[i-1][j-1].cost + swap(s,t,&i,&j);
            m[i][j].cost = opt[MATCH];
            m[i][j].parent = MATCH;
            for (int k=INSERT; k<=SWAP; k++)
                if (opt[k] < m[i][j].cost) {
                    m[i][j].cost = opt[k];
                    m[i][j].parent = k;
                    if ( k == SWAP ) {
                        /* correct for the -1 added before */
                        m[i][j].cost += 1;
                        m[i][j].parent = k;
                        ++i, ++j;
                        m[i][j].cost = m[i-1][j-1].cost + 1;
                        m[i][j].parent = k;
                    }
                }
        }
    }
    goal_cell(s,t,&i,&j);
    return( m[i][j].cost );
}

/*********** reconstruction ***********/
void insert_out(char *t, int j) {
    printf("I");
}

void delete_out(char *s, int i) {
    printf("D");
}

void match_out(char *s, char *t, int i, int j) {
    if (s[i]==t[j]) printf("M");
    else printf("R");
}

void swap_out(char *s, char *t, int i, int j) {
    printf("S");
}

void reconstruct_path(char *s, char *t, int i, int j) {
    if (m[i][j].parent == -1) return;
    if (m[i][j].parent == MATCH) {
        reconstruct_path(s,t,i-1,j-1);
        match_out(s, t, i, j);
        return;
    }
    if (m[i][j].parent == INSERT) {
        reconstruct_path(s,t,i,j-1);
        insert_out(t,j);
        return;
    }
    if (m[i][j].parent == DELETE) {
        reconstruct_path(s,t,i-1,j);
        delete_out(s,i);
        return;
    }
    if (m[i][j].parent == SWAP) {
        reconstruct_path(s,t,i-2,j-2);
        swap_out(s,t,i,j);
        return;
    }
}

/*********** control ***********/
int main () {

    printf ("Edit distance of \"%s\" and \"%s\" is %i\n", s, t, string_edit_distance (s,t) );
    reconstruct_path (s,t,MAXLEN,MAXLEN);
    printf ("\n");

    return 0;
}

Output of example 1:

Edit distance of " thou shalt not" and " you should not" is 5
DRMMMMMIRMRMMMM

Output of example 2:

Edit distance of " setev" and " steve" is 2
MSS

Output of example 3:

Edit distance of " Heigth si on of my favurite tapas! " and " Height is one of my favorite typos!" is 7
MMMMSMSMMMIMMMMMMMMMMRMMMMMMRMRMMD

Problem 8-2

Suppose you are given three strings of characters: X, Y , and Z, where |X| = n, |Y| = m, and |Z| = n+m. Z is said to be a shuffle of X and Y iff Z can be formed by interleaving the characters from X and Y in a way that maintains the left-to-right ordering of the characters from each string.

  1. Show that cchocohilaptes is a shuffle of chocolate and chips, but chocochilatspe is not.
  2. Give an efficient dynamic-programming algorithm that determines whether Z is a shuffle of X and Y. Hint: the values of the dynamic programming matrix you construct should be Boolean, not numeric.

Solution
(no guarantee that the solution is good or even correct)

  1. This one is easy to see if you start from the end. The ‘s’ cannot occur before the ‘p’
  2. Give an efficient dynamic-programming algorithm that determines whether Z is a shuffle of X and Y . Hint: the values of the dynamic programming matrix you construct should be Boolean, not numeric.Compile with -std=c++11 option and please leave comments if you find a mistake or an improvement.
#include <iostream>
#include <vector>

class Shuffle {
public: 
	Shuffle(const std::string& a, const std::string& b, const std::string& c);
    void is_shuffle ( const unsigned int i, const unsigned int j );
    void init ();
    void printXYZ () const;
    void printDPmat () const;
    bool get ( const unsigned int i, const unsigned int j ) const;

private:
    const std::string x;
    const std::string y;
    const std::string z;
    const std::string::size_type n, m;
    std::vector<std::vector<bool>> DPmat;
};

Shuffle::Shuffle(const std::string& a, const std::string& b, const std::string& c) : x{a}, y{b}, z{c}, n{x.size()}, m{y.size()} {
    DPmat.resize(n+1);
    for ( unsigned int i = 0 ; i <= n; i++ )
        DPmat[i].resize(m+1);
}

void Shuffle::printXYZ () const {
    std::cout << x << std::endl << y << std::endl << z << std::endl;
}

void Shuffle::printDPmat () const {
    for ( unsigned int i = 0; i <= n; ++i ) {
        for ( unsigned int j = 0; j <= m; ++j ) 
            std::cout << DPmat[i][j] << " ";
        std::cout << std::endl;
    }
}

void Shuffle::is_shuffle ( const unsigned int i, const unsigned int j ) {
    DPmat[i][j] = ((x[i-1] == z[i+j-1]) && DPmat[i-1][j]) || ((y[j-1] == z[i+j-1]) && DPmat[i][j-1] );
//-SMS     printf ( "x[%i]: %c y[%i]: %c z[%i]: %c\n", i-1, x[i-1], j-1, y[j-1], i+j-1, z[i+j-1] );
//-SMS     printDPmat();
}

bool Shuffle::get ( const unsigned int i, const unsigned int j ) const {
    return DPmat[i][j];
}

void Shuffle::init () {
    DPmat[0][0] = true;

    for ( std::string::size_type i = 0; i <= n; ++i )
    for ( std::string::size_type j = 0; j <= m; ++j )
        DPmat[i][j] = false;

    DPmat[0][0] = true;

    for ( std::string::size_type i = 1; i <= n; ++i )
        if ( !x.compare(0,i,z,0,i) )
            DPmat[i][0] = true;
            
    for ( std::string::size_type j = 1; j <= m; ++j )
        if ( !y.compare(0,j,z,0,j) )
            DPmat[0][j] = true;
}

int main () {
    const std::string X {"chocolate"};
    const std::string Y {"chips"};
    const std::string Z {"cchocohilaptes"};
//-SMS     const std::string Z {"chocochilatspe"};
    Shuffle shuffle(X, Y, Z);
    shuffle.printXYZ();

    shuffle.init();

    for ( unsigned int i = 1; i <= X.size(); ++i ) {
        for ( unsigned int j = 1; j <= Y.size(); ++j ) {
            if ( !shuffle.get(i-1,j) && !shuffle.get(i,j-1) )
                continue;
            shuffle.is_shuffle ( i, j );
        }
    }

    shuffle.printDPmat();

    if ( shuffle.get (X.size(), Y.size() ) )
        std::cout << Z << " is a shuffle of " << X << " and " << Y << std::endl;
    else
        std::cout << Z << " is NOT a shuffle of " << X << " and " << Y << std::endl;
    return 0;
}

Output:

chocolate
chips
cchocohilaptes
1 1 0 0 0 0 
1 1 1 0 0 0 
0 1 0 0 0 0 
0 1 0 0 0 0 
0 1 0 0 0 0 
0 1 1 1 0 0 
0 0 0 1 0 0 
0 0 0 1 1 0 
0 0 0 0 1 0 
0 0 0 0 1 1 
cchocohilaptes is a shuffle of chocolate and chips

Problem 8-3

The longest common substring (not subsequence) of two strings X and Y is the longest string that appears as a run of consecutive letters in both strings. For example, the longest common substring of photograph and tomography is ograph.
  1. Let n = |X| and m = |Y |. Give a Θ(nm) dynamic programming algorithm for longest common substring based on the longest common subsequence/edit distance algorithm.
  2. Give a simpler Θ(nm) algorithm that does not rely on dynamic programming.

Solution
(no guarantee that the solution is good or even correct)

  1. This one is based on Solution 8-2. Compile with -std=c++11 option.
    #include <iostream>
    #include <vector>
    #include <string>
    
    class LongestSubstring {
    public:
    	LongestSubstring(const std::string& a, const std::string& b);
        void find_longest_substring ();
        void init ();
        void printDPmat () const;
        std::string getZ () const;
        unsigned int getMaxlen () const;
    
    private:
        const std::string x;
        const std::string y;
        std::string z;
        const std::string::size_type n, m;
        std::vector<std::vector<bool>> DPmat;
        unsigned int maxlen;
    };
    
    LongestSubstring::LongestSubstring(const std::string& a, const std::string& b) : x{a}, y{b}, n{x.size()}, m{y.size()}, maxlen{0} {
        DPmat.resize(n);
        for ( unsigned int i = 0 ; i <= n; i++ )
            DPmat[i].resize(m);
    }
    
    void LongestSubstring::printDPmat () const {
        for ( unsigned int i = 0; i != n; ++i ) {
            for ( unsigned int j = 0; j != m; ++j )
                std::cout << DPmat[i][j] << " ";
            std::cout << std::endl;
        }
    }
    
    // the longest diagonal sequence of true values in the DP matrix
    // constitutes the longest substring
    void LongestSubstring::find_longest_substring () {
        for ( unsigned int i = 0; i != n; ++i ) {
            for ( unsigned int j = 0; j != m; ++j ) {
                unsigned int k = i, l = j, len = 0;
                std::string tmp = "";
                while ( DPmat[k][l] == true ) {
                    tmp += x[k];
                    ++k, ++l, ++len;
                }
                if ( len > maxlen ) {
                    maxlen = len;
                    z = tmp;
                }
            }
        }
    }
    
    std::string LongestSubstring::getZ () const {
        return z;
    }
    
    unsigned int LongestSubstring::getMaxlen () const {
        return maxlen;
    }
    
    // the DP matrix can be constructed entirely during initialization
    void LongestSubstring::init () {
        DPmat[0][0] = true;
    
        for ( std::string::size_type i = 0; i != n; ++i )
        for ( std::string::size_type j = 0; j != m; ++j )
            DPmat[i][j] = x[i] == y[j];
    }
    
    int main () {
        const std::string X {"photograph"};
        const std::string Y {"tomography"};
        LongestSubstring ls(X, Y);
    
        ls.init();
        ls.find_longest_substring ();
    
        ls.printDPmat();
        std::cout << "Longest common substring of " << X << " and " << Y << " is " << ls.getZ() << " (length: " << ls.getMaxlen() << ")" << std::endl;
        return 0;
    }
    

    Output:

    0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 1 0
    0 1 0 1 0 0 0 0 0 0
    1 0 0 0 0 0 0 0 0 0
    0 1 0 1 0 0 0 0 0 0
    0 0 0 0 1 0 0 0 0 0
    0 0 0 0 0 1 0 0 0 0
    0 0 0 0 0 0 1 0 0 0
    0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 1 0
    Longest common substring of photograph and tomography is ograph (length: 6)
    
  2. A Python 3 solution
    def longest_substring_b( s, t ):
        u = []
        S = len(s)
        T = len(t)
        maxlen = 0
    
        i = 0
        j = 0
        for i in range(S):
            for j in range(T):
                tmplen = 0
                v = []
                if s[i] == t[j]:
                    n = i
                    m = j
                    while n < S and m < T and s[n] == t[m]:
                        v.append(s[n])
                        n += 1
                        m += 1
                    if ( len(v) > maxlen ):
                        maxlen = len(v)
                        u = v
    
        return ''.join(u)
    
    if __name__ == "__main__":
    
        s = "photograph"
        t = "tomography"
    
        u = longest_substring_b ( s, t )
        print ("Longest substring of '" + s + "' and '" + t + "' is '" + u + "' (length: " + str(len(u)) + ")" )
    

    Output:

    Longest substring of 'photograph' and 'tomography' is 'ograph' (length: 6)

Problem 8-4

The longest common subsequence (LCS) of two sequences T and P is the longest sequence L such that L is a subsequence of both T and P. The shortest common supersequence (SCS) of T and P is the smallest sequence L such that both T and P are a subsequence of L.

  1. Give efficient algorithms to find the LCS and SCS of two given sequences.
  2. Let d(T,P) be the minimum edit distance between T and P when no substitutions are allowed (i.e., the only changes are character insertion and deletion). Prove that d(T, P ) = |SCS(T, P )| − |LCS(T, P )| where |SCS(T, P )| (|LCS(T,P)|) is the size of the shortest SCS (longest LCS) of T and P.

Solution
(no guarantee that the solution is good or even correct)

  1. From how I understand dynamic programming by now there is not THE ONE solution. You can rather build the DP matrix in different ways, which result in different paths across the matrix for obtaining the desired result. Here, is a Python 3 solution.
    def longest_common_subsequence (s, t):
        DP = [[0 for j in range(len(t)+1)] for i in range(len(s)+1)]
    
        # build the DP matrix
        for i, x in enumerate(s):
            for j, y in enumerate(t):
                if x == y:
                    DP[i+1][j+1] = DP[i][j] + 1
                else:
                    DP[i+1][j+1] = max(DP[i+1][j], DP[i][j+1])
        # print(DP)
    
        # assemble subsequence
        lcs = ""
        x, y = len(s), len(t)
        while x and y:
            if DP[x][y] == DP[x-1][y]:
                x -= 1
            elif DP[x][y] == DP[x][y-1]:
                y -= 1
            else:
                lcs = s[x-1] + lcs
                x -= 1
                y -= 1
        return lcs
    
    def shortest_common_supersequence (s, t):
        DP = [[0 for j in range(len(t)+1)] for i in range(len(s)+1)]
    
        # build the DP matrix
        for i, x in enumerate(s):
            for j, y in enumerate(t):
                if x == y:
                    DP[i+1][j+1] = max(DP[i+1][j], DP[i][j+1]) + 1
                else:
                    DP[i+1][j+1] = max(DP[i+1][j], DP[i][j+1])
        # print (DP)
    
        # assemble supersequence
        scs = ""
        x, y = len(s), len(t)
        while x and y:
            # take both if the current chars are different
            if x > 0 and y > 0 and DP[x][y] == DP[x][y-1] and DP[x][y] == DP[x-1][y]:
                scs = s[x-1] + t[y-1] + scs
                x -= 1
                y -= 1
            # take one if they are equal
            if x > 0 and y > 0 and DP[x][y] != DP[x][y-1] and DP[x][y] != DP[x-1][y]:
                scs = s[x-1] + scs
                x -= 1
                y -= 1
            # take one from t if that brings us to the next common char
            if y > 0 and DP[x][y] == DP[x][y-1] and DP[x][y] != DP[x-1][y]:
                scs = t[y-1] + scs
                y -= 1
            # take one from s if that brings us to the next common char
            if x > 0 and DP[x][y] == DP[x-1][y] and DP[x][y] != DP[x][y-1]:
                scs = s[x-1] + scs
                x -= 1
    
        # take the remaining chars from s once t is exhausted
        if x > 0:
            scs = s[0:x] + scs
        # take the remaining chars from t once s is exhausted
        if y > 0:
            scs = t[0:y] + scs
    
        return scs
    
    if __name__ == "__main__":
    
        s = "photograph"
        t = "tomography"
    
        u = longest_common_subsequence ( s, t )
        print ("Longest common subsequence of '" + s + "' and '" + t + "' is '" + u + "' (length: " + str(len(u)) + ")" )
    
        u = shortest_common_supersequence ( s, t )
        print ("Shortest common supersequence of '" + s + "' and '" + t + "' is '" + u + "' (length: " + str(len(u)) + ")" )
        # photomography would be another valid SCS
    
    
  2. The obvious case is T = P, where d = |SCS| – |LCS| = 0. Otherwise, the sequences T and P differ by the set of letters D = SCS \ LCS. To edit from P to T (or vice versa) for each letter of D exactly one of the operations insert or delete has to be applied. The edit distance is hence d = |D| = |SCS| – |LCS|.

Problem 8-5

Let \(P_1 ,P_2, \ldots, P_n\) be n programs to be stored on a disk with capacity D megabytes. Program \(P_i\) requires \(s_i\) megabytes of storage. We cannot store them all because \(D < \sum_{i=1}^n s_i\)

  1. Does a greedy algorithm that selects programs in order of nondecreasing \(s_i\) maximize the number of programs held on the disk? Prove or give a counter-example.
  2. Does a greedy algorithm that selects programs in order of nonincreasing order \(s_i\) use as much of the capacity of the disk as possible? Prove or give a counter-example.

Solution

  1. Yes, it does. Assume \(D_m = \sum_{i=1}^m s_i\) with \(s_1 \le s_2 \le \ldots \le s_m\), then replacing any of these with a \(s_j, j \in [m+1, n]\) leads to a larger \(D\).
  2. No. Consider, e.g., \(D = 9\) and \(s = \{2,3,4,8\}\). The algorithm would choose {8} and terminate but {2,3,4} fills the entire disk.

Problem 8-6

Coins in the United States are minted with denominations of 1, 5, 10, 25, and 50 cents. Now consider a country whose coins are minted with denominations of \(\{d_1, \ldots, d_k \}\) units. We seek an algorithm to make change of n units using the minimum number of coins for this country.
  1. The greedy algorithm repeatedly selects the biggest coin no bigger than the amount to be changed and repeats until it is zero. Show that the greedy algorithm does not always use the minimum number of coins in a country whose denominations are {1,6,10}.
  2. Give an efficient algorithm that correctly determines the minimum number of coins needed to make change of n units using denominations \(\{d_1, \ldots, d_k \}\). Analyze its running time.

Solution

  1. Consider, e.g., \(D = 12\). The greedy algorithm will select \(s = \{10,1,1\}\) but a better set is \(s = \{6,6\}\).
  2. No solution here but you can find one on panictank.

No solution to Problem 8-7

Problem 8-8

In the single-processor scheduling problem, we are given a set of n jobs J. Each job i has a processing time ti, and a deadline di. A feasible schedule is a permutation of the jobs such that when the jobs are performed in that order, every job is finished before its deadline. The greedy algorithm for single-processor scheduling selects the job with the earliest deadline first. Show that if a feasible schedule exists, then the schedule produced by this greedy algorithm is feasible.

Solution

I am not completely sure that I get the question right because it seems rather obvious that if such a schedule exists and the jobs are executed in order of their deadline, then that will be a feasible solution (and maybe the only one).
The formalize, it the proof by induction concept might be the right choice

  1. Start: \(i = 1\): if \(t_1 < d_1\text{ feasible}\)
  2. Assume: \(i = k, \,\, \forall j \in [1,k-1] \text{ it holds } \sum_{l=1}^k t_l \le d_l\)
  3. Then: For \(i = k+1, \,\, \sum_{l=1}^k t_l + t_{k+1} \le t_{k+1} \Leftrightarrow t_{k+1} \le d_{k+1} – \sum_{l=1}^k t_l\)
    This can be interpreted as the feasibility condition. If it fulfilled the schedule is valid.

In words: If all jobs up to k finished on time and a feasible solution exists, the greedy algorithm will correctly pick the next job. As I said, maybe I misunderstood the question…

Problem 8-9

The knapsack problem is as follows: given a set of integers \(S = \{s_1, s_2, \ldots, s_n\}\) and a given target number T, find a subset of S that adds up exactly to T. For example, within S = {1,2,5,9,10} there is a subset that adds up to T = 22 but not T = 23. Give a correct programming algorithm for knapsack that runs in O(nT) time.

Solution

The code is written as backtracking solution, where pruning and dynamic programming can be enabled as optimizations. Also, it constructs all possible subsets that fill the knapsack or just stop once a solution is found. The respective variables to set/unset are generate_all_solutions{true/false} and useDP{true/false}.

The dynamic programming matrix in this case is a table. Every time a set with some total of t < T is constructed, it is stored in the table. Then every time a number is added to the knapsack, we check if there is a known combination for the remainder computed already, thus avoiding to compute a feasible set for any intermediate knapsack total smaller than T twice.

The DP algorithm runs at \(\mathcal{O}(nT)\) at worst. This is the case if all intermediate knapsack sets for totals from 1 to T are really constructed and all of them require testing all n numbers of the set. In average the algorithm will perform far better. If useDP is true, the DP table is printed at the end, and you can see that is not very populated.

#include <iostream>
#include <vector>
#include <chrono>

class Knapsack {
  public:
    Knapsack ( const std::vector<unsigned int>& numbers, const unsigned int T );
    bool is_a_solution () const;
    void process_solution ();
    unsigned int get_total () const;
    void init_dp ();
    bool use_dp () const;
    void backtrack ( );
    unsigned int num_solutions () const;
    unsigned int num_skips () const;
    void print_dp () const;

  private:
    const unsigned int target;
    const std::vector<unsigned int> set;
    const std::vector<unsigned int>::size_type set_size;
    std::vector<bool> in_sack;
    unsigned int solutions_found;
    unsigned int skips;
    const bool generate_all_solutions;
    const bool useDP;
    std::vector<std::vector<unsigned int>> DP_sack;
    std::vector<std::vector<bool>> DP_in_sack;
    std::vector<unsigned int> sack;
};

Knapsack::Knapsack ( const std::vector<unsigned int>& numbers, const unsigned int T ) : 
  target{T}, set{numbers}, set_size{numbers.size()}, solutions_found{0}, skips{0}, 
  generate_all_solutions{true}, // change here to construct one/all solutions
  useDP{true}                   // change here to use dynamic programming
    {
    in_sack.resize(numbers.size(), false);
}

bool Knapsack::use_dp () const {
    return useDP;
}

void Knapsack::init_dp () {
    for ( unsigned int i = 0; i != target; ++i ) {
        DP_sack.resize(target+1);
        DP_in_sack.resize(target+1);
    }
}

void Knapsack::print_dp () const {
    for ( unsigned int n = 0; n != DP_sack.size(); ++n ) {
        if ( DP_sack[n].size() == 0 ) continue;
        std::cout << n << ": ";
        for ( auto& m : DP_sack[n] )
            std::cout << m << " ";
        std::cout << std::endl;
    }
}

unsigned int Knapsack::get_total () const {
    unsigned int sum {0};
    for ( auto& i : sack )
        sum += i;
    return sum;
}

bool Knapsack::is_a_solution ( ) const {
    return get_total() == target;
}

unsigned int Knapsack::num_solutions () const {
    return solutions_found;
}

// check how often we pruned
unsigned int Knapsack::num_skips () const {
    return skips;
}

// processing = print it + count it
void Knapsack::process_solution ( ) {
    ++solutions_found;
    printf ( "Solution #%4u: ", solutions_found );
    for ( auto& i : sack )
        printf ("%u ", i);
    printf ( "\n" );
}

void Knapsack::backtrack ( ) {

    if ( generate_all_solutions == false && solutions_found > 0 ) return;

    if ( is_a_solution () ) {
        process_solution ();
    } else {
        // >>> dynamic programming part I
        if ( useDP ) {
            unsigned int total = get_total();
            if ( DP_sack[total].size() == 0 ) {
                DP_sack[total].insert( end(DP_sack[total]), begin(sack), end(sack) );
                DP_in_sack[total] = in_sack;
            }
        }
        // <<<

        for ( unsigned int i = 0; i != set_size; ++i ) {
            if ( !in_sack[i] ) {
                // >>> comment this to disable pruning
                if ( get_total () + set[i] > target ) {
                    ++skips;
                    continue;
                } 
                // <<<

                in_sack[i] = true;
                sack.push_back (set[i]);

                // >>> dynamic programming part II
                if ( useDP && (get_total() <= target) ) {
                    const unsigned int remainder = target - get_total ();
                    if ( DP_sack[remainder].size() > 0 ) {
                        sack.insert(end(sack), begin(DP_sack[remainder]), end(DP_sack[remainder]));
                        in_sack    = DP_in_sack[remainder];
                        in_sack[i] = true;
                    }
                }
                // <<<

                backtrack ();
                in_sack[i] = false;
                sack.pop_back ();
            }
        }
    }
}

int main () {

    const std::vector<unsigned int> numbers {1,2,5,9,10,21,33,57,98,111,1001};
    const unsigned int T {1111};
    Knapsack ks ( numbers, T );

    auto start = std::chrono::steady_clock::now();
    if ( ks.use_dp() )
        ks.init_dp ();
    ks.backtrack ( );
    auto diff = std::chrono::steady_clock::now() - start;

    if ( ks.use_dp() ) ks.print_dp ();
    std::cout << "Solution time:  " << std::chrono::duration <double, std::milli> (diff).count() << " ms" << std::endl;
    if ( ks.num_solutions () == 0 )
        printf ("No solution found!\n");
    printf ("We pruned %u times!\n", ks.num_skips());

    return 0;
}

For the set {1,2,5,9,10,21,33,57,98,111,1001} and the target 1111 the execution times for constructing all 864 possible knapsack sets on my machine are:

Basic backtracking: 66513 ms
Backtracking with pruning: 9368 ms
DP: 234 ms

Problem 8-10

The integer partition takes a set of positive integers \(S=s_1, \ldots, s_n\) and asks if there is a subset \(I \in S\) such that \(\sum_{i \in I} s_i= \sum_{ i \notin I} s_i\). Let \(\sum_{i \in S} s_i = M\). Give an O(nM) dynamic programming algorithm to solve the integer partition problem.

Solution

This can be solved with the code of Problem 8-9 setting T = M/2.

No solution to Problems 8-11 to 8-13
If you have one, I am happy to post it and credit it to you.

Problem 8-14

The traditional world chess championship is a match of 24 games. The current champion retains the title in case the match is a tie. Each game ends in a win, loss, or draw (tie) where wins count as 1, losses as 0, and draws as 1/2. The players take turns playing white and black. White has an advantage, because he moves first. The champion plays white in the first game. He has probabilities ww, wd, and wl of winning, drawing, and losing playing white, and has probabilities bw, bd, and bl of winning, drawing, and losing playing black.

  1. Write a recurrence for the probability that the champion retains the title. Assume that there are g games left to play in the match and that the champion needs to win i games (which may end in a 1/2).
  2. Based on your recurrence, give a dynamic programming to calculate the champion’s probability of retaining the title.
  3. Analyze its running time for an n game match.

Solution

  1. This is achieved by an exhaustive search over all possible game series writing a trackback program like the one in Problem 8-9. The candidate set is composed from the possible outcomes {w,d,l} with multiple inclusion (i.e. without cutting the candidates). A solution is found if the length of the outcome vector \(v_n\) has length g. There are \(N = 3^g\) such outcome vectors (permutations).The process_solution step consists of computing the number of points won by the defending champion given the current outcome vector \(v_n\). In this step we have to take into account whether the champion plays white or black, which can be determined from the number of games remaining r by r % 2.We count for how many outcome vectors the defending champion wins and divide by \(N\). This gives the requested probability.
  2. The DP matrix maintains the points won at intermediate stages of a match, i.e., the sum of points for vectors \(v_k\) with \(k < g\).
  3. Once we consider game k, all required numbers of points won in games 1 to k-1 are stored in the DP matrix. At game k there are hence \(3^{k-1}\) lookups and \(N = 3 \times 3^{k-1}\) multiplications and DP writes to be performed. The runtime should thus be \(\mathcal{O}(3^{g+1})\).

Problem 8-24

Given a set of coin denominators, find the minimum number of coins to make a certain amount of change.

Solution

A Python 3 solution using the denominations of Euro coins.

coins  = [1,2,5,10,20,50,100,200]
target = 11111
DP = [[] for i in range(target+1)]

def construct_hierachically ( i ):
    shortest = []
    
    for n,c in reversed(list(enumerate(coins))):
        # one coin suffices
        if i - c == 0:
            DP[i].append(c)
            return
        # find the minimal set
        elif i - c > 0:
            set = 
            if len(DP[i-c]) == 0:
                construct_hierachically ( i-c )
            set.extend ( DP[i-c] )
            if len(shortest) == 0:
                shortest = set
            else:
                if len(set) < len(shortest):
                    shortest = set
    DP[i] = shortest

if __name__ == "__main__":
    construct_hierachically ( target )
    print (DP[target])

Please leave comments if you find mistakes or improvements.;; This buffer is for notes you don’t want to save, and for Lisp evaluation.
;; If you want to create a file, visit that file with C-x C-f,
;; then enter the text in that file’s own buffer.

From Skiena’s Algorithm Design Manual
(no guarantee that any solution is good or even correct)

Problem 8-1

Typists often make transposition errors exchanging neighboring characters, such as typing setve when you mean steve. This requires two substitutions to fix under the conventional definition of edit distance. Incorporate a swap operation into our edit distance function, so that such neighboring transposition errors can be fixed at the cost of one operation.

Solution
(no guarantee that the solution is good or even correct)

I kept the somewhat quirky requirement that all strings start in position 1 (instead of 0). Be sure to add a blank at the front if you test your own string literals. Note that I changed the meaning of “S” in the edit sequence to SWAP (I suppose it indicated SUBSTITUTE originally). Substitutions are now indicated using R for REPLACE.

Compile with gcc -std=c99 and please leave comments if you find a mistake or an improvement.

#include <stdio.h>
#include <string.h>

#define MATCH       0 /* enumerated type symbol for match */
#define INSERT      1 /* enumerated type symbol for insert */
#define DELETE      2 /* enumerated type symbol for delete */
#define SWAP        3 /* enumerated type symbol for swap */

typedef struct {
    int cost;     /* cost of reaching this cell */
    int parent;   /* parent cell */
} cell;

/* EXAMPLE 1
#define MAXLEN 14
cell m[MAXLEN+1][MAXLEN+1];
char s[] = " thou shalt not";
char t[] = " you should not";
*/

/* EXAMPLE 2
#define MAXLEN 5
cell m[MAXLEN+1][MAXLEN+1];
char s[] = " setev";
char t[] = " steve";
*/

/* EXAMPLE 3 */
#define MAXLEN 35
cell m[MAXLEN+1][MAXLEN+1];
char s[] = " Heigth si on of my favurite tapas! ";
char t[] = " Height is one of my favorite typos!";

/*********** initializations ***********/
void row_init (int i) {
    m[0][i].cost = i;
    if (i&gt;0)
        m[0][i].parent =  INSERT;
    else
        m[0][i].parent = -1;
}

void column_init (int i) {
    m[i][0].cost = i;
    if (i>0)
        m[i][0].parent = DELETE;
    else
        m[i][0].parent = -1;
}

/*********** edit distance ***********/
void goal_cell(char *s, char *t, int *i, int *j) {
    *i = strlen(s) - 1;
    *j = strlen(t) - 1;
}

int match ( char s, char t ) {
    if ( s==t )
        return 0;
    else
        return 1;
}

int indel ( char c ) {
    return 1;
}

int swap ( char *s, char *t, int *i, int *j ) {
    /* check for end of string */
    if ( !( (*i < MAXLEN) && (*j < MAXLEN) ))
        return 10;
    /* Swap if the next two chars of s and t
      are cross-wise identical but not if they are
      pairwise the same - in this case we want
      to match */
    if ( s[*i] == t[(*j)+1] && s[(*i)+1] == t[*j] && (s[*i] != s[(*i)+1]) && (*i == *j) ) {
        return -1; /* -1 will beat all other 1-to-1 edits */
    }
    else
        return 10;
}

int string_edit_distance (char *s, char *t)
{
    int i, j;
    int opt[4];             /* cost of the options */
    for (unsigned int i=0; i < MAXLEN; i++) {
        row_init(i);
        column_init(i);
    }

    for (i=1; i < (int) strlen(s); i++) {
        for (j=1; j < (int) strlen(t); j++) {
            opt[MATCH]  = m[i-1][j-1].cost + match(s[i],t[j]);
            opt[INSERT] = m[i][j-1].cost + indel(t[j]);
            opt[DELETE] = m[i-1][j].cost + indel(s[i]);
            opt[SWAP]   = m[i-1][j-1].cost + swap(s,t,&i,&j);
            m[i][j].cost = opt[MATCH];
            m[i][j].parent = MATCH;
            for (int k=INSERT; k<=SWAP; k++)
                if (opt[k] < m[i][j].cost) {
                    m[i][j].cost = opt[k];
                    m[i][j].parent = k;
                    if ( k == SWAP ) {
                        /* correct for the -1 added before */
                        m[i][j].cost += 1;
                        m[i][j].parent = k;
                        ++i, ++j;
                        m[i][j].cost = m[i-1][j-1].cost + 1;
                        m[i][j].parent = k;
                    }
                }
        }
    }
    goal_cell(s,t,&i,&j);
    return( m[i][j].cost );
}

/*********** reconstruction ***********/
void insert_out(char *t, int j) {
    printf("I");
}

void delete_out(char *s, int i) {
    printf("D");
}

void match_out(char *s, char *t, int i, int j) {
    if (s[i]==t[j]) printf("M");
    else printf("R");
}

void swap_out(char *s, char *t, int i, int j) {
    printf("S");
}

void reconstruct_path(char *s, char *t, int i, int j) {
    if (m[i][j].parent == -1) return;
    if (m[i][j].parent == MATCH) {
        reconstruct_path(s,t,i-1,j-1);
        match_out(s, t, i, j);
        return;
    }
    if (m[i][j].parent == INSERT) {
        reconstruct_path(s,t,i,j-1);
        insert_out(t,j);
        return;
    }
    if (m[i][j].parent == DELETE) {
        reconstruct_path(s,t,i-1,j);
        delete_out(s,i);
        return;
    }
    if (m[i][j].parent == SWAP) {
        reconstruct_path(s,t,i-2,j-2);
        swap_out(s,t,i,j);
        return;
    }
}

/*********** control ***********/
int main () {

    printf ("Edit distance of \"%s\" and \"%s\" is %i\n", s, t, string_edit_distance (s,t) );
    reconstruct_path (s,t,MAXLEN,MAXLEN);
    printf ("\n");

    return 0;
}

Output of example 1:

Edit distance of " thou shalt not" and " you should not" is 5
DRMMMMMIRMRMMMM

Output of example 2:

Edit distance of " setev" and " steve" is 2
MSS

Output of example 3:

Edit distance of " Heigth si on of my favurite tapas! " and " Height is one of my favorite typos!" is 7
MMMMSMSMMMIMMMMMMMMMMRMMMMMMRMRMMD

Problem 8-2

Suppose you are given three strings of characters: X, Y , and Z, where |X| = n, |Y| = m, and |Z| = n+m. Z is said to be a shuffle of X and Y iff Z can be formed by interleaving the characters from X and Y in a way that maintains the left-to-right ordering of the characters from each string.

  1. Show that cchocohilaptes is a shuffle of chocolate and chips, but chocochilatspe is not.
  2. Give an efficient dynamic-programming algorithm that determines whether Z is a shuffle of X and Y. Hint: the values of the dynamic programming matrix you construct should be Boolean, not numeric.

Solution
(no guarantee that the solution is good or even correct)

  1. This one is easy to see if you start from the end. The ‘s’ cannot occur before the ‘p’
  2. Give an efficient dynamic-programming algorithm that determines whether Z is a shuffle of X and Y . Hint: the values of the dynamic programming matrix you construct should be Boolean, not numeric.Compile with -std=c++11 option and please leave comments if you find a mistake or an improvement.
#include <iostream>
#include <vector>

class Shuffle {
public: 
	Shuffle(const std::string& a, const std::string& b, const std::string& c);
    void is_shuffle ( const unsigned int i, const unsigned int j );
    void init ();
    void printXYZ () const;
    void printDPmat () const;
    bool get ( const unsigned int i, const unsigned int j ) const;

private:
    const std::string x;
    const std::string y;
    const std::string z;
    const std::string::size_type n, m;
    std::vector<std::vector<bool>> DPmat;
};

Shuffle::Shuffle(const std::string& a, const std::string& b, const std::string& c) : x{a}, y{b}, z{c}, n{x.size()}, m{y.size()} {
    DPmat.resize(n+1);
    for ( unsigned int i = 0 ; i <= n; i++ )
        DPmat[i].resize(m+1);
}

void Shuffle::printXYZ () const {
    std::cout << x << std::endl << y << std::endl << z << std::endl;
}

void Shuffle::printDPmat () const {
    for ( unsigned int i = 0; i <= n; ++i ) {
        for ( unsigned int j = 0; j <= m; ++j ) 
            std::cout << DPmat[i][j] << " ";
        std::cout << std::endl;
    }
}

void Shuffle::is_shuffle ( const unsigned int i, const unsigned int j ) {
    DPmat[i][j] = ((x[i-1] == z[i+j-1]) && DPmat[i-1][j]) || ((y[j-1] == z[i+j-1]) && DPmat[i][j-1] );
//-SMS     printf ( "x[%i]: %c y[%i]: %c z[%i]: %c\n", i-1, x[i-1], j-1, y[j-1], i+j-1, z[i+j-1] );
//-SMS     printDPmat();
}

bool Shuffle::get ( const unsigned int i, const unsigned int j ) const {
    return DPmat[i][j];
}

void Shuffle::init () {
    DPmat[0][0] = true;

    for ( std::string::size_type i = 0; i <= n; ++i )
    for ( std::string::size_type j = 0; j <= m; ++j )
        DPmat[i][j] = false;

    DPmat[0][0] = true;

    for ( std::string::size_type i = 1; i <= n; ++i )
        if ( !x.compare(0,i,z,0,i) )
            DPmat[i][0] = true;
            
    for ( std::string::size_type j = 1; j <= m; ++j )
        if ( !y.compare(0,j,z,0,j) )
            DPmat[0][j] = true;
}

int main () {
    const std::string X {"chocolate"};
    const std::string Y {"chips"};
    const std::string Z {"cchocohilaptes"};
//-SMS     const std::string Z {"chocochilatspe"};
    Shuffle shuffle(X, Y, Z);
    shuffle.printXYZ();

    shuffle.init();

    for ( unsigned int i = 1; i <= X.size(); ++i ) {
        for ( unsigned int j = 1; j <= Y.size(); ++j ) {
            if ( !shuffle.get(i-1,j) && !shuffle.get(i,j-1) )
                continue;
            shuffle.is_shuffle ( i, j );
        }
    }

    shuffle.printDPmat();

    if ( shuffle.get (X.size(), Y.size() ) )
        std::cout << Z << " is a shuffle of " << X << " and " << Y << std::endl;
    else
        std::cout << Z << " is NOT a shuffle of " << X << " and " << Y << std::endl;
    return 0;
}

Output:

chocolate
chips
cchocohilaptes
1 1 0 0 0 0 
1 1 1 0 0 0 
0 1 0 0 0 0 
0 1 0 0 0 0 
0 1 0 0 0 0 
0 1 1 1 0 0 
0 0 0 1 0 0 
0 0 0 1 1 0 
0 0 0 0 1 0 
0 0 0 0 1 1 
cchocohilaptes is a shuffle of chocolate and chips

Problem 8-3

The longest common substring (not subsequence) of two strings X and Y is the longest string that appears as a run of consecutive letters in both strings. For example, the longest common substring of photograph and tomography is ograph.
  1. Let n = |X| and m = |Y |. Give a Θ(nm) dynamic programming algorithm for longest common substring based on the longest common subsequence/edit distance algorithm.
  2. Give a simpler Θ(nm) algorithm that does not rely on dynamic programming.

Solution
(no guarantee that the solution is good or even correct)

  1. This one is based on Solution 8-2. Compile with -std=c++11 option.
    #include <iostream>
    #include <vector>
    #include <string>
    
    class LongestSubstring {
    public:
    	LongestSubstring(const std::string& a, const std::string& b);
        void find_longest_substring ();
        void init ();
        void printDPmat () const;
        std::string getZ () const;
        unsigned int getMaxlen () const;
    
    private:
        const std::string x;
        const std::string y;
        std::string z;
        const std::string::size_type n, m;
        std::vector<std::vector<bool>> DPmat;
        unsigned int maxlen;
    };
    
    LongestSubstring::LongestSubstring(const std::string& a, const std::string& b) : x{a}, y{b}, n{x.size()}, m{y.size()}, maxlen{0} {
        DPmat.resize(n);
        for ( unsigned int i = 0 ; i <= n; i++ )
            DPmat[i].resize(m);
    }
    
    void LongestSubstring::printDPmat () const {
        for ( unsigned int i = 0; i != n; ++i ) {
            for ( unsigned int j = 0; j != m; ++j )
                std::cout << DPmat[i][j] << " ";
            std::cout << std::endl;
        }
    }
    
    // the longest diagonal sequence of true values in the DP matrix
    // constitutes the longest substring
    void LongestSubstring::find_longest_substring () {
        for ( unsigned int i = 0; i != n; ++i ) {
            for ( unsigned int j = 0; j != m; ++j ) {
                unsigned int k = i, l = j, len = 0;
                std::string tmp = "";
                while ( DPmat[k][l] == true ) {
                    tmp += x[k];
                    ++k, ++l, ++len;
                }
                if ( len > maxlen ) {
                    maxlen = len;
                    z = tmp;
                }
            }
        }
    }
    
    std::string LongestSubstring::getZ () const {
        return z;
    }
    
    unsigned int LongestSubstring::getMaxlen () const {
        return maxlen;
    }
    
    // the DP matrix can be constructed entirely during initialization
    void LongestSubstring::init () {
        DPmat[0][0] = true;
    
        for ( std::string::size_type i = 0; i != n; ++i )
        for ( std::string::size_type j = 0; j != m; ++j )
            DPmat[i][j] = x[i] == y[j];
    }
    
    int main () {
        const std::string X {"photograph"};
        const std::string Y {"tomography"};
        LongestSubstring ls(X, Y);
    
        ls.init();
        ls.find_longest_substring ();
    
        ls.printDPmat();
        std::cout << "Longest common substring of " << X << " and " << Y << " is " << ls.getZ() << " (length: " << ls.getMaxlen() << ")" << std::endl;
        return 0;
    }
    

    Output:

    0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 1 0
    0 1 0 1 0 0 0 0 0 0
    1 0 0 0 0 0 0 0 0 0
    0 1 0 1 0 0 0 0 0 0
    0 0 0 0 1 0 0 0 0 0
    0 0 0 0 0 1 0 0 0 0
    0 0 0 0 0 0 1 0 0 0
    0 0 0 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 1 0
    Longest common substring of photograph and tomography is ograph (length: 6)
    
  2. A Python 3 solution
    def longest_substring_b( s, t ):
        u = []
        S = len(s)
        T = len(t)
        maxlen = 0
    
        i = 0
        j = 0
        for i in range(S):
            for j in range(T):
                tmplen = 0
                v = []
                if s[i] == t[j]:
                    n = i
                    m = j
                    while n < S and m < T and s[n] == t[m]:
                        v.append(s[n])
                        n += 1
                        m += 1
                    if ( len(v) > maxlen ):
                        maxlen = len(v)
                        u = v
    
        return ''.join(u)
    
    if __name__ == "__main__":
    
        s = "photograph"
        t = "tomography"
    
        u = longest_substring_b ( s, t )
        print ("Longest substring of '" + s + "' and '" + t + "' is '" + u + "' (length: " + str(len(u)) + ")" )
    

    Output:

    Longest substring of 'photograph' and 'tomography' is 'ograph' (length: 6)

Problem 8-4

The longest common subsequence (LCS) of two sequences T and P is the longest sequence L such that L is a subsequence of both T and P. The shortest common supersequence (SCS) of T and P is the smallest sequence L such that both T and P are a subsequence of L.

  1. Give efficient algorithms to find the LCS and SCS of two given sequences.
  2. Let d(T,P) be the minimum edit distance between T and P when no substitutions are allowed (i.e., the only changes are character insertion and deletion). Prove that d(T, P ) = |SCS(T, P )| − |LCS(T, P )| where |SCS(T, P )| (|LCS(T,P)|) is the size of the shortest SCS (longest LCS) of T and P.

Solution
(no guarantee that the solution is good or even correct)

  1. From how I understand dynamic programming by now there is not THE ONE solution. You can rather build the DP matrix in different ways, which result in different paths across the matrix for obtaining the desired result. Here, is a Python 3 solution.
    def longest_common_subsequence (s, t):
        DP = [[0 for j in range(len(t)+1)] for i in range(len(s)+1)]
    
        # build the DP matrix
        for i, x in enumerate(s):
            for j, y in enumerate(t):
                if x == y:
                    DP[i+1][j+1] = DP[i][j] + 1
                else:
                    DP[i+1][j+1] = max(DP[i+1][j], DP[i][j+1])
        # print(DP)
    
        # assemble subsequence
        lcs = ""
        x, y = len(s), len(t)
        while x and y:
            if DP[x][y] == DP[x-1][y]:
                x -= 1
            elif DP[x][y] == DP[x][y-1]:
                y -= 1
            else:
                lcs = s[x-1] + lcs
                x -= 1
                y -= 1
        return lcs
    
    def shortest_common_supersequence (s, t):
        DP = [[0 for j in range(len(t)+1)] for i in range(len(s)+1)]
    
        # build the DP matrix
        for i, x in enumerate(s):
            for j, y in enumerate(t):
                if x == y:
                    DP[i+1][j+1] = max(DP[i+1][j], DP[i][j+1]) + 1
                else:
                    DP[i+1][j+1] = max(DP[i+1][j], DP[i][j+1])
        # print (DP)
    
        # assemble supersequence
        scs = ""
        x, y = len(s), len(t)
        while x and y:
            # take both if the current chars are different
            if x > 0 and y > 0 and DP[x][y] == DP[x][y-1] and DP[x][y] == DP[x-1][y]:
                scs = s[x-1] + t[y-1] + scs
                x -= 1
                y -= 1
            # take one if they are equal
            if x > 0 and y > 0 and DP[x][y] != DP[x][y-1] and DP[x][y] != DP[x-1][y]:
                scs = s[x-1] + scs
                x -= 1
                y -= 1
            # take one from t if that brings us to the next common char
            if y > 0 and DP[x][y] == DP[x][y-1] and DP[x][y] != DP[x-1][y]:
                scs = t[y-1] + scs
                y -= 1
            # take one from s if that brings us to the next common char
            if x > 0 and DP[x][y] == DP[x-1][y] and DP[x][y] != DP[x][y-1]:
                scs = s[x-1] + scs
                x -= 1
    
        # take the remaining chars from s once t is exhausted
        if x > 0:
            scs = s[0:x] + scs
        # take the remaining chars from t once s is exhausted
        if y > 0:
            scs = t[0:y] + scs
    
        return scs
    
    if __name__ == "__main__":
    
        s = "photograph"
        t = "tomography"
    
        u = longest_common_subsequence ( s, t )
        print ("Longest common subsequence of '" + s + "' and '" + t + "' is '" + u + "' (length: " + str(len(u)) + ")" )
    
        u = shortest_common_supersequence ( s, t )
        print ("Shortest common supersequence of '" + s + "' and '" + t + "' is '" + u + "' (length: " + str(len(u)) + ")" )
        # photomography would be another valid SCS
    
    
  2. The obvious case is T = P, where d = |SCS| – |LCS| = 0. Otherwise, the sequences T and P differ by the set of letters D = SCS \ LCS. To edit from P to T (or vice versa) for each letter of D exactly one of the operations insert or delete has to be applied. The edit distance is hence d = |D| = |SCS| – |LCS|.

Problem 8-5

Let \(P_1 ,P_2, \ldots, P_n\) be n programs to be stored on a disk with capacity D megabytes. Program \(P_i\) requires \(s_i\) megabytes of storage. We cannot store them all because \(D < \sum_{i=1}^n s_i\)

  1. Does a greedy algorithm that selects programs in order of nondecreasing \(s_i\) maximize the number of programs held on the disk? Prove or give a counter-example.
  2. Does a greedy algorithm that selects programs in order of nonincreasing order \(s_i\) use as much of the capacity of the disk as possible? Prove or give a counter-example.

Solution

  1. Yes, it does. Assume \(D_m = \sum_{i=1}^m s_i\) with \(s_1 \le s_2 \le \ldots \le s_m\), then replacing any of these with a \(s_j, j \in [m+1, n]\) leads to a larger \(D\).
  2. No. Consider, e.g., \(D = 9\) and \(s = \{2,3,4,8\}\). The algorithm would choose {8} and terminate but {2,3,4} fills the entire disk.

Problem 8-6

Coins in the United States are minted with denominations of 1, 5, 10, 25, and 50 cents. Now consider a country whose coins are minted with denominations of \(\{d_1, \ldots, d_k \}\) units. We seek an algorithm to make change of n units using the minimum number of coins for this country.
  1. The greedy algorithm repeatedly selects the biggest coin no bigger than the amount to be changed and repeats until it is zero. Show that the greedy algorithm does not always use the minimum number of coins in a country whose denominations are {1,6,10}.
  2. Give an efficient algorithm that correctly determines the minimum number of coins needed to make change of n units using denominations \(\{d_1, \ldots, d_k \}\). Analyze its running time.

Solution

  1. Consider, e.g., \(D = 12\). The greedy algorithm will select \(s = \{10,1,1\}\) but a better set is \(s = \{6,6\}\).
  2. No solution here but you can find one on panictank.

No solution to Problem 8-7

Problem 8-8

In the single-processor scheduling problem, we are given a set of n jobs J. Each job i has a processing time ti, and a deadline di. A feasible schedule is a permutation of the jobs such that when the jobs are performed in that order, every job is finished before its deadline. The greedy algorithm for single-processor scheduling selects the job with the earliest deadline first. Show that if a feasible schedule exists, then the schedule produced by this greedy algorithm is feasible.

Solution

I am not completely sure that I get the question right because it seems rather obvious that if such a schedule exists and the jobs are executed in order of their deadline, then that will be a feasible solution (and maybe the only one).
The formalize, it the proof by induction concept might be the right choice

  1. Start: \(i = 1\): if \(t_1 < d_1\text{ feasible}\)
  2. Assume: \(i = k, \,\, \forall j \in [1,k-1] \text{ it holds } \sum_{l=1}^k t_l \le d_l\)
  3. Then: For \(i = k+1, \,\, \sum_{l=1}^k t_l + t_{k+1} \le t_{k+1} \Leftrightarrow t_{k+1} \le d_{k+1} – \sum_{l=1}^k t_l\)
    This can be interpreted as the feasibility condition. If it fulfilled the schedule is valid.

In words: If all jobs up to k finished on time and a feasible solution exists, the greedy algorithm will correctly pick the next job. As I said, maybe I misunderstood the question…

Problem 8-9

The knapsack problem is as follows: given a set of integers \(S = \{s_1, s_2, \ldots, s_n\}\) and a given target number T, find a subset of S that adds up exactly to T. For example, within S = {1,2,5,9,10} there is a subset that adds up to T = 22 but not T = 23. Give a correct programming algorithm for knapsack that runs in O(nT) time.

Solution

The code is written as backtracking solution, where pruning and dynamic programming can be enabled as optimizations. Also, it constructs all possible subsets that fill the knapsack or just stop once a solution is found. The respective variables to set/unset are generate_all_solutions{true/false} and useDP{true/false}.

The dynamic programming matrix in this case is a table. Every time a set with some total of t < T is constructed, it is stored in the table. Then every time a number is added to the knapsack, we check if there is a known combination for the remainder computed already, thus avoiding to compute a feasible set for any intermediate knapsack total smaller than T twice. The DP algorithm runs at \(\mathcal{O}(nT)\) at worst. This is the case if all intermediate knapsack sets for totals from 1 to T are really constructed and all of them require testing all n numbers of the set. In average the algorithm will perform far better. If useDP is true, the DP table is printed at the end, and you can see that is not very populated. [code lang="cpp"] #include <iostream> #include <vector> #include <chrono> class Knapsack { public: Knapsack ( const std::vector<unsigned int>& numbers, const unsigned int T ); bool is_a_solution () const; void process_solution (); unsigned int get_total () const; void init_dp (); bool use_dp () const; void backtrack ( ); unsigned int num_solutions () const; unsigned int num_skips () const; void print_dp () const; private: const unsigned int target; const std::vector<unsigned int> set; const std::vector<unsigned int>::size_type set_size; std::vector<bool> in_sack; unsigned int solutions_found; unsigned int skips; const bool generate_all_solutions; const bool useDP; std::vector<std::vector<unsigned int>> DP_sack; std::vector<std::vector<bool>> DP_in_sack; std::vector<unsigned int> sack; }; Knapsack::Knapsack ( const std::vector<unsigned int>& numbers, const unsigned int T ) : target{T}, set{numbers}, set_size{numbers.size()}, solutions_found{0}, skips{0}, generate_all_solutions{true}, // change here to construct one/all solutions useDP{true} // change here to use dynamic programming { in_sack.resize(numbers.size(), false); } bool Knapsack::use_dp () const { return useDP; } void Knapsack::init_dp () { for ( unsigned int i = 0; i != target; ++i ) { DP_sack.resize(target+1); DP_in_sack.resize(target+1); } } void Knapsack::print_dp () const { for ( unsigned int n = 0; n != DP_sack.size(); ++n ) { if ( DP_sack[n].size() == 0 ) continue; std::cout << n << ": "; for ( auto& m : DP_sack[n] ) std::cout << m << " "; std::cout << std::endl; } } unsigned int Knapsack::get_total () const { unsigned int sum {0}; for ( auto& i : sack ) sum += i; return sum; } bool Knapsack::is_a_solution ( ) const { return get_total() == target; } unsigned int Knapsack::num_solutions () const { return solutions_found; } // check how often we pruned unsigned int Knapsack::num_skips () const { return skips; } // processing = print it + count it void Knapsack::process_solution ( ) { ++solutions_found; printf ( "Solution #%4u: ", solutions_found ); for ( auto& i : sack ) printf ("%u ", i); printf ( "\n" ); } void Knapsack::backtrack ( ) { if ( generate_all_solutions == false && solutions_found > 0 ) return; if ( is_a_solution () ) { process_solution (); } else { // >>> dynamic programming part I if ( useDP ) { unsigned int total = get_total(); if ( DP_sack[total].size() == 0 ) { DP_sack[total].insert( end(DP_sack[total]), begin(sack), end(sack) ); DP_in_sack[total] = in_sack; } } // <<< for ( unsigned int i = 0; i != set_size; ++i ) { if ( !in_sack[i] ) { // >>> comment this to disable pruning if ( get_total () + set[i] > target ) { ++skips; continue; } // <<< in_sack[i] = true; sack.push_back (set[i]); // >>> dynamic programming part II if ( useDP && (get_total() <= target) ) { const unsigned int remainder = target - get_total (); if ( DP_sack[remainder].size() > 0 ) { sack.insert(end(sack), begin(DP_sack[remainder]), end(DP_sack[remainder])); in_sack = DP_in_sack[remainder]; in_sack[i] = true; } } // <<< backtrack (); in_sack[i] = false; sack.pop_back (); } } } } int main () { const std::vector<unsigned int> numbers {1,2,5,9,10,21,33,57,98,111,1001}; const unsigned int T {1111}; Knapsack ks ( numbers, T ); auto start = std::chrono::steady_clock::now(); if ( ks.use_dp() ) ks.init_dp (); ks.backtrack ( ); auto diff = std::chrono::steady_clock::now() - start; if ( ks.use_dp() ) ks.print_dp (); std::cout << "Solution time: " << std::chrono::duration <double, std::milli> (diff).count() << " ms" << std::endl; if ( ks.num_solutions () == 0 ) printf ("No solution found!\n"); printf ("We pruned %u times!\n", ks.num_skips()); return 0; } [/code] For the set {1,2,5,9,10,21,33,57,98,111,1001} and the target 1111 the execution times for constructing all 864 possible knapsack sets on my machine are: Basic backtracking: 66513 ms Backtracking with pruning: 9368 ms DP: 234 ms

Problem 8-10

The integer partition takes a set of positive integers \(S=s_1, \ldots, s_n\) and asks if there is a subset \(I \in S\) such that \(\sum_{i \in I} s_i= \sum_{ i \notin I} s_i\). Let \(\sum_{i \in S} s_i = M\). Give an O(nM) dynamic programming algorithm to solve the integer partition problem.

Solution

This can be solved with the code of Problem 8-9 setting T = M/2.

No solution to Problems 8-11 to 8-13
If you have one, I am happy to post it and credit it to you.

Problem 8-14

The traditional world chess championship is a match of 24 games. The current champion retains the title in case the match is a tie. Each game ends in a win, loss, or draw (tie) where wins count as 1, losses as 0, and draws as 1/2. The players take turns playing white and black. White has an advantage, because he moves first. The champion plays white in the first game. He has probabilities ww, wd, and wl of winning, drawing, and losing playing white, and has probabilities bw, bd, and bl of winning, drawing, and losing playing black.

  1. Write a recurrence for the probability that the champion retains the title. Assume that there are g games left to play in the match and that the champion needs to win i games (which may end in a 1/2).
  2. Based on your recurrence, give a dynamic programming to calculate the champion’s probability of retaining the title.
  3. Analyze its running time for an n game match.

Solution

  1. This is achieved by an exhaustive search over all possible game series writing a trackback program like the one in Problem 8-9. The candidate set is composed from the possible outcomes {w,d,l} with multiple inclusion (i.e. without cutting the candidates). A solution is found if the length of the outcome vector \(v_n\) has length g. There are \(N = 3^g\) such outcome vectors (permutations).The process_solution step consists of computing the number of points won by the defending champion given the current outcome vector \(v_n\). In this step we have to take into account whether the champion plays white or black, which can be determined from the number of games remaining r by r % 2.

    We count for how many outcome vectors the defending champion wins and divide by \(N\). This gives the requested probability.

  2. The DP matrix maintains the points won at intermediate stages of a match, i.e., the sum of points for vectors \(v_k\) with \(k < g\).
  3. Once we consider game k, all required numbers of points won in games 1 to k-1 are stored in the DP matrix. At game k there are hence \(3^{k-1}\) lookups and \(N = 3 \times 3^{k-1}\) multiplications and DP writes to be performed. The runtime should thus be \(\mathcal{O}(3^{g+1})\).

Problem 8-24

Given a set of coin denominators, find the minimum number of coins to make a certain amount of change.

Solution

A Python 3 solution using the denominations of Euro coins.

coins  = [1,2,5,10,20,50,100,200]
target = 11111
DP = [[] for i in range(target+1)]

def construct_hierachically ( i ):
    shortest = []
    
    for n,c in reversed(list(enumerate(coins))):
        # one coin suffices
        if i - c == 0:
            DP[i].append(c)
            return
        # find the minimal set
        elif i - c > 0:
            set = 
            if len(DP[i-c]) == 0:
                construct_hierachically ( i-c )
            set.extend ( DP[i-c] )
            if len(shortest) == 0:
                shortest = set
            else:
                if len(set) < len(shortest):
                    shortest = set
    DP[i] = shortest

if __name__ == "__main__":
    construct_hierachically ( target )
    print (DP[target])

Please leave comments if you find mistakes or improvements.