`
ravenex
  • 浏览: 11269 次
  • 性别: Icon_minigender_1
  • 来自: 体育仓库
社区版块
存档分类
最新评论

字符串变异的演示

阅读更多
引用
Programming: String Mutation

Richard Dawkins, in his book The Blind Watchmaker, explains his much-debated experiment known as the weasel program. This algorithm is designed to show that random mutations in DNA can create complex life forms through natural selection, rather than the will of an intelligent designer. His experiment took a sample string of gibberish and repeatedly mutated, or slightly changed, the string as if introducing an error in DNA duplication from one generation to another. The program had a "target" string in mind -- a line from Hamlet, "Methinks it is like a weasel" -- and continued until the random mutations added up such that the original string was the same as this target string. By running his program and finding that any line of gibberish can become Shakespeare through incremental changes, Dawkins suggested that natural selection can account for all the complexity of life.

Naturally, this is a controversial experiment, and an over-simplification of evolution. One criticism is that it is designed to run toward a target string; at every generation, a string's "fitness" to reproduce is determined by its similarity to the target. Could the target not be determined by an intelligent designer? And is an organism's fitness to survive proportional in such small increments to its genetic similarity with a better-adapted mutant? Whatever the answer, Dawkins probably did not anticipate that his experiment was an early example of what has become a common technique in machine learning: genetic algorithms. GAs can apply the same strategy effectively to complicated problems in artifical intelligence: come up with a model, make a lot of "daughter" models that are slightly different, choose the daughter that works a little bit better than the original model, and make her the new mother for the next generation. Lather, rinse, repeat.

For HW2 programming we're going to try our hand at genetic algorithms, using the same string-to-string transformation goal. Specifically, the steps of your program should be:
Take from the user, on the command line, two strings: an input string to start from, and a target string toward which to mutate. (Hint: when testing on CUNIX, you can use quotation marks to pass command line arguments with spaces. See example output.)
Model the input word as a linked list of letters (Strings of length 1), using Java's Linked List class.
Create a large set of mutated "daughter" strings from the input string, each of which is different in a small way:
Insertion. For each of 27 letters (including space), and each of N+1 positions in the input string, insert the letter into the word at the position. Use the add() method of LinkedList that takes an index to quickly insert the letter at the correct position. For a string of N letters, this method will produce 27 * (N+1) daughter strings. For example HELLO becomes AHELLO, BHELLO, etc., until HELLOZ (not counting the space character).
Deletion. For each letter in the input string, delete the letter. Use the remove() method of LinkedList() to do this quickly. For a string of N letters, this method will produce N daughter strings. For example, HELLO becomes ELLO, HLLO, HELO (twice) and HELL.
Choose the daughter string that is closest to the target string. To do this, use an algorithm to find each daughter string's fitness -- a score between 0 and 1 -- and find the daughter string with the highest score.
Fitness should be calculated as the length of the longest common substring of the target string found in the input string, divided by the length of the target string. For example, HPPLE has a score of .8 compared to APPLE, since the longest substring of APPLE found in HPPLE is 4 letters long, which is .8 of APPLE's length. Additionally, penalize the score by 0.1 for each extraneous letter the input string has over the length of the target string. For example. HAPPLE has a score of .9 compared to APPLE, because it is 1 letter longer, and otherwise has the complete substring APPLE. Note that this makes it possible for a candidate word to have a negative score.
Print out the generation number, the winning daughter string and her score.
Repeat the above from step 2, making the winning daughter the new input string, until the daughter has reached the target string. You can use either iteration or recursion to repeat appropriately. Then, terminate the program.
Here are some hints and guidelines for your program:
Let's be a little more formal about object-oriented design in HW2. Set up a class called Word which will represent a single word. The class should:
Translate an input string into a LinkedList in the constructor. It should take a String as an argument in the constructor, have a private LinkedList as a variable, and parse the string into the LinkedList (using the substring() method of String).
Have a method, mutate, which returns an array of all daughter Words to be sired by the target string. That is, each daughter should be a new Word.
Have a method makeCopy() that duplicates itself into an identical Word. Your mutate() function will use this to create many copies of itself and then change each copy.
Have a method toString() that creates a normal string from the LinkedList and returns it. (Hint: use a StringBuffer.)
Have one other class, MutationSim, that takes the input and target strings from the user, creates a Word from the target, mutates it, calculates fitness, and repeats as described above.
Learn about the String methods substring(), toUpperCase() and indexOf().
It'll help if you create a function that returns all the letter of the alphabet in an array over which you can iterate in a loop... something maybe like this:
public static String[] alphabet () {
   String[] alphabet = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", " "};
   return alphabet;
}


Answer the following questions in your README:
How did you design your program? What design choices did you make in setting up your algorithm, other than the ones specified above?
How long does your program take? What do you think is the maximum number of generations needed to reach the target string, i.e., Big-Oh for an input string of I letters, a target string of T letters and an alphabet of L letters? Is this also the minimum generations needed? Why or why not?
How does the total run time relate to you answer to #2? How do you see your program's performance change as you increase the length of the strings?
Do you think the fitness function was a fair model of natural selection? Why or why not? How would you improve it?

Example Output
$ java MutationSim "Hello there" "How are you"
Generation 1: ELLO THERE (score: 0.18181818181818182)
Generation 2: ELLOW THERE (score: 0.2727272727272727)
Generation 3: LLOW THERE (score: 0.2727272727272727)
Generation 4: LLHOW THERE (score: 0.36363636363636365)
Generation 5: LHOW THERE (score: 0.36363636363636365)
Generation 6: LHOW ATHERE (score: 0.45454545454545453)
Generation 7: HOW ATHERE (score: 0.45454545454545453)
Generation 8: HOW ARTHERE (score: 0.5454545454545454)
Generation 9: HOW ARHERE (score: 0.5454545454545454)
Generation 10: HOW ARERE (score: 0.6363636363636364)
Generation 11: HOW ARE RE (score: 0.7272727272727273)
Generation 12: HOW ARE YRE (score: 0.8181818181818182)
Generation 13: HOW ARE YE (score: 0.8181818181818182)
Generation 14: HOW ARE YOE (score: 0.9090909090909091)
Generation 15: HOW ARE YO (score: 0.9090909090909091)
Generation 16: HOW ARE YOU (score: 1.0)


$ java MutationSim "it was a dark and stormy night" "to be or not to be"
Generation 1: IT WAS A DARK AND TORMY NIGHT (score: -0.9333333333333335)
Generation 2: T WAS A DARK AND TORMY NIGHT (score: -0.8333333333333334)
Generation 3:  WAS A DARK AND TORMY NIGHT (score: -0.7333333333333334)
Generation 4: WAS A DARK AND TORMY NIGHT (score: -0.6333333333333334)
Generation 5: AS A DARK AND TORMY NIGHT (score: -0.5333333333333334)
Generation 6: S A DARK AND TORMY NIGHT (score: -0.43333333333333346)
Generation 7:  A DARK AND TORMY NIGHT (score: -0.33333333333333337)
Generation 8: A DARK AND TORMY NIGHT (score: -0.23333333333333336)
Generation 9:  DARK AND TORMY NIGHT (score: -0.1333333333333334)
Generation 10: DARK AND TORMY NIGHT (score: -0.033333333333333354)
Generation 11: ARK AND TORMY NIGHT (score: 0.06666666666666665)
Generation 12: RK AND TORMY NIGHT (score: 0.16666666666666666)
Generation 13: K AND TORMY NIGHT (score: 0.16666666666666666)
Generation 14: K ANDT TORMY NIGHT (score: 0.2222222222222222)
Generation 15:  ANDT TORMY NIGHT (score: 0.2222222222222222)
Generation 16:  ANDOT TORMY NIGHT (score: 0.2777777777777778)
Generation 17:  ANOT TORMY NIGHT (score: 0.3333333333333333)
Generation 18:  NOT TORMY NIGHT (score: 0.3888888888888889)
Generation 19: R NOT TORMY NIGHT (score: 0.4444444444444444)
Generation 20: OR NOT TORMY NIGHT (score: 0.5)
Generation 21: OR NOT TOMY NIGHT (score: 0.5)
Generation 22:  OR NOT TOMY NIGHT (score: 0.5555555555555556)
Generation 23:  OR NOT TOY NIGHT (score: 0.5555555555555556)
Generation 24:  OR NOT TO NIGHT (score: 0.6111111111111112)
Generation 25: E OR NOT TO NIGHT (score: 0.6666666666666666)
Generation 26: BE OR NOT TO NIGHT (score: 0.7222222222222222)
Generation 27: BE OR NOT TO IGHT (score: 0.7222222222222222)
Generation 28:  BE OR NOT TO IGHT (score: 0.7777777777777778)
Generation 29:  BE OR NOT TO GHT (score: 0.7777777777777778)
Generation 30: O BE OR NOT TO GHT (score: 0.8333333333333334)
Generation 31: O BE OR NOT TO HT (score: 0.8333333333333334)
Generation 32: TO BE OR NOT TO HT (score: 0.8888888888888888)
Generation 33: TO BE OR NOT TO T (score: 0.8888888888888888)
Generation 34: TO BE OR NOT TO BT (score: 0.9444444444444444)
Generation 35: TO BE OR NOT TO B (score: 0.9444444444444444)
Generation 36: TO BE OR NOT TO BE (score: 1.0)


$ java MutationSim asafsdfkurweafasfp "methinks it is like a weasel"
Generation 1: ASAFSDFKUR WEAFASFP (score: 0.14285714285714285)
Generation 2: ASAFSDFKURA WEAFASFP (score: 0.17857142857142858)
Generation 3: ASAFSDFKUR A WEAFASFP (score: 0.21428571428571427)
Generation 4: ASAFSDFKURE A WEAFASFP (score: 0.25)
Generation 5: ASAFSDFKURKE A WEAFASFP (score: 0.2857142857142857)
Generation 6: ASAFSDFKURIKE A WEAFASFP (score: 0.32142857142857145)
Generation 7: ASAFSDFKURLIKE A WEAFASFP (score: 0.35714285714285715)
Generation 8: ASAFSDFKUR LIKE A WEAFASFP (score: 0.39285714285714285)
Generation 9: ASAFSDFKURS LIKE A WEAFASFP (score: 0.42857142857142855)
Generation 10: ASAFSDFKURIS LIKE A WEAFASFP (score: 0.4642857142857143)
Generation 11: SAFSDFKURIS LIKE A WEAFASFP (score: 0.4642857142857143)
Generation 12: SAFSDFKUR IS LIKE A WEAFASFP (score: 0.5)
Generation 13: AFSDFKUR IS LIKE A WEAFASFP (score: 0.5)
Generation 14: AFSDFKURT IS LIKE A WEAFASFP (score: 0.5357142857142857)
Generation 15: FSDFKURT IS LIKE A WEAFASFP (score: 0.5357142857142857)
Generation 16: FSDFKURIT IS LIKE A WEAFASFP (score: 0.5714285714285714)
Generation 17: SDFKURIT IS LIKE A WEAFASFP (score: 0.5714285714285714)
Generation 18: SDFKUR IT IS LIKE A WEAFASFP (score: 0.6071428571428571)
Generation 19: DFKUR IT IS LIKE A WEAFASFP (score: 0.6071428571428571)
Generation 20: DFKURS IT IS LIKE A WEAFASFP (score: 0.6428571428571429)
Generation 21: FKURS IT IS LIKE A WEAFASFP (score: 0.6428571428571429)
Generation 22: FKURKS IT IS LIKE A WEAFASFP (score: 0.6785714285714286)
Generation 23: KURKS IT IS LIKE A WEAFASFP (score: 0.6785714285714286)
Generation 24: KURNKS IT IS LIKE A WEAFASFP (score: 0.7142857142857143)
Generation 25: URNKS IT IS LIKE A WEAFASFP (score: 0.7142857142857143)
Generation 26: URINKS IT IS LIKE A WEAFASFP (score: 0.75)
Generation 27: RINKS IT IS LIKE A WEAFASFP (score: 0.75)
Generation 28: RHINKS IT IS LIKE A WEAFASFP (score: 0.7857142857142857)
Generation 29: HINKS IT IS LIKE A WEAFASFP (score: 0.7857142857142857)
Generation 30: THINKS IT IS LIKE A WEAFASFP (score: 0.8214285714285714)
Generation 31: THINKS IT IS LIKE A WEAASFP (score: 0.8214285714285714)
Generation 32: THINKS IT IS LIKE A WEASFP (score: 0.8571428571428571)
Generation 33: ETHINKS IT IS LIKE A WEASFP (score: 0.8928571428571429)
Generation 34: METHINKS IT IS LIKE A WEASFP (score: 0.9285714285714286)
Generation 35: METHINKS IT IS LIKE A WEASP (score: 0.9285714285714286)
Generation 36: METHINKS IT IS LIKE A WEASEP (score: 0.9642857142857143)
Generation 37: METHINKS IT IS LIKE A WEASE (score: 0.9642857142857143)
Generation 38: METHINKS IT IS LIKE A WEASEL (score: 1.0)


MutationSim.java:
/*
 * MutationSim.java, 02/22/2008
 */

public class MutationSim {

    private String input;
    private String target;

    private static String USAGE = "Usage: java MutationSim INPUT_STRING TARGET_STRING"
        + "\nOnly English alphabet and space is allowed in the strings.";

    public static void main(String[] args) {
        MutationSim sim = new MutationSim();
        try {
            sim.processCommandLine(args);
            sim.runSim();
        } catch (Exception e) {
            System.err.println(e.getMessage());
            e.printStackTrace(System.err);
            usage();
        }
    }

    private void processCommandLine(String[] args) throws Exception {
        if (args.length > 2)
            throw new Exception("Too many arguments.");
        if (args.length < 2)
            throw new Exception("Too little arguments.");
        if (isNonAlphabetExist(args[0]) || isNonAlphabetExist(args[1]))
            throw new Exception("Non-English alphabet character in string.");

        this.input = args[0];
        this.target = args[1];
    }

    void runSim() {
        Word word = new Word(this.input);
        Word targetWord = new Word(this.target);
        Word currentWord = null;
        double score = -2.0; // dummy value

        for (int generation = 1; score != 1.0; ++generation) {
            Word[] daughters = word.mutate();

            // iterate through the array,
            // do a linear search for best fit
            double maxFitness = -2.0;
            for (Word w : daughters) {
                double currentFitness = calculateFitness(w, targetWord);

                if (maxFitness <= currentFitness) {
                    currentWord = w;
                    maxFitness = currentFitness;
                }
            }

            printStat(generation, currentWord, maxFitness);
            word = currentWord;
            score = maxFitness;
        }
    }

    static void usage() {
        System.err.println();
        System.err.println(USAGE);
    }

    static void printStat(int generation, Word word, double score) {
        System.out.println("Generation " + generation + ": " + word.toString()
                + " {score: " + score + ")");
    }

    public static double calculateFitness(Word in, Word targ) {
        double ratio = Util.longestCommonSubstring(in.toString(),
                targ.toString()).length()
                / (double) targ.length();
        double penalty =
            (in.length() > targ.length()) ?
                (0.1 * (in.length() - targ.length()))
                : 0;
        return ratio - penalty;
    }
    
    private static boolean isNonAlphabetExist(String s) {
        s = s.toUpperCase();
        String[] alphabet = Util.alphabet();
        
        for (String letter : alphabet) {
            s = s.replaceAll(letter, "");
        }

        return (s.length() != 0);
    }
}


Word.java:
/*
 * Word.java, 02/22/2008
 */

import java.util.ArrayList;
import java.util.LinkedList;

/**
 * 
 */
public class Word {
    private LinkedList<String> list;

    public Word() {
        // list = null;
    }

    public Word(String input) {
        list = new LinkedList<String>();
        
        for (int i = 0; i < input.length(); ++i) {
            String currentLetter = input.substring(i, i+1);
            list.add(currentLetter.toUpperCase());
        }
    }

    public Word[] mutate() {
        String[] alphabet = Util.alphabet();
        int wordLength = this.list.size();
        int alphabetLength = alphabet.length;
        int arraySize = (wordLength + 1) * alphabetLength + wordLength;
        Word[] result = new Word[arraySize];
        int index = 0;

        for (int i = 0; i <= wordLength; ++i) {
            // insertion
            for (int j = 0; j < alphabetLength; ++j) {
                Word copy = this.makeCopy();
                copy.list.add(i, alphabet[j]);
                result[index++] = copy;
            }

            // deletion
            if (i < this.list.size()) {
                Word copy = this.makeCopy();
                copy.list.remove(i);
                result[index++] = copy;
            }
        }

        return result;
    }

    public Word makeCopy() {
        Word copy = new Word();
        copy.list = (LinkedList<String>) this.list.clone();
        return copy;
    }
    
    public int length() {
        return this.list.size();
    }

    @Override
    public String toString() {
        StringBuilder builder = new StringBuilder();
        for (String str : list) {
            builder.append(str);
        }

        return builder.toString();
    }
}


Util.java:
public class Util {
    
    /**
     * Returns the longest common substring of the given two arguments.
     * 
     * @param first the first string
     * @param second the second string
     * @return the longest common substring of the given two arguments
     */
    public static String longestCommonSubstring(String first, String second) {
        // could have used dynamic programming, or generalized suffix tree
        // to solve the LCS problem, but here we'll just stick to simplicity
        
        int start = 0; // The start in a of the longest found so far
        int len = 0; // The length of the longest found so far
        for (int i = 0; i < first.length() - len; ++i) {
            for (int j = first.length(); j > i + len; --j) {
                if (second.indexOf(first.substring(i, j)) != -1) {
                    start = i;
                    len = j - i;
                    break; // Exit the inner loop
                }
            }
        }

        return first.substring(start, start + len);
    }
    
    public static String[] alphabet() {
        String[] alphabet = { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
                "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V",
                "W", "X", "Y", "Z", " " };
        return alphabet;
    }
}
分享到:
评论

相关推荐

    MultiFunctionAndroidDemo,Android演示.zip

    3. `res`: 资源目录,包括应用的布局文件(XML)、图片、字符串等资源。 4. `AndroidManifest.xml`: 定义应用的基本配置,如权限、组件等。 5. `build.gradle`: 构建脚本,定义依赖库、版本信息等,用于构建和打包...

    GA_GA_numeralmut_zip_源码.zip

    在遗传算法中,每个个体通常被表示为一个编码串,这个编码串可以是二进制字符串、浮点数向量或更复杂的结构。"numeralmut"暗示这个源码可能专注于处理数字编码的个体,并且包含对这些数字进行变异的函数或类。变异...

    Python库 | mutalyzer_spdi_parser-0.2.0.tar.gz

    1. **读取SPDI格式数据**:能够从文本文件或字符串中读取SPDI格式的变异描述,并将它们转换为Python对象。 2. **验证SPDI格式**:库内置了对SPDI格式的严格检查,确保输入的数据符合标准,避免了因数据错误导致的...

    遗传算法实例.rar

    在这个过程中,解决方案被表示为一组称为“染色体”的字符串,每个字符串代表可能的解。通过模拟自然选择的过程,优秀的解决方案(即适应度高的染色体)被保留并进行重组,生成下一代解决方案,直到找到满意的解或者...

    biopython学习文档

    在详细讲解序列对象时,文档指出Biopython的Seq对象类似于Python的字符串,但具有额外的生物学特性。你可以对序列进行切片、转换成字符串以及合并序列。这些操作对于序列比对、变异检测和序列分析等任务至关重要。 ...

    Bioconductor-tutorial

    - **分割字符串**:使用 `strsplit()` 函数来根据分隔符分割字符串。 - **替换字符串**:使用 `gsub()` 函数来替换字符串中的模式。 #### 四、读取与比对 ##### 4.1 pasilla 数据集 pasilla 数据集是一个示例数据集...

    基于C++实现的改进版遗传算法解决TSP问题源码+项目说明.zip

    - 使用一个不重复的,首尾相同的字符串来表示一个解,该字符串即TSP的顺序 - 使用交叉,变异两种方式产生新的解,并根据数据来计算解的适应度 - 种群定义为当前所有解的一个集合,当种群中的每个个体完成一次“进化...

    数学建模-三个遗传算法matlab程序实例.zip

    1. 初始化种群:这是遗传算法的第一步,需要随机生成一个初始种群,每个个体代表一个可能的解决方案,由一串编码(如二进制字符串)表示。 2. 适应度函数:这是评价个体优劣的标准,通常与问题的目标函数相关。在...

    仿生智能算法 机器学习技术 遗传算法 进化算法及其在数值计算中的应用 优质详细的课件 共34页.rar

    它通过模拟种群的进化过程,以编码的个体(通常为二进制字符串)代表解决方案,并通过选择、交叉和变异操作进行迭代优化。遗传算法适用于解决多目标优化问题、组合优化问题以及参数调优等问题。 **进化算法** 进化...

    变异项目

    7. **自定义函数和库**:在项目中,开发者可能创建了自己的函数库,封装了常用的功能,如日期时间处理、字符串操作或计算逻辑。这些函数可能保存在单独的文件中,以提高代码复用性和可维护性。 8. **文件系统操作**...

    遗传算法GA

    - **编码**: 在GA中,每个潜在的解决方案被称为个体,通常用二进制串或数值向量表示,这个表示个体的字符串就是编码。 - **种群**: 个体集合称为种群,种群中的每个个体代表可能的解。 - **适应度函数**: 用于评估...

    【优化布局】遗传算法求解矩形零件排列优化问题【含Matlab源码 2614期】.zip

    这些编码通常用二进制字符串表示,每个二进制位对应矩形的一个位置或方向。 1. **编码方案**:通常,我们可以将每个矩形的位置和旋转状态编码为一个固定长度的二进制串。例如,如果每个矩形有四个可能的位置(左上...

    遗传算法原理及应用.rar

    2. **编码**:为了适应计算机处理,问题的解被转化为适合遗传操作的数字或字符串形式,这称为编码。 3. **适应度函数**:每个个体都有一个适应度值,它衡量了个体的优劣程度。适应度函数是根据问题的具体目标设定的...

    matlab的关键字.docx

    13. char:创建字符串数组或者将其他类型变量转化为字符串数组。 14. charfcn:maple 函数。 15. class:判别数据类别。 16. clc:清除指令窗中显示内容。 17. clear:从内存中清除变量和函数。 18. clf:清除...

    Genetic-Hello-World

    在这个项目中,"基因"代表了程序中的字符串或字符序列,比如我们常见的"Hello, World!"。每个基因由多个"位"组成,这些位可以是字符,对应于组成字符串的各个字符。在遗传算法中,基因是问题解决方案的表示方式。 ...

    用MATLAB实现遗传算法程序.zip

    它将一个十进制数转换为等效的二进制字符串。 5. `changes.m`: 可能是用来处理编码变化的函数,例如在适应度计算后,将二进制编码转换为对应的解空间值,或者在选择、交叉和变异过程中进行编码的修改。 6. `B2F.m`...

    GA.rar_遗传算法例

    这些个体由基因编码,通常是一个二进制字符串或其他形式的数据结构。算法的核心步骤如下: 1. **初始化种群**:随机生成初始的一组个体,作为第一代种群。 2. **适应度函数**:为每个个体分配一个适应度值,这通常...

    matlab遗传算法工具箱的应用

    3. **二进制与十进制转换**: 在遗传算法中,基因通常表示为二进制字符串,但实际问题中的变量往往需要以十进制形式表示。GAOT提供了这两种数据格式之间的转换功能。 ### 实现步骤 #### 1. 遗传算法主函数 遗传...

    Genetic CNN

    文章中提出了将每个网络结构编码为固定长度的二进制字符串的方法,并用这种编码初始化遗传算法的群体。随后,在每一代中通过选择、变异和交叉等遗传操作来淘汰劣质个体,并生成更具竞争力的个体。个体的竞争能力是...

    matlab函数大全(附有详细解释).docx

    16. `cchar`: 创建字符串数组或将其他类型变量转换为字符串数组。 17. `charfcn`: Maple中的函数,与MATLAB的字符操作相关。 18. `Children`: 获取图形对象的所有子对象,用于图形编程。 19. `clabel`: 在等高线图上...

Global site tag (gtag.js) - Google Analytics