Article List
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import ...
package com.sap.research.semantic;
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import javax.print.attribute.standard.OutputDeviceAssigned;
import cc.mallet.types.Vector;
import cc.mallet.util.CommandOption.Set;
import com.sap.res ...
package com.sap.research.semantic;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap ...
Note that the first version of LongSentenceFilter is incomplete: even after filtering, there may still be French sentences of more than 100 words. This version addresses that problem. Note also that this version is not optimal from an implementation standpoint, and a better version will be in next ...
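The gap described above is typical of a filter that checks only one side of a parallel corpus. As a minimal sketch (not the actual LongSentenceFilter code, which is not shown here), the following drops a sentence pair when either side exceeds the limit. The class name `ParallelLengthFilter`, the method names, and whitespace tokenization are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and whitespace tokenization are assumptions,
// not the original LongSentenceFilter implementation.
class ParallelLengthFilter {

    // Counts whitespace-separated tokens; an empty or blank line counts as 0.
    static int countWords(String sentence) {
        String trimmed = sentence.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Keeps a pair only if BOTH sides have at most maxWords words, so a short
    // English sentence aligned with an overlong French one is dropped too.
    static List<String[]> filter(List<String> source, List<String> target, int maxWords) {
        if (source.size() != target.size()) {
            throw new IllegalArgumentException("Parallel corpus sides differ in length");
        }
        List<String[]> kept = new ArrayList<>();
        for (int i = 0; i < source.size(); i++) {
            String s = source.get(i);
            String t = target.get(i);
            if (countWords(s) <= maxWords && countWords(t) <= maxWords) {
                kept.add(new String[] { s, t });
            }
        }
        return kept;
    }
}
```

Calling `ParallelLengthFilter.filter(englishLines, frenchLines, 100)` would return only the pairs in which neither sentence has more than 100 words, which keeps the two sides aligned line by line.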
package extractor;
import java.io.File;
import java.io.IOException;
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.nodes.TagNode;
import org.htmlparser.nodes.TextNode;
impor ...
XMLOutputter outputter = new XMLOutputter(Format.getPrettyFormat());
String xmlString = outputter.outputString(document);
System.out.println(xmlString);
This example demonstrates how to convert a JDOM Document object to a String using the XMLOutputter.outputString(Document doc) method.
Reference ...
I have been working on Joshua, a toolkit for SMT. Before extracting a
grammar from a parallel corpus, one necessary step is to eliminate
sentences of more than 100 words. In Hansard, such sentences are
common, so one needs to implement a function
to do the filtering. H ...