m635674608

Ways to get a file's extension in Java


Using Java 7

See the Javadoc for Files.probeContentType.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Test {
  public static void main(String[] args) throws IOException {
    Path source = Paths.get("c:/temp/0multipage.tif");
    System.out.println(Files.probeContentType(source));
    // output : image/tiff
  }
}

The default implementation is OS-specific and not very complete. It is possible to register a better detector, such as Apache Tika; see "Transparently improve Java 7 mime-type recognition with Apache Tika".
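Because the default detector is incomplete, Files.probeContentType returns null whenever the content type cannot be determined. The following is a minimal sketch of a fallback chain, not a definitive implementation. The class name MimeProbe, the method probeOrDefault and the default value application/octet-stream are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the JDK: the names and the default value are illustrative.
final class MimeProbe {

  static final String DEFAULT_TYPE = "application/octet-stream";

  // Tries the installed file type detectors first, then the JDK's
  // extension table, and finally falls back to a generic default.
  static String probeOrDefault(Path file) throws IOException {
    String type = Files.probeContentType(file); // null if the type cannot be determined
    if (type == null) {
      Path name = file.getFileName();
      if (name != null) {
        type = URLConnection.guessContentTypeFromName(name.toString());
      }
    }
    return type != null ? type : DEFAULT_TYPE;
  }

  public static void main(String[] args) throws IOException {
    Path source = Paths.get(args.length > 0 ? args[0] : "c:/temp/0multipage.tif");
    System.out.println(probeOrDefault(source));
  }
}
```

The result still depends on the platform, so the same file may be reported differently on Windows, Linux and macOS.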

Using javax.activation.MimetypesFileTypeMap

activation.jar is required; it can be downloaded from http://java.sun.com/products/javabeans/glasgow/jaf.html.

The MimetypesFileTypeMap class is used to map a File to a MIME type. The supported MIME types are defined in a resource file inside activation.jar.

import javax.activation.MimetypesFileTypeMap;
import java.io.File;

class GetMimeType {
  public static void main(String args[]) {
    File f = new File("gumby.gif");
    System.out.println("Mime Type of " + f.getName() + " is " +
                         new MimetypesFileTypeMap().getContentType(f));
    // expected output :
    // "Mime Type of gumby.gif is image/gif"
  }
}

The built-in MIME type list is very limited, but a mechanism is available to add more MIME types and extensions very easily.

The MimetypesFileTypeMap looks in various places on the user's system for MIME types file entries. When asked for a MIME type, it searches the MIME types files in the following order:

  1. Programmatically added entries to the MimetypesFileTypeMap instance.
  2. The file .mime.types in the user's home directory.
  3. The file <java.home>/lib/mime.types.
  4. The file or resources named META-INF/mime.types.
  5. The file or resource named META-INF/mimetypes.default (usually found only in the activation.jar file).

This method is useful when you need to deal with incoming files whose filenames are normalized. The lookup is very fast because only the extension is used to guess the nature of a given file.
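To make "only the extension is used" concrete, here is a minimal sketch of an extension-only lookup written against the plain JDK. It is not how MimetypesFileTypeMap is implemented. The class ExtensionLookup, its methods and the three-entry table are hypothetical and exist only for illustration. The application/octet-stream default mirrors what MimetypesFileTypeMap returns when it finds no entry.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of extension-only lookup. The table is a tiny
// hand-written sample, not the list shipped in activation.jar.
final class ExtensionLookup {

  private static final Map<String, String> TYPES = new HashMap<>();
  static {
    TYPES.put("gif", "image/gif");
    TYPES.put("txt", "text/plain");
    TYPES.put("pdf", "application/pdf");
  }

  // Returns the lower-cased text after the last dot of the file name,
  // or "" when the name has no extension (including dot-files like ".bashrc").
  static String extensionOf(String fileName) {
    int slash = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
    int dot = fileName.lastIndexOf('.');
    if (dot <= slash + 1 || dot == fileName.length() - 1) {
      return "";
    }
    return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
  }

  // The file content is never read: only the name decides the answer.
  static String guessType(String fileName) {
    return TYPES.getOrDefault(extensionOf(fileName), "application/octet-stream");
  }

  public static void main(String[] args) {
    System.out.println(guessType("gumby.gif"));
    System.out.println(guessType("C:\\docs\\REPORT.PDF"));
    System.out.println(guessType("notes.xyz"));
  }
}
```

The flip side of the speed is that a renamed file fools the lookup: a text file saved as gumby.gif is still reported as image/gif.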

Using java.net.URL

Warning: this method is very slow!

Like the method above, the match is done on the extension. The mapping between extensions and MIME types is defined in the file [jre_home]\lib\content-types.properties.

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

public class FileUtils {
  // Opens a connection to the URL and asks it for the content type.
  // MalformedURLException is a subclass of IOException, so one throws clause is enough.
  public static String getMimeType(String fileUrl) throws IOException {
    URL u = new URL(fileUrl);
    URLConnection uc = u.openConnection();
    return uc.getContentType();
  }

  public static void main(String[] args) throws Exception {
    // A local file URL takes three slashes: an empty host, then the path.
    System.out.println(FileUtils.getMimeType("file:///c:/temp/test.TXT"));
    // output :  text/plain
  }
}

A note from R. Lovelock :

I was trying to find the best way of getting the mime type of a file
and found your site very useful. However I have now found a way of
getting the mime type using URLConnection that isn't as slow as the
way you describe.

import java.net.FileNameMap;
import java.net.URLConnection;

public class FileUtils {

  public static String getMimeType(String fileUrl)
      throws java.io.IOException
  {
    FileNameMap fileNameMap = URLConnection.getFileNameMap();
    String type = fileNameMap.getContentTypeFor(fileUrl);

    return type;
  }

  public static void main(String args[]) throws Exception {
    System.out.println(FileUtils.getMimeType("file://c:/temp/test.TXT"));
    // output :  text/plain
  }
}
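The same table can also be reached through the static shortcut URLConnection.guessContentTypeFromName, which returns null for extensions it does not know. Below is a small sketch that supplies a default in that case. The class NameBasedMime, the method typeOf and the choice of default are assumptions for illustration only.

```java
import java.net.URLConnection;

// Hypothetical wrapper: the class name, method name and default value are illustrative.
final class NameBasedMime {

  // guessContentTypeFromName consults the table behind URLConnection.getFileNameMap()
  // and returns null when the extension is not listed there.
  static String typeOf(String fileName, String defaultType) {
    String type = URLConnection.guessContentTypeFromName(fileName);
    return type != null ? type : defaultType;
  }

  public static void main(String[] args) {
    System.out.println(typeOf("test.txt", "application/octet-stream"));
    System.out.println(typeOf("archive.unknownext", "application/octet-stream"));
  }
}
```

Since no connection is opened and no file is read, the file named in the argument does not even have to exist.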

Using Apache Tika

Tika started as a subproject of Lucene, a search engine. It is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

This package is very up to date regarding the file types supported; Office 2007 formats (docx/pptx/xlsx, etc.) are supported.

Apache Tika

Tika has a lot of dependencies (almost 20 jars!), but it can do much more than detect file types. For example, you can very easily parse a PDF or DOC file to extract the text and the metadata.

import java.io.File;
import java.io.FileInputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

public class Main {

  public static void main(String args[]) throws Exception {

    FileInputStream is = null;
    try {
      File f = new File("C:/Temp/mime/test.docx");
      is = new FileInputStream(f);

      ContentHandler contenthandler = new BodyContentHandler();
      Metadata metadata = new Metadata();
      metadata.set(Metadata.RESOURCE_NAME_KEY, f.getName());
      Parser parser = new AutoDetectParser();
      // OOXMLParser parser = new OOXMLParser();
      parser.parse(is, contenthandler, metadata);
      System.out.println("Mime: " + metadata.get(Metadata.CONTENT_TYPE));
      System.out.println("Title: " + metadata.get(Metadata.TITLE));
      System.out.println("Author: " + metadata.get(Metadata.AUTHOR));
      System.out.println("content: " + contenthandler.toString());
    }
    catch (Exception e) {
      e.printStackTrace();
    }
    finally {
      if (is != null) is.close();
    }
  }
}

The original article offered a ZIP containing the required jars for download, in case you want to try it out.

Using JMimeMagic

Checking the file extension is not a very reliable way to determine the file type. A more robust solution is possible with the JMimeMagic library. JMimeMagic is a Java library (LGPL licence) that retrieves file and stream MIME types by checking magic headers.

// snippet for JMimeMagic lib
//     http://sourceforge.net/projects/jmimemagic/

Magic parser = new Magic() ;
// getMagicMatch accepts Files or byte[],
// which is nice if you want to test streams
MagicMatch match = parser.getMagicMatch(new File("gumby.gif"));
System.out.println(match.getMimeType()) ;

Thanks to Jean-Marc Autexier and sygsix for the tip!
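The magic-header idea itself can be illustrated with the plain JDK: read the first few bytes and compare them with known signatures. The sketch below is hand-rolled and is not the JMimeMagic API. The class MagicSniffer, its methods and the three-signature table are hypothetical, and a real library knows far more formats.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hand-rolled illustration of magic-header detection. This is NOT the JMimeMagic API,
// and the three signatures below are only a small sample.
final class MagicSniffer {

  private static final byte[] GIF = "GIF8".getBytes(StandardCharsets.US_ASCII);
  private static final byte[] PDF = "%PDF".getBytes(StandardCharsets.US_ASCII);
  private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

  // Returns a MIME type when the data starts with a known signature, otherwise null.
  static String sniff(byte[] data) {
    if (startsWith(data, data.length, GIF)) return "image/gif";
    if (startsWith(data, data.length, PNG)) return "image/png";
    if (startsWith(data, data.length, PDF)) return "application/pdf";
    return null;
  }

  // Reads at most 8 bytes from the stream and sniffs them.
  static String sniff(InputStream in) throws IOException {
    byte[] head = new byte[8];
    int n = 0;
    // read() may return fewer bytes than requested, so loop until full or end of stream
    while (n < head.length) {
      int r = in.read(head, n, head.length - n);
      if (r < 0) break;
      n += r;
    }
    byte[] exact = new byte[n];
    System.arraycopy(head, 0, exact, 0, n);
    return sniff(exact);
  }

  private static boolean startsWith(byte[] data, int length, byte[] signature) {
    if (length < signature.length) return false;
    for (int i = 0; i < signature.length; i++) {
      if (data[i] != signature[i]) return false;
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    byte[] sample = "GIF89a".getBytes(StandardCharsets.US_ASCII);
    System.out.println(sniff(new ByteArrayInputStream(sample)));
  }
}
```

Unlike an extension lookup, this check keeps working when a file has been renamed, at the cost of having to open the file and read its first bytes.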

Using mime-util

Another tool is mime-util. It can detect the type using either the file extension or the magic header technique.

import java.io.File;
import java.util.Collection;

import eu.medsea.mimeutil.MimeUtil;

public class Main {
    public static void main(String[] args) {
        MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.MagicMimeMimeDetector");
        File f = new File("c:/temp/mime/test.doc");
        Collection<?> mimeTypes = MimeUtil.getMimeTypes(f);
        System.out.println(mimeTypes);
        //  output : application/msword
    }
}

The nice thing about mime-util is that it is very lightweight: its only dependency is slf4j.

Using Droid

DROID (Digital Record Object Identification) is a software tool to perform automated batch identification of file formats.

DROID uses internal and external signatures to identify and report the specific file format versions of digital files. These signatures are stored in an XML signature file, generated from information recorded in the PRONOM technical registry. New and updated signatures are regularly added to PRONOM, and DROID can be configured to automatically download updated signature files from the PRONOM website via web services.

It can be invoked through two interfaces: a Java Swing GUI or a command-line interface.

http://droid.sourceforge.net/wiki/index.php/Introduction

Aperture framework

Aperture is an open source library and framework for crawling and indexing information sources such as file systems, websites and mail boxes.

The Aperture code consists of a number of related but independently usable parts:

  • Crawling of information sources: file systems, websites, mail boxes
  • MIME type identification
  • Full-text and metadata extraction of various file formats
  • Opening of crawled resources

For each of these parts, a set of APIs has been developed and a number of implementations are provided.

Other methods

ONE:

// requires java.io.File and javax.activation.FileDataSource (activation.jar)
FileDataSource fds = new FileDataSource(new File("c:/some/path/file.xlsx"));
System.out.println("Content-Type is: " + fds.getContentType());

 

TWO:

Use URLConnection.getFileNameMap().getContentTypeFor(fileUrl), exactly as in the FileUtils class from R. Lovelock's note in the "Using java.net.URL" section above.

 

THREE:

Apache Tika 1.3 offers tika-core (http://search.maven.org/#artif...|org.apache.tika|tika-core|1.3|bundle), which does NOT load any more dependencies.

Minimal code example (with theInputStream and theFileName being the "input"):

    try (InputStream is = theInputStream;
         BufferedInputStream bis = new BufferedInputStream(is)) {
      AutoDetectParser parser = new AutoDetectParser();
      Detector detector = parser.getDetector();
      Metadata md = new Metadata();
      md.add(Metadata.RESOURCE_NAME_KEY, theFileName);
      MediaType mediaType = detector.detect(bis, md);
      return mediaType.toString();
    }

 

http://blog.csdn.net/qiuhan/article/details/12586943
