This is a problem I’ve come across frequently, and since it has come up again recently, I thought I’d explore this issue in the hope that it will save others some trouble. There are so many problems that this one issue can lead to that it’s baffling browsers still behave this way. The issue? An HTML image, either via <img> tag or JavaScript Image object, that has its src set to “” (an empty string).
The offending code
There are basically two patterns to identify. The first pattern is just straight HTML:
<img src="" >
The second pattern is JavaScript and involves the dynamic setting of the src property on either a newly created image or an existing one:
var img = new Image();
img.src = "";
Both patterns cause the same effect: another request is made to your server. There are two different ways that browsers do this.
* Internet Explorer makes a request to the directory in which the page is located. For example, if you have a page running at http://www.example.com/dir/mypage.htm that has one of these patterns, IE makes a request to http://www.example.com/dir/ to fill in the image.
* Safari and Chrome make a request to the actual page itself. So the page running at http://www.example.com/dir/mypage.htm results in a second request to http://www.example.com/dir/mypage.htm to fill in the image.
You’ll note that Opera and Firefox aren’t mentioned at all. Opera behaves as you might expect: it doesn’t do anything when an empty image src is encountered; the attribute is ignored. Firefox 3 and earlier behave the same as Safari and Chrome, but Firefox 3.5 addressed this issue and no longer sends a request (bug 444931).
Both cases, of course, are problematic because it’s an image making a request for a document. You can easily see this behavior using an HTTP debugging proxy (I highly recommend Fiddler).
The problems
There are two basic problems that this browser behavior causes. The first is a traffic spike. Imagine that you have <img src=""> on the page at http://www.example.com/. Each instance of <img src=""> makes a request to /, the homepage of the domain, in every affected browser. Congratulations, you’ve effectively doubled your traffic to the homepage.
For small sites, this may not be that big of a deal; jumping from 10,000 to 20,000 page views probably isn’t going to raise any flags for you or your host. If you run a site that gets millions of page views per day, and you probably have a lot of machines to handle that load, doubling or tripling traffic can be crippling. You can very easily run out of capacity.
Another issue with the traffic increase is the computing power needed to generate that homepage. If the page is personalizable or is updated with some regular frequency, you could be wasting computing cycles creating a page that will never be viewed by anyone.
The second problem is user state corruption. If you’re tracking state in the request, either by cookies or in another way, you have the possibility of destroying data. Even though the image request doesn’t return an image, all of the headers are read and accepted by the browser, including all cookies. While the rest of the response is thrown away, the damage may already be done.
How does this code happen?
The first time I encountered this problem, I naively thought that it was a bad developer writing crappy code. Had this been 2000 or earlier, I probably would have been right. In today’s web development world, however, I’m mostly wrong. Today, there are so many templating engines and content management systems responsible for constructing pages on-the-fly that it’s quite possible for good developers to end up producing pages with this code. All it takes is something as simple as this PHP:
<img src="$imageUrl" >
If some other part of the code is responsible for filling in $imageUrl, and that code fails, then the offending code gets output to the browser.
In today’s web development world, we’re all doing something along these lines, whether we know it or not. Download a new WordPress theme? Make sure you’ve filled in all default arguments. Using a CMS at work? Make sure all your image URL fields are validated. It’s frighteningly easy to end up with this bad code on your page.
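One way to guard against this is to refuse to emit the tag, or to assign src, when the URL is missing. The following is a minimal JavaScript sketch under that assumption; hasUrl, imageTag, and setImageSource are hypothetical helper names rather than part of any library, and the escaping is deliberately minimal.

```javascript
// Hypothetical helpers (names are illustrative, not from any library).

// Returns true only for a non-empty, non-whitespace string.
function hasUrl(value) {
  return typeof value === "string" && value.trim() !== "";
}

// Builds an <img> tag, or returns an empty string when there is no URL,
// so a failed lookup never produces <img src="">.
function imageTag(url, alt) {
  if (!hasUrl(url)) {
    return "";
  }
  const escape = (text) =>
    String(text).replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
  return '<img src="' + escape(url.trim()) + '" alt="' + escape(alt || "") + '">';
}

// Assigns src only when there is something to load; returns whether it did.
function setImageSource(img, url) {
  if (!hasUrl(url)) {
    return false;
  }
  img.src = url.trim();
  return true;
}
```

In a browser, setImageSource(new Image(), imageUrl) would leave the image untouched when imageUrl is empty, so no extra request is triggered.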
Other tags with problems
Before getting too angry at browser vendors, I think it’s fair to take a look at the HTML 4 specification, specifically the part defining images. Even though the specification indicates that the src attribute should contain a URI, it fails to define the behavior when src doesn’t contain a URI. Of course, images aren’t the only tags that reference an external resource, and so it should come as no surprise that there are other tags with the same problem.
As it turns out, Internet Explorer is the most sane browser out there. Its problems are thankfully limited to images with an empty src attribute. It does make up for this by making the problem a pain to detect, but that will be discussed later.
For other browsers, there are two additional problem scenarios: <script src=""> and <link href="">. Chrome, Safari, and Firefox all initiate another request.
Thankfully, no browser has a problem with <iframe src="">, as all correctly do not make another request.
What can be done?
Of course, the best thing to do is eliminate the offending code from your pages whenever possible. That’s fixing the problem at the source. If you can’t do that, though, your next best option is to attempt to detect it on the server and abort any further execution.
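To find the offending code in generated pages, a rough scan of the HTML output can help. The sketch below is only an illustration: findEmptyReferences is a hypothetical helper, and it relies on regular expressions rather than a real HTML parser, so it can miss unusual markup. It checks the three tags discussed above: <img src="">, <script src="">, and <link href="">.

```javascript
// Hypothetical helper: lists tags whose src/href is an empty string.
// A regular expression is only a rough check, not a real HTML parser.
function findEmptyReferences(html) {
  const tagPattern = /<(img|script|link)\b([^>]*)>/gi;
  const offenders = [];
  for (const match of html.matchAll(tagPattern)) {
    const tag = match[1].toLowerCase();
    // <link> references its resource through href; the others use src.
    const attribute = tag === "link" ? "href" : "src";
    const emptyAttribute = new RegExp(
      "(?:^|\\s)" + attribute + "\\s*=\\s*(?:\"\"|'')",
      "i"
    );
    if (emptyAttribute.test(match[2])) {
      offenders.push(tag);
    }
  }
  return offenders;
}
```

Run against rendered output in a test suite or a build step, a non-empty result would indicate a page that needs attention before it reaches a browser.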
For browsers other than IE, it’s not too difficult to detect what’s going on from the server side. Since the request comes back to the exact same location that contains the offending code, there are two things you can do. First, you can check the request’s referrer. A request resulting from this issue coming from http://www.example.com/dir/mypage.htm will have a referrer of http://www.example.com/dir/mypage.htm. Assuming that there are no valid situations under which your page links to itself, this is a fairly safe way to detect these requests on the server-side.
Internet Explorer throws a wrench into the works by sending the request to the directory of the page instead of the page itself. If you’re only using path URLs (i.e., nothing with a file extension), then the effect is the same and you can use the same referrer detection. Some sample code for use with PHP:
<?php
//Works for IE only when using path URLs and not file URLs
//get the referrer (empty string if the header is missing)
$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
//current URL (assuming HTTP and default port)
$url = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
//make sure they're not the same; stop if the page refers to itself
if ($referrer === $url){
    exit;
}
?>
The goal here is to detect that the page refers to itself and then exit immediately to prevent the server from doing anything additional. Another option, and probably a good idea, is to log that this has happened so it shows up on a dashboard for evaluation.
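The same comparison can be extended to recognize the Internet Explorer pattern, where the request goes to the directory of the referring page. The following JavaScript sketch is an assumption-laden illustration, not a complete solution: classifyRequest is a hypothetical helper, and the "parent-directory" result will also match a legitimate link from a page to its directory index, so it is probably better suited to logging than to aborting the request.

```javascript
// Hypothetical helper: compares a request URL with its referrer.
// Returns "same-page", "parent-directory", or "ok".
function classifyRequest(requestUrl, referrer) {
  let request, referring, directory;
  try {
    request = new URL(requestUrl).href;
    referring = new URL(referrer).href;
    // "." resolved against the referrer gives the referrer's directory.
    directory = new URL(".", referrer).href;
  } catch (error) {
    // Missing or malformed referrer: nothing to compare against.
    return "ok";
  }
  if (request === referring) {
    return "same-page";
  }
  if (request === directory) {
    return "parent-directory";
  }
  return "ok";
}
```

With the example URLs used above, a request for http://www.example.com/dir/ referred by http://www.example.com/dir/mypage.htm is classified as "parent-directory", while a request for the page itself is classified as "same-page".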
Another way to attempt to detect this type of request on the server is by looking at the HTTP Accept header. All browsers except IE send different HTTP Accept headers for image requests than they do for HTML requests. As an example, Chrome sends the following Accept header for an HTML request:
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Compare this to the Accept header that is sent for an image, script, or style sheet request:
Accept: */*
Firefox, Safari, and Opera all send roughly the same Accept header for HTML requests, meaning that you can check for an individual part, such as “text/html”, to determine if the request is an HTML request or something else. Unfortunately, IE only sends the latter Accept header for all requests, so there is no way to differentiate this on the server. For browsers other than IE, you can use something like the following:
<?php
//Warning: Doesn't work for IE!
//make sure the Accept header has 'text/html' in it
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
if (strpos($accept, 'text/html') === false){
    exit;
}
?>
This check is a little safer than the previous, but its big downside is that it doesn’t work in IE.
Why does this happen?
The real problem is the way that URI resolution is performed in browsers. This behavior is defined in RFC 3986 - Uniform Resource Identifiers. When an empty string is encountered as a URI, it’s considered a relative URI and is resolved according to the algorithm defined in section 5.2. This specific example, an empty string, is listed in section 5.4. Firefox, Safari, and Chrome are all resolving an empty string correctly per the specification, while Internet Explorer is resolving it incorrectly, apparently in line with an earlier version of the specification, RFC 2396 - Uniform Resource Identifiers (this was obsoleted by RFC 3986). So technically, the browsers are doing what they’re supposed to do to resolve relative URIs. The problem is that in this context, the empty string is clearly unintentional.
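The resolution step can be reproduced with the standard URL class available in current browsers and Node.js. This is only a sketch for illustration: the URL class follows the WHATWG URL Standard rather than RFC 3986 directly, though the two agree for this case, and the page address is the example used earlier in this post.

```javascript
// Resolve an empty reference against the page's URL with the standard URL class.
const page = "http://www.example.com/dir/mypage.htm";

// An empty reference resolves to the page itself.
const samePage = new URL("", page).href;

// "." resolves to the containing directory, which is the URL that
// Internet Explorer is described as requesting.
const directory = new URL(".", page).href;

console.log(samePage);  // http://www.example.com/dir/mypage.htm
console.log(directory); // http://www.example.com/dir/
```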
It’s time to fix this
This is a serious flaw in browsers, and I’m not sure you can look at it in any way where it’s not considered a bug. The inconsistent behavior, from Opera completely ignoring all invalid external references, to IE falling victim only for <img> tags while others do the same for <script> and <link> as well, seems to indicate a bug in browsers. Though browsers seem to be following correct URI resolution (except IE), I think this is a case where common sense must win over the letter of the specification. There is no way that an image can possibly render an HTML page, and the same goes for <script> and <link>. This bug has cost web developers hundreds of lost hours and has potentially brought down sites, pushing servers over capacity. Enough is enough. It’s time for the browser vendors to fix this bug. I’ve taken the liberty of filing or locating bugs:
* Firefox: Bug 531327
* WebKit (Safari/Chrome): Bug 30303
Please show support for fixing these bugs, as I don’t see any reason why we should still be dealing with this browser behavior. And if anyone can get the note to Microsoft so they can address IE, we’d all greatly appreciate it.
HTML5 to the rescue
HTML5 adds to the description of the <img> tag’s src attribute to instruct browsers not to make an additional request in section 4.8.2:
The src attribute must be present, and must contain a valid URL referencing a non-interactive, optionally animated, image resource that is neither paged nor scripted. If the base URI of the element is the same as the document’s address, then the src attribute’s value must not be the empty string.
Hopefully, browsers won’t have this problem in the future. Unfortunately, there is no such clause for <script src=""> and <link href="">. Maybe there’s still time to make that adjustment to ensure browsers don’t accidentally implement this behavior.
Update (2 Dec 2009): It appears that <img src=""> has been patched in Firefox 3.5 (bug 444931). Problems with <script src=""> and <link href=""> still remain. Also, added a reference to the HTML5 section that aims to help this issue.