Changing dynamic to static URLs

Search engine-friendly links with mod_rewrite

Introduction

One of the most frequent questions posted in the Apache Server forum is "How can I change my dynamic URLs to static URLs using mod_rewrite?" So this post is intended to answer that question and to clear up a very common misconception.

Mod_rewrite cannot "change" the URLs on your pages

First, the misconception: Mod_rewrite cannot be used to change the URL that the visitor sees in his/her browser address bar unless an external redirect is invoked. But an external redirect would 'expose' the underlying dynamic URL to search engines and would therefore completely defeat the purpose here. This application calls for an internal server rewrite, not an external client redirect.

It's also important to realize that mod_rewrite works on requested URLs after the HTTP request is received by the server, and before any scripts are executed or any content is served. That is, mod_rewrite changes the server filepath and script variables associated with a requested URL, but has no effect whatsoever on the content of 'pages' output by the server.

How to change dynamic to static URLs

With that in mind, here's the procedure to implement search engine-friendly static URLs on a dynamic site:

  • Change all URLs in links on all pages to a static form. This is usually done by modifying the database or by changing the script that generates those pages. PHP's preg_replace function often comes in handy for this (see the sketch after this list).
  • Add mod_rewrite code to your httpd.conf, conf.d, or .htaccess file to internally rewrite those static URLs, when requested from your server, into the dynamic form needed to invoke your page-generation script.
  • Add additional mod_rewrite code to detect direct client requests for dynamic URLs and externally redirect those requests to the equivalent new static URLs. A 301 Moved Permanently redirect is used to tell search engines to drop your old dynamic URLs and use the new static ones, and also to redirect visitors who may come back to your site using outdated dynamic-URL bookmarks.
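
As a rough illustration of the first step, here is a minimal PHP sketch (not part of the original post) that uses preg_replace to convert dynamic links in generated HTML into the static form used in the working example below. The function name and the assumption that every link follows the exact five-parameter format are mine:

<?php
// Hypothetical helper: convert dynamic product links in generated HTML to the
// static form, assuming the five-parameter format used in this post's example.
function make_links_static($html)
{
    $pattern = '#/index\.php\?product=([^&]+)&color=([^&]+)&size=([^&]+)&texture=([^&]+)&maker=([^"\'\s]+)#';
    $replacement = '/product/$1/$2/$3/$4/$5';
    return preg_replace($pattern, $replacement, $html);
}

// Example:
// make_links_static('<a href="/index.php?product=widget&color=blue&size=small&texture=fuzzy&maker=widgetco">Blue widgets</a>')
// returns '<a href="/product/widget/blue/small/fuzzy/widgetco">Blue widgets</a>'

If your page generator HTML-encodes ampersands as &amp;, the pattern would need to allow for that form as well.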

Considering the above for a moment, one quickly realizes that both the dynamic and static URL formats must contain all the information needed to reconstruct the other format. In addition, careful selection of the 'design' of the static URLs can save a lot of trouble later, and also save a lot of CPU cycles which might otherwise be wasted by an inefficient implementation.

An earnest warning

It is not my purpose here to explain all about regular expressions and mod_rewrite; the Apache mod_rewrite documentation and many other tutorials are readily available on-line to anyone who searches for them (see also the references cited in the Apache Forum Charter and the tutorials in the Apache forum section of the WebmasterWorld Library).

Trying to use mod_rewrite without studying that documentation thoroughly is an invitation to disaster. Keep in mind that mod_rewrite affects your server configuration, and that one single typo or logic error can make your site inaccessible or quickly ruin your search engine rankings. If you depend on your site's revenue for your livelihood, intense study is indicated.

That said, here's an example which should be useful for study, and might serve as a base from which you can customize your own solution.

Working example

Old dynamic URL format: /index.php?product=widget&color=blue&size=small&texture=fuzzy&maker=widgetco

New static URL format: /product/widget/blue/small/fuzzy/widgetco

Mod_rewrite code for use in .htaccess file:

# Enable mod_rewrite, start rewrite engine
Options +FollowSymLinks
RewriteEngine on
#
# Internally rewrite search engine friendly static URL to dynamic filepath and query
RewriteRule ^product/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/?$ /index.php?product=$1&color=$2&size=$3&texture=$4&maker=$5 [L]
#
# Externally redirect client requests for old dynamic URLs to equivalent new static URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?product=([^&]+)&color=([^&]+)&size=([^&]+)&texture=([^&]+)&maker=([^\ ]+)\ HTTP/
RewriteRule ^index\.php$ http://example.com/product/%1/%2/%3/%4/%5? [R=301,L]

Note that the keyword "product" always appears in both the static and dynamic forms. This is intended to make it simple for mod_rewrite to detect requests where the above rules need to be applied. Other methods, such as testing whether the requested file exists, are also possible, but they are less efficient and more error-prone than this approach.
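
For clarity, here is a minimal sketch (not from the original post) of what the page-generation script sees after the internal rewrite; the variable names and the output line are purely illustrative:

<?php
// index.php -- after the internal rewrite, the script receives the query
// string exactly as if the dynamic URL had been requested directly, so the
// script itself needs no changes.
$product = isset($_GET['product']) ? $_GET['product'] : '';
$color   = isset($_GET['color'])   ? $_GET['color']   : '';
$size    = isset($_GET['size'])    ? $_GET['size']    : '';
$texture = isset($_GET['texture']) ? $_GET['texture'] : '';
$maker   = isset($_GET['maker'])   ? $_GET['maker']   : '';

// Hypothetical page output; a real script would look these values up in its database.
echo htmlspecialchars("$size $color $texture {$product}s by $maker");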

Differences between .htaccess code and httpd.conf or conf.d code

If you wish to use this code in a <Directory> container in the httpd.conf or conf.d server configuration files, you will need to add a leading slash to the patterns in both RewriteRules, i.e. change "RewriteRule ^index\.php$" to "RewriteRule ^/index\.php$". Also remember that you will need to restart your server before changes in these server config files take effect.

How this works

  • A visitor uses their browser to view one of your pages
  • The visitor clicks on the link <a href="/product/gizmo/red/tiny/furry/gizmocorp">Tiny red furry gizmos by GizmoCorp!</a> on your page
  • The browser requests the virtual file http://example.com/product/gizmo/red/tiny/furry/gizmocorp from your server
  • Mod_rewrite is invoked, and the first rule above rewrites the request to /index.php?product=gizmo&color=red&size=tiny&texture=furry&maker=gizmocorp, invoking your script
  • Your script generates the requested page, and the server sends it back to the client browser
  • The visitor clicks on another link, and the process repeats

Now let's say a search engine spider visits your site using the old dynamic URL:

  • The spider requests http://example.com/index.php?product=wodget&color=green&size=large&texture=smooth&maker=wodgetsinc from your server
  • Mod_rewrite is invoked, and the second rule generates an external 301 redirect, informing the spider that the requested page has been permanently moved to http://example.com/product/wodget/green/large/smooth/wodgetsinc
  • The spider queues a request to its URL database manager, telling it to replace the old dynamic URL with the new one given in that redirect response.
  • The spider re-requests the page it was looking for using the new static URL http://example.com/product/wodget/green/large/smooth/wodgetsinc
  • Mod_rewrite is invoked, and the first rule internally rewrites the request to /index.php?product=wodget&color=green&size=large&texture=smooth&maker=wodgetsinc, invoking your script
  • Your script generates the requested page, and the server sends it back to the search engine spider for parsing and inclusion in the search index
  • Since the spider is now collecting pages containing the new static links, and all requests for old dynamic URLs are permanently redirected to the new static URLs, the new URLs will replace the old ones in search results over time.

Location, location, location

In order for the code above to work, it must be placed in the .htaccess file in the same directory as the /index.php file, or in a <Directory> container in httpd.conf or conf.d that refers to that directory. Alternatively, the code can be modified for placement in any Web-accessible directory above the /index.php directory by changing the URL-paths used in the regular-expressions patterns for RewriteCond and RewriteRule.

Regular-expressions patterns

Just one comment on the regular-expressions subpatterns used in the code above. I have avoided the very easy, very popular, and very inefficient construct "(.*)/(.*)", because multiple ".*" subpatterns in a single pattern are highly ambiguous and highly inefficient.

The reason for this is twofold. First, ".*" means "match any number of any characters". Second, ".*" is 'greedy,' meaning it will match as many characters as possible. So with a pattern like "(.*)/(.*)", the regex engine must make multiple matching attempts before the requested URL can match the pattern or be rejected, with the number of attempts roughly equal to (the number of characters between the "/" and the end of the requested URL, plus two) multiplied by (the number of "(.*)" subpatterns, minus one). It is easy to write a multiple-"(.*)" pattern that requires dozens or even hundreds of passes to match or reject a particular requested URL.

Let's take a short example. Bear in mind that back-reference $1 contains the characters matched by the first parenthesized subpattern, while $2 contains those matched by the second subpattern:

Requested URL: http://example.com/abc/def
Local URL-path: abc/def
Rule pattern: ^(.*)/(.*)$

Pass ¦ $1 value ¦ $2 value ¦ Result
1    ¦ abc/def  ¦ -        ¦ no match
2    ¦ abc/de   ¦ f        ¦ no match
3    ¦ abc/d    ¦ ef       ¦ no match
4    ¦ abc/     ¦ def      ¦ no match
5    ¦ abc      ¦ def      ¦ Match

I'll hazard a guess that many, many sites are driven to unnecessary server upgrades every year by this one error alone.

Instead, I used the unambiguous constructs "([^/]+)", "([^&]+)", and "([^\ ]+)". Roughly translated, these mean "match one or more characters other than a slash," "match one or more characters other than an ampersand," and "match one or more characters other than a space," respectively. The effect is that each of those subpatterns 'consumes' one or more characters from the requested URL, up to the next occurrence of the excluded character, thereby allowing the regex parser to match the requested URL against the pattern in one single left-to-right pass.
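
To make the ambiguity concrete, here is a small PHP sketch (not from the original post) comparing the two styles of pattern; the sample path and variable names are illustrative:

<?php
// The greedy pattern is ambiguous: the first (.*) swallows as much as it can
// and must backtrack to find a match, whereas the negated-class pattern
// consumes exactly one path segment per subpattern in a single pass.
$path = 'product/widget/blue/small/fuzzy/widgetco';

preg_match('#^(.*)/(.*)$#', $path, $greedy);
// $greedy[1] = 'product/widget/blue/small/fuzzy', $greedy[2] = 'widgetco'

preg_match('#^([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)$#', $path, $exact);
// $exact[1] = 'product', $exact[2] = 'widget', ..., $exact[6] = 'widgetco'

var_dump($greedy, $exact);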

Common problems

A common problem encountered when implementing static-to-dynamic URL rewrites is that relative links to images, included CSS files, and external JavaScript files on your pages will become broken. The key is to remember that it is the client (e.g. the browser) that resolves relative links. For example, if you are rewriting the URL /product/widget/blue/small/fuzzy/widgetco to your script, the browser will see a page called "widgetco", and will treat a relative link on that page as being relative to the 'virtual' directory /product/widget/blue/small/fuzzy/. The two easiest solutions are to use server-relative or absolute (canonical) links, or to add additional code to rewrite image, CSS, and external JS URLs to the correct location. An example would be to use the server-relative link <img src="/logo.gif"> to replace the page-relative link <img src="logo.gif">.

Avoiding testing problems

For both .htaccess and server config file code, remember to flush your browser cache before testing any changes; otherwise, your browser will likely serve any previously-requested pages from its cache instead of fetching them from your server. Obviously, in that case, no code on your server can have any effect on the transaction.

Read first, then write and test

I hope this post is helpful. If you still have problems after studying the mod_rewrite documentation and regular expressions tutorials, and writing and testing your own code, feel free to post relevant entries from your server error log and ask specific questions in the Apache Server forum. Please take a few minutes to read the WebmasterWorld Terms of Service and the Apache Forum Charter before posting (Thanks!).

Jim
