Making your URLs “pretty” is a basic step towards improving your website's visibility. Content management systems (CMSs) often generate URLs that are friendly to neither search engines nor users. Even WordPress has some URL options that are not well optimised. If your website has URLs that look like “www.yourwebsite.com/cid?1239u09149” or “www.yourblog.com/123/post/category/yourtitle”, then you are missing out on some SEO points.
URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. Search engines look for a number of conventions to decide whether a URL has been created “well”. They also use these conventions to determine whether a URL has been crawled before, and which parts of the website are the most important.
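To make the “has this URL been crawled before” check concrete, here is a minimal Python sketch. It assumes a crawler that keeps a set of normalized URLs and applies only four simple rules: lowercase the scheme and host, use “/” for an empty path, drop the fragment, and sort the query variables. The names `normalize_url` and `is_new` are made up for illustration, and real search engines will apply their own (unpublished) rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url):
    """Return a simplified canonical form of an absolute URL (illustrative only)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Assumption: the URL has no user:password@ prefix, which would be case-sensitive.
    netloc = parts.netloc.lower()
    # An empty path and "/" point at the same resource, so always use "/".
    path = parts.path or "/"
    # Sort the query variables by name. parse_qsl decodes and urlencode
    # re-encodes the values, so their escaping may change slightly.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    query = urlencode(pairs)
    # Passing "" as the last item drops the fragment.
    return urlunsplit((scheme, netloc, path, query, ""))


def is_new(url, seen):
    """Record the normalized URL in `seen`; return False if it was already there."""
    key = normalize_url(url)
    if key in seen:
        return False
    seen.add(key)
    return True


seen = set()
print(is_new("HTTP://www.Example.com/display?lang=en&article=fred", seen))      # True
print(is_new("http://www.example.com/display?article=fred&lang=en#top", seen))  # False
```

The second URL differs from the first only in letter case, variable order, and fragment, so it collapses to the same key and is treated as already crawled.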
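- Converting the scheme and host to lowercase. The scheme and host parts of a URL are case-insensitive, so most normalizers convert them to lowercase. The path can be case-sensitive and is left as it is. Example: HTTP://www.Example.com/ → http://www.example.com/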
- Decoding percent-encoded octets of unreserved characters. A character in a URL can be written in two ways: literally, or as a percent-encoded octet (for example, %7E for a tilde). For consistency, percent-encoded octets in the ranges of ALPHA (%41–%5A and %61–%7A), DIGIT (%30–%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers. Example: http://www.example.com/%7Eusername/ → http://www.example.com/~username/
- Adding a trailing slash. Directories are indicated with a trailing slash, which should be included in URLs. Example: http://www.example.com → http://www.example.com/
Applying the following normalizations results in a semantically different URL, although it may refer to the same resource:
- Removing the directory index. Default directory indexes are generally not needed in URLs. Examples: http://www.example.com/default.asp → http://www.example.com/ and http://www.example.com/a/index.html → http://www.example.com/a/
- Removing the fragment. The fragment component of a URL is usually removed. Example: http://www.example.com/bar.html#section1 → http://www.example.com/bar.html
- Sorting the variables of active pages. Some active web pages have more than one variable in the URL. A normalizer can pull out all the variables with their values, sort them into alphabetical order (by variable name), and reassemble the URL. Example: http://www.example.com/display?lang=en&article=fred → http://www.example.com/display?article=fred&lang=en
- Some normalization rules may be developed for specific websites by examining URL lists obtained from previous crawls or web server logs. For example, if the URL
appears in a crawl log several times along with