Saturday, April 25, 2009


File Types That are Shown In Google Search Engine Result Pages

Searching on Google is easy: go to www.google.com, type your query in the search box, and press the Google Search button to get the most relevant results. The problem is that finding a specific file type is not as easy. To find specific file types on Google, you should first know which file types Google can show in its SERPs (search engine result pages).

Google shows 13 file types in its SERPs, including standard web documents in HTML. You can search for any of them by adding filetype:suffix to your query. For example, to find PDF files related to shopping, type shopping filetype:pdf and Google will display results containing PDF files. Alternatively, go to the Advanced Search page, select the file type you want from the file type dropdown, and press Enter; the results will contain only that file type.
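As a minimal sketch of the filetype: operator described above, the Python snippet below builds a search URL for a keyword and a file suffix. The function name filetype_search_url and the /search?q= URL pattern are assumptions made for illustration; the post itself only describes typing the query into the search box.

```python
from urllib.parse import urlencode

# Assumed base URL for a Google web search; not taken from the post.
SEARCH_BASE = "https://www.google.com/search"


def filetype_search_url(keywords, suffix):
    """Return a search URL for `keywords` restricted to files with `suffix`."""
    # The query is exactly what you would type: e.g. "shopping filetype:pdf"
    query = f"{keywords} filetype:{suffix}"
    return f"{SEARCH_BASE}?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(filetype_search_url("shopping", "pdf"))
```

Any suffix from the table of supported file types (pdf, doc, xls and so on) can be passed as the second argument.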

The file types that are shown in SERP are:

File Format             Suffix                                      Description
Adobe Acrobat PDF       pdf                                         A publishing format commonly used for product manuals and documents of all sorts.
Adobe PostScript        ps                                          A printing format often used for academic papers.
Hypertext Markup Language  html or htm                              The primary language for web pages.
Lotus 1-2-3             wk1, wk2, wk3, wk4, wk5, wki, wks, or wku   A spreadsheet format.
Lotus WordPro           lwp                                         A word processing format.
MacWrite                mw                                          A word processing format.
Microsoft Excel         xls                                         A spreadsheet format.
Microsoft PowerPoint    ppt                                         A format for presentations and slides.
Microsoft Word          doc                                         A common word processing format.
Microsoft Works         wks, wps, or wdb                            A word processing format.
Microsoft Write         wri                                         A Macintosh word processing format.
Rich Text Format        rtf                                         A format used to exchange documents between Microsoft Word and other formats.
Plain Text              ans or txt                                  Ordinary text with no special formatting.

By including these file types in its SERPs, Google gives searchers a better picture of what other kinds of files are available on the web. Among these file types, PDF files are the most popular.

When crawling these file types, Google converts them to either HTML or text. The SERPs include a link to either "View as HTML" or "View as Text".

Saturday, February 28, 2009


Web Analytics Data Collection Methods

Web analytics is the measurement, collection, analysis and reporting of internet data for purposes of understanding and optimizing web site usage.

Web analytics can be divided into two categories: off-site web analytics and on-site web analytics.

Off-site web analytics refers to data collected and measured by companies that are not directly associated with the websites being measured.

On-site web analytics measures a visitor’s journey once they have arrived on a particular website.

Here we will discuss the different methods of collecting data about a visitor’s journey and how they interact with the website.

There are four methods of collecting data as a customer interacts with our websites.

1. log files
2. JavaScript tags
3. packet sniffers
4. web beacons

Log Files – This is the oldest data collection method and the most easily accessible source of data. When a visitor requests a web page through a web browser, the request is sent to the server hosting the web page. The web server accepts the request and creates an entry for it in the web log. The entry includes the IP address, page name, browser type, time, operating system, etc. A log-parsing tool or a web analytics tool then segments the data and generates standard reports.
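To make the log-parsing step concrete, here is a small Python sketch that reads entries and counts page views. It assumes the server writes the widely used "combined" log format; the sample lines, the field names and the function names parse_log_line and page_view_counts are made up for illustration and are not from any particular analytics tool.

```python
import re
from collections import Counter

# Assumed "combined" log format: IP, identity, user, [time], "request",
# status, size, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def page_view_counts(lines):
    """Count successful (status 200) requests per page."""
    counts = Counter()
    for line in lines:
        entry = parse_log_line(line)
        if entry and entry["status"] == "200":
            counts[entry["page"]] += 1
    return counts


# Hypothetical sample entries, invented for this example.
SAMPLE_LOG = [
    '203.0.113.5 - - [25/Apr/2009:10:15:32 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
    '203.0.113.9 - - [25/Apr/2009:10:16:01 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "http://www.xyz.com/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X)"',
    '203.0.113.9 - - [25/Apr/2009:10:16:40 +0000] "GET /missing.html HTTP/1.1" '
    '404 312 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X)"',
]

if __name__ == "__main__":
    for page, views in page_view_counts(SAMPLE_LOG).items():
        print(page, views)
```

A real report would segment on more fields (browser, referrer, time of day), but the principle of turning each log entry into named fields is the same.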

JavaScript Tagging – This is currently the most popular data collection method. There is no need to collect data on the server hosting the web pages: the web page is served from one server and the data is collected on another, usually a third-party server.

In this method a small piece of JavaScript code is inserted into the web pages. When a web page is requested, the request is sent to the server hosting the page. As the page loads, the browser executes the JavaScript code, which captures details about the page view, the visitor session and cookies, and sends them to the data collection server.

Packet Sniffers – This is one of the most advanced data collection methods, but it is not very popular. The process of collecting data using packet sniffing is as follows.

The visitor requests a web page. Before the request reaches the web server, it passes through a software- or hardware-based packet sniffer that collects attributes of the request. The packet sniffer then sends the request on to the web server. The response is sent back to the customer but first passes through the packet sniffer again, which captures information about the page going back to the customer and stores that data.

Web Beacons – This method of collecting data is also popular. The process is very simple: a 1 x 1 pixel transparent image is placed on the web page using an img src HTML tag. This transparent image is usually hosted on a third-party server, different from the server that hosts the web page. When a visitor requests a web page, the page request is sent to the server hosting the page and the image request is sent to the third-party server. The image request carries data (IP address, page name, browser type, time, operating system, etc.) to the third-party server, where the data is stored and processed.
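The web beacon flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the collection host stats.example.com, the parameter names page and ref, and both function names are hypothetical. In this sketch the page details travel in the query string of the image URL, while values such as IP address and browser type would normally be read from the image request itself on the collection server.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical third-party collection endpoint serving the 1 x 1 image.
BEACON_URL = "https://stats.example.com/pixel.gif"


def beacon_img_tag(page_name, referrer):
    """Build the 1 x 1 img tag that a web page would embed."""
    query = urlencode({"page": page_name, "ref": referrer})
    # Escape "&" so the URL is valid inside an HTML attribute.
    src = escape(f"{BEACON_URL}?{query}")
    return f'<img src="{src}" width="1" height="1" alt="">'


def read_beacon_request(request_url):
    """On the collection server: recover the data carried by the image request."""
    params = parse_qs(urlsplit(request_url).query)
    return {name: values[0] for name, values in params.items()}


if __name__ == "__main__":
    print(beacon_img_tag("/pricing.html", "http://www.xyz.com/"))
    request = BEACON_URL + "?" + urlencode({"page": "/pricing.html", "ref": "http://www.xyz.com/"})
    print(read_beacon_request(request))
```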

Tuesday, October 21, 2008


Image Optimization - For Better Search Engine Visibility



One of the most underused areas of search engine optimization is image optimization. Search engines are giving importance to images, and site owners are seeing their images shown in regular search results, yet only a few are optimizing for image search.

In this post I will try to cover all the points that will help webmasters and site owners optimize their images for better search engine ranking.

1. First of all, take original photos so that you can brand them with your trademark, logo or URL. On business listing sites, add your business logo to make a stronger impression on users.

2. Find sites where your images can be crawled and indexed by search engines and appear in search results. All major search engines show images in results, either in a vertical image search or within contextual search results.

3. Upload your photos to photo sharing sites such as Flickr for better visibility.

4. Maintain the quality of your images and photos. Resolution is very important, so adjust it appropriately for full-size images and thumbnails.

5. Try to save images in a standard format. The most popular formats are JPEG and GIF.

6. Name your photo according to its theme. The name should describe what the photo shows, such as “latest_mobile_phones.jpeg” rather than “xyz.jpeg”.

7. The alt and title attributes should be appropriate and should describe the photo.

8. The size and weight of the image should be proportional. Heavy images take more time to download and consume more bandwidth.

9. Try to specify the size of the image, i.e. its width and height, when defining the image on a web page. If you don’t specify the image size, the HTML parser has to work out the size itself, which takes extra processing time.

10. Images with a lot of content around them rank better in search engines, because the surrounding content describes the purpose and theme of the image.

11. You can build trust in your website among visitors by adding images of customer testimonials, celebrity photos or award-winning snapshots.

12. Publicize your website’s logo by adding it to press releases and similar content.

13. Add your logo to your profile when submitting your website to directories.

14. If you send newsletters through email marketing, adding your logo helps build your brand.

15. Use thumbnails and original large images as needed, with the proper image size in each place. Using an inappropriate image size will hurt the usability of the website.

16. Google Webmaster Tools provides an option called “enable enhanced image search”. Enable it. It will help imagebot crawl your images.

17. Bookmark your photos and images on social networking sites such as Facebook, Digg, etc.

18. Never exclude your images folder from search engine robots. Blocking the images folder in the robots.txt file will prevent the images from being indexed.

19. Don’t use JavaScript to link to images. Such links will never get crawled by search engines.

20. Google tools such as Google Image Labeler are used to associate the images on your site with labels that improve the indexing and search quality of those images.
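Several of the points above (a descriptive file name, alt and title text, and an explicit width and height) come together in the img tag itself. The Python helper below is a hedged sketch of how a site might generate such tags consistently; the function name image_tag, the images/ folder and the sample dimensions are assumptions for illustration only.

```python
from html import escape


def image_tag(src, alt, title, width, height):
    """Build an img tag with descriptive alt/title text and explicit dimensions."""
    return (
        f'<img src="{escape(src)}" alt="{escape(alt)}" title="{escape(title)}" '
        f'width="{int(width)}" height="{int(height)}">'
    )


if __name__ == "__main__":
    # Descriptive file name instead of something like "xyz.jpeg".
    print(image_tag(
        "images/latest_mobile_phones.jpeg",
        "Latest mobile phones",
        "Latest mobile phones",
        400,
        300,
    ))
```

Generating tags through one helper makes it harder to forget the alt text or the dimensions on any single image.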

List of Image Search Engines

1. Imagery

2. Google Image Search

3. Yahoo Image Search

4. Ask Images Search

5. MSN Live Image Search

6. Corbis

7. PicSearch

8. Exalead

9. Pixsy

10. Visoo

11. Netvue

12. Flickr Images

13. Webshots

14. Photobucket

15. Getty Images Search

16. AltaVista Image Search



List of photo sharing sites:

1. 23

2. Animus3

3. Art Limited

4. DeviantART

5. DropShots

6. Flickr

7. FocalPower

8. Fotki

9. Fotolog

10. Gallery 2

11. Humble Voice

12. ImageEvent

13. Ipernity

14. Kodak Gallery

15. Koffee Photo

16. Multiply

17. My Photo Album

18. One True Media

19. Panoramio

20. PBase

21. Phanfare

22. Photo.net

23. Photobucket

24. PhotoSIG

25. Photoworks

26. Photrade

27. Picasa

28. Picateers

29. Pickle

30. PicMe

31. Pix.ie

33. RedBubble

34. rmbr

35. Shutterfly

36. Slide

37. SmugMug

38. Snapfish.com by HP

39. Tabblo

40. Walgreens

41. Webshots

42. Winkflash

43. Zenfolio

44. Zooomr

45. Zoto


Monday, October 20, 2008


Some Facts About Flash Optimization

Flash File Optimization
Historically, websites based on Flash could not be crawled by search engines. Content such as text and links embedded in Flash (SWF) files was of no use to search engines. This frustrated web developers, who tried every method to get their sites indexed by the major search engines and ranked in the SERPs. It was also a problem for searchers, who were missing some quality content for their queries.

Google, in cooperation with Adobe, developed a new algorithm for Flash, with which Googlebot can now index “textual content in Flash (SWF)” files and can also extract URLs embedded in Flash files.

Google has been working for years on crawling rich media such as Flash and JavaScript. Now that Google has developed this technology, a few questions need to be answered.

1. Does Google associate text content in Flash with the correct parent URL?
2. Can Flash files have PageRank?
3. Can Google Translate Flash content?


1. Does Google associate text content in Flash with the correct parent URL?

According to Google, when a Flash (SWF) file is embedded in the HTML of a web page, its text is associated with the parent URL, i.e. the two are indexed as a single unit. But studies show that this is not completely true: the Flash file URL and the parent URL are indexed separately.

2. Can Flash files have PageRank?

Flash files can get PageRank, but there are currently some issues. According to studies, links in Flash files can pass PageRank, so Flash files can receive PageRank as well. The issue is that the Flash (SWF) file and its parent URL both get PageRank, and in some cases the two values differ.

3. Can Google Translate Flash content?

Google cannot translate Flash content. To verify this, try to translate any Flash (SWF) file using Google Translate. You will get “No results found”.

Tuesday, August 26, 2008


Google’s Case Sensitive Issues

When Google crawls web pages across the internet to find fresh and unique content, it also crawls pages with duplicate content. Below are some factors that are not often discussed and that lead to duplicate URLs and content being crawled by Google.

Case sensitivity in Google crawling

Starting with the URI specification:

The scheme and hostname are case insensitive, i.e. the URLs below are treated as the same.

http://www.xyz.com/ = HTTP://www.Xyz.com/

But directories and filenames are case sensitive.

The examples below are treated as 3 different URLs:

* http://www.xyz.com/Page1.html
* http://www.xyz.com/PAGE1.HTML
* http://www.xyz.com/page1.html
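The rule above can be checked with a short Python sketch that lowercases only the scheme and hostname and leaves the path untouched. The function name normalize_url is an assumption for illustration, and lowercasing the whole network location is a simplification that ignores user names or passwords in the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase the case-insensitive parts (scheme, host); keep the path as-is."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # simplification: assumes no userinfo in the URL
        parts.path,            # directories and filenames stay case sensitive
        parts.query,
        parts.fragment,
    ))


if __name__ == "__main__":
    # Same URL after normalization:
    print(normalize_url("HTTP://www.Xyz.com/") == normalize_url("http://www.xyz.com/"))
    # Still three different URLs after normalization:
    pages = {
        normalize_url("http://www.xyz.com/Page1.html"),
        normalize_url("http://www.xyz.com/PAGE1.HTML"),
        normalize_url("http://www.xyz.com/page1.html"),
    }
    print(len(pages))
```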

Google and Case Issues

Crawling

Google takes case variations in directories and filenames into account, so it treats the URLs below as different and may crawl all three:

* http://www.xyz.com/Page1.html
* http://www.xyz.com/PAGE1.HTML
* http://www.xyz.com/page1.html

Indexing

When case-varied URLs are accessible and the web server does not redirect to the preferred URL:

* Duplicate content is crawled under the different URL cases.
* Google consolidates properties (such as link information) between the duplicate URLs and stores them.
* Google displays the highest-ranking URL selected from the case-varied URLs.

URL Case Recommendations

The default behavior of web servers is as follows:

* IIS is case insensitive: it treats Page1.html = page1.html, so the two pages are treated as the same
* Apache is case sensitive: it treats Page1.html != page1.html, so the two pages are treated as different

The most important issue, and one that is not much discussed, is that robots.txt is case sensitive for paths.

The examples below illustrate this:

* Disallow: /abc = disallow: /abc, the case of the directive name does not matter
* Disallow: /ABC != Disallow: /abc, the two paths are treated as different
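As a quick way to see this behavior, the sketch below uses the robots.txt parser from Python's standard library (urllib.robotparser). It shows how that parser matches paths, which is an assumption about parsers in general and not a statement about Googlebot's own implementation; the rules and the www.xyz.com URLs are sample values.

```python
from urllib.robotparser import RobotFileParser

# Sample rules; note the lowercase directive name "disallow".
ROBOTS_TXT = [
    "User-agent: *",
    "disallow: /abc",
]


def is_allowed(url, robots_lines=ROBOTS_TXT):
    """Return True if a generic crawler may fetch `url` under the sample rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    # Blocked: the path matches the rule exactly, including case.
    print(is_allowed("http://www.xyz.com/abc/page.html"))
    # Not blocked: /ABC is a different path from /abc.
    print(is_allowed("http://www.xyz.com/ABC/page.html"))
```

If both /abc and /ABC are reachable on the server, both need their own Disallow line to keep them out of the crawl.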

Recommendation

1. Follow a consistent format for URLs: choose either ePuppy.html or epuppy.html

2. It is recommended, and often more error-proof, to create all-lowercase URLs such as epuppy.html

3. Verify case-sensitive paths with the robots.txt analysis tool in Webmaster Tools

If these points are considered while creating a website, many duplicate content issues can be avoided.