Today, the person who has more information at hand is more powerful. Acquiring broad knowledge requires access to broad information, and for that we need strong search engines. Many search engines are available, and for a given search criterion one engine may return better results than another. Identifying which engine performs better for which criterion calls for a comparison among these search engines. In this report we combine qualitative and quantitative methods to draw that comparison. Our analysis distinguishes clearly between the performance levels of the search engines, measured against a set of search criteria. The results show that Google leads for most of the search criteria, while the other engines lag somewhat behind. Most of the search engines also produce better results for Boolean and keyword-based searching.
Keywords: Search Engine, Keyword Search, Metadata Search, Precision, Recall
Table of Contents
Introduction
Structure of the Paper
Background
Description of the Research Area
Problem Definition
Research Questions
Goal
Objectives
Method
Method Discussion
Limitations
Extended Background
Search engines and basic search techniques
Truncation (Wildcards)
Boolean search approach
Keyword based search
Metadata search
Language search
Precision vs. Recall
Data collection
Prepare Search Queries
Calculate precision and recall
Analysis
Discussion
Conclusion
Introduction
Structure of the Paper
This paper is structured as follows:
Setting the query criteria
Running the search queries and judging the relevancy of their results in the different search engines
Observing the precision and recall of the search results under the different factors
Comparing the search engines based on the above observations
Background
Many search engines are available. The popular ones include Google, Yahoo, Windows Live Search, AOL, Ask, AllTheWeb and AltaVista. Searching for information on the web with these engines is common practice in the modern web-based society. Nowadays we can hardly do without searching for information, as information is the key to power and knowledge. No single person can acquire knowledge in every field, but anyone connected to the internet with a search engine in front of them can look up whatever they need. Information mining, or searching the web, is therefore a regular practice. People have their own preferences when choosing a search engine. Various reports show that Google is the most popular and most used search engine worldwide [1, 2]; other general-purpose search engines are not used nearly as widely. For example, based on US internet usage, 71.9% of the search volume belongs to Google [3], with the other engines holding clearly lower shares. All of these are keyword-based search engines [3]: they search for documents matching the keywords given in queries. They use different algorithms for searching and for indexing the resulting pages or documents, which is why the most relevant results for the same query differ from engine to engine. Queries on the same topic can also be formed in multiple ways, so results may vary even within a single search engine for these differently formed queries. A careful observation of the relevancy of the results for such queries can therefore serve as the basis for comparing the search engines. The differently formed queries can be categorized as keyword-based search, metadata search, language-based search, searching with wildcards, and so on.
The relative strengths of these search engines need to be identified. Our research area is therefore the comparison of different keyword-based search engines.
Description of the Research Area
So far, much research has been conducted on measuring various aspects of search engines. Mostly, search engines have been evaluated by determining the precision and recall of their search results. In addition, the performance of search engines on queries involving metadata (e.g. image search), different languages, semantic search, etc. has been measured [4-6].
A transparent comparison among these search engines is needed to identify their performance levels. The comparison can be captured by judging the relevancy of the documents retrieved for differently formed query sets. The approach is experimental. For this experiment the most widely used general-purpose, keyword-based search engines, namely Google, Yahoo, Bing (Windows Live Search) and Ask, will be used.
Problem Definition
Many free online search engines are available throughout the world. They are commonly ranked by their number of hits, but the number of hits, i.e. human preference, does not indicate their performance level. A comparison among these search engines based on performance should therefore be made.
Research Questions
Which search engine retrieves the most relevant documents within the first several hits? By analyzing and answering this question we measure the proficiency of the search engines; this is therefore our research question.
Goal
The goal of this study is to compare various search engines.
Objectives
The objectives of this paper describe how the goal of the research has been achieved. The following points identify the means of achieving the desired goal.
Literature study and review of query searching via search engines
Defining query categories covering different search criteria
Evaluating the query search results
Drawing observations from the evaluation of the query search results
Method
The method of this research combines qualitative and quantitative approaches. Since our goal is to compare the search engines on the basis of some key search criteria, we use inductive reasoning. By observing the consequences in the search results obtained for the chosen search criteria, we infer the underlying antecedents, and from these antecedents we generalize the observations that form the comparison among the search engines.
Method Discussion
The qualitative part of this research consists of the literature study and review used to articulate proper reasoning about the criteria for comparing the different search engines. The quantitative part evaluates the different query results, and the comparison is also drawn from these results: the quantitative analysis characterizes the performance of the different search engines through the precision and recall of the query results. The quantitative approach proceeds in the following steps.
Firstly, the queries are classified for measuring the performance of the search engines.
Secondly, the documents returned for the set queries are retrieved from the search engines and their relevancy is judged.
Thirdly, the precision and recall of the retrieved documents are calculated with the following formulas:
Precision = (number of relevant documents retrieved) / (total number of documents retrieved)
Formula 1: Precision Finding Formula [7]
Recall = (number of relevant documents retrieved) / (total number of relevant documents)
Formula 2: Recall Finding Formula [7]
Fourthly, precision-recall curves are drawn to evaluate the performance of the different search engines.
Finally, by analyzing these curves for the various search engines under the different classification criteria, the comparisons are depicted in a generalized form.
By finding the existing antecedents from the precision-recall curves for the various search criteria, the comparison is generalized and the inductive reasoning is thereby maintained.
Limitations
The documents retrieved in the first several pages for a specific query may vary over time due to changing hit frequencies and the search engines' internal algorithms. The analyzed results may therefore vary from time to time.
Extended Background
This chapter discusses the theoretical background of the research. The key terms and concepts needed to understand the topic are described here, based on prior studies by different authors and other relevant sources.
Search engines and basic search techniques
A search engine is an engine for information retrieval from a wide collection of sources. According to the Webopedia definition, a search engine is "a program that searches documents for specified keywords and returns a list of the documents where the keywords were found" [8]. The top search engines ranked by U.S. searches for August 2009 are Google, Yahoo, MSN, AOL, Ask, etc.
Figure 1: Top ten search engines for August, 2009 [2]
Google is searched far more frequently than the others, by a considerable margin. Although these differences show the level of human interaction with each search engine, they say nothing conclusive about performance.
Search engines retrieve information from the web using the hypertext transfer protocol (HTTP) and return the most relevant documents they can find in the World Wide Web, according to their capabilities. There are many techniques for getting better search results from the web, but not all of them are appropriate for every search engine [9].
“The ability to support advanced search options of native database interfaces, utilize Boolean, wildcard and proximity operators, and allow field searching are also significant functions”[10].
The variation of the search results can be observed using these functions. In this paper we have considered the following search techniques for building up the search queries.
Wild cards
Boolean search approach
Keyword based search
In addition, we evaluate the search engines using queries in a different language and queries on metadata, respectively:
Query in the Bangla (Bengali) language
Query for image searching (metadata search)
Truncation (Wildcards)
Truncation can be used to find documents containing words that match a truncated pattern. The asterisk (*) is generally used as the truncation symbol and is called a wildcard. For example, if we truncate a word as "matern*", the search will bring up all documents containing words that match this pattern (e.g. maternal, maternity). Wildcards can give more relevant results, enhancing the effectiveness of the search mechanism [11]. We can add such conditions to obtain an extended collection of information when formulating search queries.
“The ability to express typed conditions with wildcards is very useful to the users and it is necessary not to restrict the expressive powers of users’ queries against the object views of Web data sources”[12].
This is why we considered wildcards when forming queries and checking them in the selected search engines.
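To make the idea concrete, the following Python sketch (a toy example of our own, not the implementation used by any of these search engines) expands a truncation pattern such as "matern*" against a small vocabulary:

```python
import fnmatch

# Toy vocabulary; a real search engine would match the pattern against its index terms.
vocabulary = ["maternal", "maternity", "material", "matter", "health", "problem"]

def expand_wildcard(pattern, terms):
    """Return the terms matching a truncation pattern such as 'matern*'."""
    return [term for term in terms if fnmatch.fnmatch(term, pattern)]

print(expand_wildcard("matern*", vocabulary))  # -> ['maternal', 'maternity']
```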
Boolean search approach
At present, the search queries we use for information retrieval are mostly expressed in natural language, but queries can also be formed with the Boolean operators AND, OR and NOT. With the AND operator, only documents containing all of the joined keywords are retrieved. With the OR operator, documents containing any of the joined keywords are retrieved. The NOT operator discards the documents in which the keyword following NOT is present.
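The semantics of the three operators can be illustrated with a simple set-based model. The sketch below is our own toy example over a handful of document ids; real search engines use their own query parsers and inverted indexes:

```python
# Toy inverted index: each keyword maps to the set of document ids containing it.
index = {
    "maternal":   {1, 2, 3, 5},
    "health":     {2, 3, 4},
    "problems":   {1, 3, 6},
    "bangladesh": {3, 5, 6},
}
all_docs = {1, 2, 3, 4, 5, 6}

def AND(a, b):
    return a & b            # documents containing both operands

def OR(a, b):
    return a | b            # documents containing either operand

def NOT(a):
    return all_docs - a     # documents that do not contain the operand

# Maternal AND (Health OR Problems) AND Bangladesh
result = AND(index["maternal"],
             AND(OR(index["health"], index["problems"]), index["bangladesh"]))
print(sorted(result))  # -> [3]
```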
Keyword based search
A keyword is a major or explanatory word in a query. Most search engines perform their searching based on keywords, which makes keywords the most important element of information searching. The more keywords we include in a search, the better the chance of finding the most relevant documents.
Metadata search
Metadata is data that gives information about other data. Images, video, audio and similar items are not self-explanatory: we cannot search them by their content. To make their content findable, we must provide information about it. Such items are basically file objects, and files carry additional attributes such as file type, creation date, owner and title. By providing these attribute values we can search for such data or files through the search engines. Getting the most relevant data requires specifying this information as accurately as possible.
Searching for these file objects is therefore another approach this paper uses to evaluate the search engines.
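Conceptually, a metadata query filters file objects by their descriptive attributes rather than by their content. The following sketch uses hypothetical records and field names purely for illustration:

```python
# Hypothetical image records described only by metadata, never by pixel content.
images = [
    {"title": "KTH Kista Campus", "type": "jpeg", "owner": "alice", "created": "2009-08-01"},
    {"title": "Stockholm City Hall", "type": "png", "owner": "bob", "created": "2009-07-15"},
    {"title": "KTH Main Campus", "type": "jpeg", "owner": "carol", "created": "2009-06-20"},
]

def metadata_search(records, **criteria):
    """Return the records whose metadata fields contain every given criterion (substring match)."""
    return [record for record in records
            if all(value.lower() in record.get(field, "").lower()
                   for field, value in criteria.items())]

print(metadata_search(images, title="kth", type="jpeg"))  # -> the two KTH records
```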
Language search
The web holds a huge collection of information in different languages, so the search performance of the engines can also be evaluated with queries in different languages. As an experiment, in this paper we consider the Bangla (Bengali) language.
Precision vs. Recall
We determine precision and recall from the query results. Precision is the fraction of retrieved documents that are relevant, whereas recall is the fraction of relevant documents that have been retrieved [13]. The query results are plotted as precision-recall curves, which guide us in finding which search engines retrieve the comparatively better collections of documents. The following figure shows the quality of precision-recall curves for retrieved documents.
Figure 2: Precision-Recall Curve’s Quality
The red curve represents bad retrieval of documents, the blue curve average retrieval and the green curve very good retrieval. Theoretically, the dotted curve is the best possible retrieval, as it returns all the available relevant documents and every returned document is relevant; in reality this is not achievable.
“If an algorithm always retrieves all documents in a document base, it has one hundred percent recall. However, it presumably has lower precision … Precision at a certain point of recall indicates how much garbage readers have to wade through until they know they have found at least half of the interesting documents.[14]”
The characteristics of our analyzed curves are judged against these quality-oriented precision-recall curves.
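For readers who want to reproduce the kind of curves sketched in Figure 2, the following Python fragment (with purely illustrative numbers, not our measured data) plots a bad, an average and a very good precision-recall curve using matplotlib:

```python
import matplotlib.pyplot as plt

# Illustrative precision values at increasing recall levels (not measured data).
recall            = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
bad_precision     = [0.40, 0.30, 0.25, 0.20, 0.15, 0.10]
average_precision = [0.70, 0.60, 0.55, 0.50, 0.45, 0.40]
good_precision    = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70]

plt.plot(recall, bad_precision, "r-", label="bad retrieval")
plt.plot(recall, average_precision, "b-", label="average retrieval")
plt.plot(recall, good_precision, "g-", label="very good retrieval")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```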
Data collection
Data collection was conducted by running each specific search query through the different search engines. The first 30 retrieved documents for each query and each search engine were recorded as the preliminary experimental data. The relevancy of each document was then checked manually. Based on the relevant documents and the total retrieved documents, precision and recall were calculated.
Prepare Search Queries
The first and most important task of the data collection is to set the search criteria and their sample queries. As discussed earlier, there are different search techniques for finding documents, and according to these techniques we set our search criteria in six categories, as follows.
C1: Query with formal communicative words (including stop words)
C2: Query with Boolean search
C3: Query based on keyword search
C4: Query in another language (Bengali)
C5: Query with wildcards
C6: Query focused on metadata search
For these six search criteria or categories (C1...C6) we defined six corresponding queries (Q1...Q6), shown in Table 1 below.
C1 | Q1: What are the maternal health problems in the rural area of Bangladesh? | "the", "in", "of" are stop words
C2 | Q2: Maternal AND (Health OR Problems) AND (Rural AND Bangladesh) | AND, OR are Boolean operators
C3 | Q3: Maternal health problem rural Bangladesh | all the words are keywords
C4 | Q4: [Bengali-script query] | Bengali words meaning "risk during maternal period"
C5 | Q5: matern* health problem in Bangladesh | * is a wildcard; matern* should retrieve all words containing "matern" (e.g. maternity, maternal)
C6 | Q6: Location of KTH Kista Campus Sweden | query for the image search module of a search engine (e.g. http://images.google.com/)
Table 1: Search criteria and their sample queries
Calculate precision and recall
The set queries were run to retrieve documents from the target search engines. Precision was calculated at regular intervals of 5 documents: for example, within the top 5 documents we counted the relevant documents, then within the top 10 documents, and so on, up to the first 30 retrieved documents.
So, for 30 documents at regular intervals of 5 documents, six precision points are obtained as follows.
Precision points = Nordtop(5)/5, Nordtop(10)/10, Nordtop(15)/15, Nordtop(20)/20, Nordtop(25)/25, Nordtop(30)/30
[Nordtop(n) = number of relevant documents among the top n retrieved documents]
Not all of the relevant documents can be retrieved by a single search engine, and several search engines may retrieve some relevant documents in common. For calculating recall, the set of relevant documents is taken as the pool of all relevant documents found by the selected search engines. Since for each search engine we consider only the first 30 documents, the pool of relevant documents is drawn from the first 30 documents of each search engine.
So, for each search engine we again obtain six recall points at regular intervals of 5 documents, as follows.
Recall points = Nordtop(5)/Rdoc, Nordtop(10)/Rdoc, Nordtop(15)/Rdoc, Nordtop(20)/Rdoc, Nordtop(25)/Rdoc, Nordtop(30)/Rdoc
[Nordtop(n) = number of relevant documents among the top n retrieved documents; Rdoc = total number of relevant documents]
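These calculations are straightforward to transcribe into code. The sketch below (our own illustrative helper; the relevance judgements in the example call are made up, not measured data) computes the six precision and recall points for one engine and one query:

```python
def precision_recall_points(relevance, total_relevant, step=5, depth=30):
    """relevance: list of booleans for the top `depth` retrieved documents,
    True where the document was judged relevant (manual judgement).
    total_relevant: Rdoc, the number of relevant documents pooled from all engines."""
    points = []
    for n in range(step, depth + 1, step):
        relevant_in_top_n = sum(relevance[:n])       # Nordtop(n)
        precision = relevant_in_top_n / n            # Nordtop(n) / n
        recall = relevant_in_top_n / total_relevant  # Nordtop(n) / Rdoc
        points.append((precision, recall))
    return points

# Hypothetical relevance judgements for 30 retrieved documents (not our measured data).
judgements = [True, True, True, True, True] + [True, False, True, False, True] * 5
print(precision_recall_points(judgements, total_relevant=54))
```

The resulting (precision, recall) pairs are what we plot as the precision-recall curve for each engine and query.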
Below we list the detailed precision (P) and recall (R) values for the search engines Google, Yahoo, Bing and Ask with respect to the set queries (Q1...Q6).
Google
         | Q1 (P, R)     | Q2 (P, R)     | Q3 (P, R)     | Q4 (P, R)     | Q5 (P, R)     | Q6 (P, R)
Top 5    | 5/5, 5/54     | 4/5, 4/67     | 5/5, 5/70     | 4/5, 4/14     | 4/5, 4/47     | 4/5, 4/16
Top 10   | 8/10, 8/54    | 9/10, 9/67    | 9/10, 9/70    | 7/10, 7/14    | 5/10, 5/47    | 7/10, 7/16
Top 15   | 12/15, 12/54  | 14/15, 14/67  | 14/15, 14/70  | 9/15, 9/14    | 9/15, 9/47    | 9/15, 9/16
Top 20   | 17/20, 17/54  | 18/20, 18/67  | 19/20, 19/70  | 9/20, 9/14    | 9/20, 9/47    | 10/20, 10/16
Top 25   | 19/25, 19/54  | 23/25, 23/67  | 22/25, 22/70  | 9/25, 9/14    | 10/25, 10/47  | 13/25, 13/16
Top 30   | 22/30, 22/54  | 27/30, 27/67  | 26/30, 26/70  | 11/30, 11/14  | 12/30, 12/47  | 16/30, 16/16
Table 2: Precision and recall found in Google for the queries (Q1...Q6)
Yahoo
         | Q1 (P, R)     | Q2 (P, R)     | Q3 (P, R)     | Q4 (P, R)     | Q5 (P, R)     | Q6 (P, R)
Top 5    | 3/5, 3/54     | 4/5, 4/67     | 3/5, 3/70     | 1/5, 1/14     | 5/5, 5/47     | -
Top 10   | 6/10, 6/54    | 9/10, 9/67    | 6/10, 6/70    | 4/10, 4/14    | 10/10, 10/47  | -
Top 15   | 8/15, 8/54    | 12/15, 12/67  | 8/15, 8/70    | -             | 12/15, 12/47  | -
Top 20   | 11/20, 11/54  | 17/20, 17/67  | 11/20, 11/70  | -             | 13/20, 13/47  | -
Top 25   | 16/25, 16/54  | 21/25, 21/67  | 15/25, 15/70  | -             | 18/25, 18/47  | -
Top 30   | 18/30, 18/54  | 26/30, 26/67  | 17/30, 17/70  | -             | 20/30, 20/47  | -
Table 3: Precision and recall found in Yahoo for the queries (Q1...Q6) (- indicates no value was recorded)
Bing
         | Q1 (P, R)     | Q2 (P, R)     | Q3 (P, R)     | Q4 (P, R)     | Q5 (P, R)     | Q6 (P, R)
Top 5    | 3/5, 3/54     | 3/5, 3/67     | 3/5, 3/70     | -, 1/14       | 3/5, 3/47     | 0/2, -
Top 10   | 4/10, 4/54    | 8/10, 8/67    | 8/10, 8/70    | -             | 4/10, 4/47    | -
Top 15   | 5/15, 5/54    | 12/15, 12/67  | 11/15, 11/70  | -             | 4/15, 4/47    | -
Top 20   | 10/20, 10/54  | 17/20, 17/67  | 16/20, 16/70  | -             | 4/20, 4/47    | -
Top 25   | 14/25, 14/54  | 22/25, 22/67  | 19/25, 19/70  | -             | 6/25, 6/47    | -
Top 30   | 16/30, 16/54  | 25/30, 25/67  | 22/30, 22/70  | -             | 8/30, 8/47    | -
Table 4: Precision and recall found in Bing for the queries (Q1...Q6) (- indicates no value was recorded)
Ask
         | Q1 (P, R)     | Q2 (P, R)     | Q3 (P, R)     | Q4 (P, R)     | Q5 (P, R)     | Q6 (P, R)
Top 5    | 1/5, 1/54     | 4/5, 4/67     | 5/5, 5/70     | 3/5, 3/14     | 2/5, 2/47     | -
Top 10   | 5/10, 5/54    | 6/10, 6/67    | 8/10, 8/70    | 5/10, 5/14    | 4/10, 4/47    | -
Top 15   | 8/15, 8/54    | 9/15, 9/67    | 13/15, 13/70  | 7/15, 7/14    | 8/15, 8/47    | -
Top 20   | 9/20, 9/54    | 9/20, 9/67    | 17/20, 17/70  | 7/20, 7/14    | 10/20, 10/47  | -
Top 25   | 11/25, 11/54  | 9/25, 9/67    | 20/25, 20/70  | 7/25, 7/14    | 10/25, 10/47  | -
Top 30   | 14/30, 14/54  | 10/30, 10/67  | 24/30, 24/70  | 8/30, 8/14    | 10/30, 10/47  | -
Table 5: Precision and recall found in Ask for the queries (Q1...Q6) (- indicates no value was recorded)
Analysis
Figure 3: Precision-Recall Curves for query q1-q5 (experimented on Google)
For query one (q1) we have a fair amount of retrieval: out of the first 30 documents, 22 were relevant, so precision is high. But out of a total of 54 relevant documents we retrieved only 22, which means a lower recall.
Searching with a Boolean expression (q2) gave a very good retrieval. Out of the first 30 documents, 27 were relevant, which ensures high precision. However, the total number of relevant documents found is 67, which implies a lower recall.
Keyword-based search (q3) also resulted in good precision with lower recall.
Query four (q4) concerned searching in the Bengali language. For this special search Google returned a very poor precision with high recall: out of a total of 14 relevant documents it returned 11, but those 11 relevant documents came from 30 retrieved documents.
Searching with wildcards (q5) retrieved a wide collection of documents. Both precision and recall are low in this case: only 12 of the 47 relevant documents were returned.
Figure 4: Precision-Recall Curves for query q1-q3, q5 (experimented on Yahoo)
Here we have plotted precision-recall curves for queries q1, q2, q3 and q5. The query in Bengali returned only a limited number of documents: just 9 documents, of which only three were relevant.
According to the graph, Yahoo returned a higher recall for q1, with comparatively lower precision.
For the Boolean search (q2) Yahoo returned higher precision and medium recall, retrieving 26 of the 67 relevant documents.
Yahoo has very poor recall for the keyword-based search (q3), retrieving only 17 of the 70 relevant documents.
For wildcard searching (q5) Yahoo has an average precision-recall curve, retrieving 20 of the 47 relevant documents.
Figure 5: Precision-Recall Curves for query q1-q3, q5 (experimented on Bing)
Again, for the search engine Bing we obtained four curves. The query in Bengali did not return enough documents to plot a curve.
Bing produced a lower recall for q1, retrieving only 16 of the 54 relevant documents. However, q2 has higher recall and precision. For q3 we have medium precision but lower recall: only 22 of the 70 relevant documents were retrieved.
The query with wildcards (q5) on Bing returned a very poor precision-recall curve.
Figure 6: Precision-Recall Curves for query q1-q5 (experimented on ASK)
For the search engine ASK we obtained five curves. The query in Bengali (q4) returned only 30 documents, of which only 8 were relevant, so precision is low. Since there are 14 relevant documents in total for q4, this gives a very high recall.
According to the figure, the search result for q1 has very low precision, and its recall is also low, because only 14 of the 54 relevant documents were retrieved.
The Boolean search (q2) on ASK resulted in a poor precision-recall curve: only 10 of the 67 relevant documents were retrieved.
For keyword-based search (q3) the resulting precision-recall curve is fair, with higher precision and medium recall.
Searching with wildcards (q5) on ASK returned very low precision as well as low recall, since only 10 of the 47 relevant documents in total were retrieved.
Discussion
Generally, for a given query a search engine retrieves many documents related to the query terms, and most of the time we check only the first couple of result pages for our desired answers. Many relevant documents may lie deeper in the result pages. Since it is not feasible for a human to go through all the pages, we picked the first 30 documents from each search engine, accumulating our pool of relevant documents from a total of 120 documents. Because many relevant documents may still be hidden deeper in the results, the calculated recall does not give exact points to plot; rather, it gives a general idea of the shape of the resulting curves.
The search results for the chosen search criteria are used to evaluate the selected search engines. For this evaluation we use a scoring system that defines the quality of the precision of the curves. In this system, the "scoring range" defines the range of relevant documents and the "scoring quality" defines the corresponding quality of precision.
Scoring Range (number of relevant documents) | Scoring Quality (for precision)
Very few retrieved documents                 | Undefined
Fewer than 10                                | Very Poor
10-14                                        | Poor
15-19                                        | Fair
20-24                                        | Good
25-30                                        | Very Good
Table 6: Scoring system for determining precision quality
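For convenience, the rule in Table 6 can be written as a small function. This is a direct transcription of the table; we additionally assume that "very few retrieved documents" means the engine returned fewer than 30 results:

```python
def precision_quality(relevant_in_top30, retrieved=30):
    """Map the number of relevant documents among the first 30 results to a quality
    label, following Table 6. The `retrieved < 30` threshold for 'Undefined' is our
    own assumption about what counts as 'very few retrieved documents'."""
    if retrieved < 30:
        return "Undefined"       # very few retrieved documents
    if relevant_in_top30 < 10:
        return "Very Poor"
    if relevant_in_top30 <= 14:
        return "Poor"
    if relevant_in_top30 <= 19:
        return "Fair"
    if relevant_in_top30 <= 24:
        return "Good"
    return "Very Good"

print(precision_quality(27))  # -> 'Very Good' (e.g. Google on Q2, 27 relevant in the top 30)
```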
According to this scoring system, the quality of precision for the search results of each search engine is given in the following table.
Search Criteria (Query)                  | Google    | Yahoo     | Bing      | Ask
Natural Language Search (Q1)             | Good      | Fair      | Fair      | Poor
Boolean Search (Q2)                      | Very Good | Very Good | Very Good | Poor
Keyword Based Search (Q3)                | Very Good | Fair      | Good      | Good
Search with Bengali Language (Q4)        | Poor      | Undefined | Undefined | Very Poor
Search with Truncation (Wildcards) (Q5)  | Poor      | Good      | Very Poor | Poor
Metadata Search (Q6)                     | Fair      | Undefined | Undefined | Undefined
Table 7: Performance determination for the various search engines
The precision-recall curves show that for the first search criterion Google has the best quality curve.
For Boolean search Google also produces the best precision-recall curve; Yahoo and Bing produce very good quality curves in this case as well.
Google again leads for keyword-based searching, producing a very good precision-recall curve compared with the next best performers, Ask and Bing.
For the query in Bengali we could not plot precision and recall points for Yahoo and Bing, as they did not find enough documents: Yahoo retrieved 9 documents in total, of which only 2 were relevant, and Bing retrieved only 2 documents, one of which was relevant. For this special case Ask had very low precision. Google's precision was poor, but it retrieved a large number of documents.
When searching with the truncation approach (using wildcards), Yahoo obtained better results than Google and the other search engines. Although the quality of the precision-recall curves is not good for any of the engines in this case, Yahoo did slightly better than the others.
The last search criterion concerned metadata search. The probability of finding an object (image, video) in a search increases with the amount of information provided about it. For this search Google produced a fair precision-recall curve; none of the other search engines found any relevant items.
The overall observation is that Boolean search works best for information searching: all of the selected search engines except Ask produced good results for Boolean queries. Keyword-based search also works well for information retrieval, and it applies to all of the search engines, since search engines generally match documents on the basis of keywords.
During the searching we also observed how far the targeted search engines support a number of general features; the observations are summarized below.
Features checked: auto suggestion; spelling suggestion (query tried: "Animol"); stemming (query: "computation"); truncation (query: "Compute*")
Yahoo: supported all but one of the checked features
Bing: supported all but two of the checked features
ASK: supported all but two of the checked features
Table 8: General features supported by the search engines
Conclusion
Our analysis determined the precision and recall of the query results, based on the relevancy of the retrieved documents. By judging the quality of the curves we obtained the performance of the search engines for the various search criteria. The results show that Google performs better for all the search criteria except truncation, for which Yahoo produces comparatively better results. In addition, all the search engines perform comparatively well for Boolean and keyword-based searching.
Every search engine has its own mechanism for indexing web pages so that they can be searched efficiently. If these mechanisms change, the search results can change as well. A page with more hits may also be given higher preference and appear among the first few results, so the relevancy observed for a specific query and search engine can vary over time; likewise, a relevant page with few hits can sink lower in the results.