dna88 Web development and Technology Forum
 
google programming searchengine

 
onauc
Beginner User


Joined: 18 Nov 2004
Posts: 15

Post subject: google programming searchengine

Howdy,

I want to know how search engines such as Google and WebCrawler really work, as I am learning PHP programming and want to write a search engine.
I have read around 10 websites, found on Google, about “how search engines work”, and not a single one of them makes it clear whether it is the spider, the index or the search software that does the ranking according to the engine’s ranking algorithm.
All they ever say is that a search engine has 3 pieces of software:
a) the spider
b) the index
c) the search system (search box, template, etc.)
The spiders crawl the web collecting webpages and forward them to the index, and then the search software searches the index for the sought keywords/phrases.
Also, some say that the spider copies the whole website into the index. In other words, there are 2 copies of a website: one residing on the website owner’s web server and the other residing in the search engine’s index.
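
To make the three parts concrete, here is a minimal sketch in Python (chosen only for brevity). The toy pages, the example URLs and the function names are all made up for illustration, and the "crawl" just reads from a dictionary instead of the real web. It deliberately does no ranking at all; it only shows how a spider, an index and a search step could hand data to each other.

```python
import re

# Toy "web": URL -> HTML. These pages and URLs are invented for illustration.
TOY_WEB = {
    "http://a.example/": "<html><body>php search engine tutorial</body></html>",
    "http://b.example/": "<html><body>cooking recipes and php tips</body></html>",
}

def spider(web):
    """Part a) 'Crawl' every page and keep a second copy of its HTML."""
    return {url: html for url, html in web.items()}

def build_index(copies):
    """Part b) Map each word to the set of URLs whose copy contains it."""
    index = {}
    for url, html in copies.items():
        text = re.sub(r"<[^>]+>", " ", html)  # strip the HTML tags
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
    return index

def search(index, keyword):
    """Part c) Look the keyword up; results are alphabetical, not ranked."""
    return sorted(index.get(keyword.lower(), set()))

copies = spider(TOY_WEB)
index = build_index(copies)
print(search(index, "php"))
```

In this sketch the index is a word-to-URL lookup table built from the stored copies, which is one common way to avoid rereading every copy for every search; it is an assumption of the example, not something the websites I read spelled out.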
So from all this, I can only assume 3 possibilities for how a search engine works:

1.
The spider does not do any ranking.
All it does is visit a website, grab all its HTML code (copy the website) and then dump the HTML into the index.
The index is nothing but a big text file (.txt, .html) on the search engine’s web server that keeps a full copy (HTML code) of each website.
The search system, when searching and finding links in the index, ranks them according to the search engine’s ranking algorithm.
This means neither the spider nor the index is responsible for the ranking, because these 2 parts of the search engine are not taught the ranking algorithm.
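
Possibility 1 could be sketched roughly like this in Python. The stored copies, the URLs and the scoring rule (count how often the keyword appears) are assumptions made up for the example, not any real search engine's algorithm. The point is only that the scoring happens inside the search step, after the query arrives.

```python
import re

# Hypothetical stored copies (URL -> page text). In possibility 1 the
# spider saved these without scoring them.
COPIES = {
    "http://a.example/": "php php php tutorial",
    "http://b.example/": "php recipes",
    "http://c.example/": "gardening tips",
}

def search_and_rank(copies, keyword):
    """Score every stored copy when the query arrives, then sort."""
    keyword = keyword.lower()
    scored = []
    for url, text in copies.items():
        count = re.findall(r"[a-z0-9]+", text.lower()).count(keyword)
        if count > 0:
            scored.append((count, url))
    # Highest count first; ties are broken by URL so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]

print(search_and_rank(COPIES, "php"))
```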

OR

2.
The spider does the ranking according to the search engine’s ranking algorithm.
It visits a website, grabs all its HTML code (copies the website) and finally dumps the HTML into the index. When it dumps the copies of websites, it ranks them according to the search engine’s algorithm.
The index is nothing but a big text file (.txt, .html) on the search engine’s web server that keeps a full copy (HTML code) of each website.
The search system, when searching and finding links in the index, does not do any ranking, because that has already been done by the spider when dumping the data into the index.
This means the spider is responsible for the ranking, and neither the index nor the search system is, because these 2 parts of the search engine are not taught the ranking algorithm.
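
Possibility 2 could look roughly like the Python sketch below. Everything in it is hypothetical: the toy pages, the URLs, and especially the scoring rule, which here is simply "how many other toy pages link to this one". The spider attaches that score while storing the pages, and the search step only filters.

```python
# Hypothetical crawl data: URL -> (page text, outgoing links).
TOY_WEB = {
    "http://a.example/": ("php tutorial", ["http://b.example/"]),
    "http://b.example/": ("php recipes", []),
    "http://c.example/": ("php news", ["http://b.example/", "http://a.example/"]),
}

def spider_with_ranking(web):
    """Crawl every page and attach a score (inbound link count) while storing it."""
    inbound = {url: 0 for url in web}
    for _, links in web.values():
        for target in links:
            if target in inbound:
                inbound[target] += 1
    # The "index" is a list the spider has already sorted best-first.
    entries = [(inbound[url], url, text) for url, (text, _) in web.items()]
    entries.sort(key=lambda e: (-e[0], e[1]))
    return entries

def search(index, keyword):
    """No scoring here: keep the spider's order and only filter by keyword."""
    return [url for _, url, text in index if keyword.lower() in text.lower().split()]

index = spider_with_ranking(TOY_WEB)
print(search(index, "php"))
```

One thing the sketch makes visible: a score given by the spider cannot depend on the search keyword, because the spider runs before anyone has typed a query.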

OR

3.
The spider does not do any ranking.
All it does is visit a website, grab all its HTML code (copy the website) and then dump the HTML into the index.
The index is not only a big text file (.txt, .html) on the search engine’s web server that keeps a full copy (HTML code) of each website, but also the system that does the ranking.
When it receives data from the spider, it ranks the links in its database according to the search engine’s ranking algorithm.
The search system, when searching and finding links in the index, does not do any ranking.
Frankly, all it does is output a copy of certain parts of the index onto the searcher’s screen.
This means neither the spider nor the search system is responsible for the ranking, because these 2 parts of the search engine are not taught the ranking algorithm.
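
Possibility 3 could be sketched like this in Python. The class name, the URLs and the scoring rule (keyword count per page) are all invented for the example. Here the index itself sorts its entries each time the spider hands over a page, so the search step only copies out a list that is already in order.

```python
import re

class RankingIndex:
    """Hypothetical index that scores pages as the spider hands them over."""

    def __init__(self):
        self.postings = {}  # word -> list of (count, url), best first

    def add_page(self, url, text):
        words = re.findall(r"[a-z0-9]+", text.lower())
        for word in set(words):
            entries = self.postings.setdefault(word, [])
            entries.append((words.count(word), url))
            # Re-sort on every insert so the stored list is always ranked.
            entries.sort(key=lambda e: (-e[0], e[1]))

    def lookup(self, keyword):
        """The search system only copies out the stored, pre-sorted list."""
        return [url for _, url in self.postings.get(keyword.lower(), [])]

index = RankingIndex()
index.add_page("http://b.example/", "php recipes")
index.add_page("http://a.example/", "php php php tutorial")
print(index.lookup("php"))
```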


So, which of the 3 assumptions above is correct?
_________________
I would like to come up with my own "Compression Algorithm" and teach it to the browsers, so you can show streaming videos and lengthy animations from your website without losing an arm and a leg on your bandwidth.
:)
Mon Dec 13, 04 5:45 pm
hasnut
Expert User


Joined: 28 Aug 2004
Posts: 201


I didn't read your whole post, but the links below will help you create one.
Some are open source.
http://www.searchtools.com/tools/tools-opensource.html
http://www.nutch.org/docs/en/developers.html
_________________
Sarder Hasnut
MCSD, CIW A

Need low-cost professional hosting? Contact me
Thu Dec 16, 04 9:44 pm
onauc
Beginner User


Joined: 18 Nov 2004
Posts: 15


hasnut wrote:
I didn't read your whole post, but the links below will help you create one.
Some are open source.
http://www.searchtools.com/tools/tools-opensource.html
http://www.nutch.org/docs/en/developers.html


Hi,

It would be better to read my whole post.
Also, the page you linked says that it lists search engines that are run from the command line rather than via a browser.
Frankly, I don't like DOS-looking stuff.
Fri Dec 17, 04 6:44 am
All times are GMT - 7 Hours

 
