Indexing the Web 

    Crawl the WWW collecting new pages 
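A crawler collects pages by following links outward from a set of starting pages. The sketch below is a breadth-first crawl over a small in-memory link graph; the `WEB` dictionary, the page names, and the `crawl` function are hypothetical, standing in for real HTTP fetching and HTML link extraction.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch each URL over HTTP and parse links from the HTML.
WEB = {
    "a.html": ("web search engines", ["b.html", "c.html"]),
    "b.html": ("inverted index of words", ["c.html"]),
    "c.html": ("crawling the web", ["a.html", "d.html"]),
    "d.html": ("binary search on words", []),
}


def crawl(seeds, web, max_pages=100):
    """Breadth-first crawl from the seed URLs; return {url: text} of visited pages."""
    collected = {}
    frontier = deque(seeds)  # URLs waiting to be fetched, oldest first
    seen = set(seeds)        # URLs already queued, so none is fetched twice
    while frontier and len(collected) < max_pages:
        url = frontier.popleft()
        if url not in web:   # dangling link: nothing to fetch
            continue
        text, links = web[url]
        collected[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return collected


pages = crawl(["a.html"], WEB)
print(sorted(pages))  # ['a.html', 'b.html', 'c.html', 'd.html']
```

The `seen` set keeps cycles such as a.html and c.html linking to each other from causing repeated fetches, and `max_pages` bounds the crawl.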

    Update the inverted index 

    • a sorted list of words, each word pointing to the pages that contain it
    • current compression techniques can reduce this index to less than 5% of the size of the text
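The two bullets above can be sketched as follows: build a sorted vocabulary whose entries point to sorted lists of page IDs, then compress each list. The compression shown, storing gaps between page IDs in variable-byte form, is one common technique chosen here as an assumption; it is not necessarily what yields the 5% figure. The function names `build_index`, `vbyte_encode` and `vbyte_decode` are hypothetical.

```python
def build_index(pages):
    """Build an inverted index over a list of page texts.

    A page's ID is its position in `pages`. Returns (vocabulary, postings):
    the sorted list of distinct words and, at the same positions, the sorted
    list of page IDs containing each word.
    """
    index = {}
    for page_id, text in enumerate(pages):
        for word in set(text.lower().split()):
            # Page IDs arrive in increasing order, so each list stays sorted.
            index.setdefault(word, []).append(page_id)
    vocabulary = sorted(index)
    postings = [index[word] for word in vocabulary]
    return vocabulary, postings


def vbyte_encode(page_ids):
    """Compress a sorted posting list: store gaps, 7 bits per byte.

    The high bit (0x80) marks the last byte of each gap.
    """
    out = bytearray()
    prev = 0
    for page_id in page_ids:
        gap = page_id - prev
        prev = page_id
        while gap >= 128:
            out.append(gap & 0x7F)  # low-order 7 bits first
            gap >>= 7
        out.append(gap | 0x80)      # final byte of this gap
    return bytes(out)


def vbyte_decode(data):
    """Invert vbyte_encode: rebuild the gaps, then sum them into page IDs."""
    page_ids = []
    value = shift = prev = 0
    for byte in data:
        if byte & 0x80:
            value |= (byte & 0x7F) << shift
            prev += value
            page_ids.append(prev)
            value = shift = 0
        else:
            value |= byte << shift
            shift += 7
    return page_ids


pages = [
    "web search engines",
    "inverted index of words",
    "crawling the web",
    "binary search on words",
]
vocabulary, postings = build_index(pages)
print(vocabulary[:3])                             # ['binary', 'crawling', 'engines']
print(postings[vocabulary.index("web")])          # [0, 2]
print(vbyte_decode(vbyte_encode([3, 7, 300])))    # [3, 7, 300]
```

Storing gaps instead of raw page IDs helps because gaps in long posting lists are small numbers, and small numbers fit in a single byte.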

    Search: binary search over the sorted word list, plus query evaluation
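Because the word list is sorted, each query word can be found by binary search, and its posting list is then combined with the others. The sketch below assumes the query is a conjunction (every word must appear), evaluated by merging sorted posting lists; the sample `VOCABULARY`/`POSTINGS` data and the function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical sorted vocabulary with parallel posting lists (sorted page IDs).
VOCABULARY = ["binary", "crawling", "index", "search", "web", "words"]
POSTINGS = [[3], [2], [1], [0, 3], [0, 2], [1, 3]]


def lookup(vocabulary, postings, word):
    """Binary-search the sorted vocabulary; return the word's posting list or []."""
    i = bisect_left(vocabulary, word)
    if i < len(vocabulary) and vocabulary[i] == word:
        return postings[i]
    return []


def intersect(a, b):
    """Merge two sorted posting lists, keeping page IDs present in both."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def search_and(vocabulary, postings, query):
    """Return the sorted IDs of pages containing every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    result = lookup(vocabulary, postings, words[0])
    for word in words[1:]:
        result = intersect(result, lookup(vocabulary, postings, word))
    return result


print(search_and(VOCABULARY, POSTINGS, "search words"))  # [3]
```

Merging sorted lists costs time linear in their lengths, which is one reason posting lists are kept sorted.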

    Relevance ranking usually depends on the occurrences of the query words in each page
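One common way to turn word occurrences into a ranking is tf-idf: a page scores higher when a query word occurs in it often, weighted by how rare that word is across all pages. The slide does not name a formula, so tf-idf (with a natural-log idf) is an assumption here, and `rank` is a hypothetical helper.

```python
import math
from collections import Counter


def rank(pages, query):
    """Rank page IDs by tf-idf score for the query words, best first.

    Score of a page = sum over query words of
    (occurrences in the page) * log(number of pages / pages containing the word).
    Pages containing none of the query words are left out.
    """
    n = len(pages)
    counts = [Counter(text.lower().split()) for text in pages]
    scores = {}
    for word in set(query.lower().split()):
        df = sum(1 for page_counts in counts if word in page_counts)
        if df == 0:
            continue  # word appears nowhere, contributes nothing
        idf = math.log(n / df)
        for page_id, page_counts in enumerate(counts):
            if word in page_counts:
                scores[page_id] = scores.get(page_id, 0.0) + page_counts[word] * idf
    # Highest score first; ties broken by lower page ID.
    return sorted(scores, key=lambda page_id: (-scores[page_id], page_id))


pages = [
    "web search engines index the web",
    "the inverted index",
    "cooking pasta at home",
]
print(rank(pages, "web index"))  # [0, 1]
```

Page 0 ranks first because "web" occurs twice there and appears in no other page, so it carries a large idf weight.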

    Iterative refinement: complex queries 
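Reading "iterative refinement" as a user narrowing or widening a result set one step at a time (an interpretation, not stated on the slide), a complex query can be built up from boolean steps on sets of page IDs. The `INDEX` data and the `refine` function below are hypothetical.

```python
# Hypothetical index: word -> set of IDs of pages containing it.
INDEX = {
    "web": {0, 2, 4},
    "search": {0, 3, 4},
    "index": {1, 4},
    "spam": {4},
}


def refine(results, op, word, index):
    """Apply one refinement step (AND / OR / NOT a word) to the current results."""
    pages = index.get(word, set())
    if op == "AND":
        return results & pages   # keep only pages that also contain the word
    if op == "OR":
        return results | pages   # add pages containing the word
    if op == "NOT":
        return results - pages   # drop pages containing the word
    raise ValueError(f"unknown operator: {op}")


results = INDEX["web"]                             # {0, 2, 4}
results = refine(results, "AND", "search", INDEX)  # {0, 4}
results = refine(results, "NOT", "spam", INDEX)    # {0}
print(results)  # {0}
```

Each step returns a new set, so earlier result sets stay available if the user wants to undo a refinement.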
