Modelling the Web

     

    Possible models:

      Document sizes: heavy-tailed distribution
        The vast majority of documents are small, but large documents are frequent enough that they cannot be ignored
      Vocabulary: sublinear growth (Heaps' law) 

      Word distribution: generalized Zipf's law
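
      A minimal sketch of the three models above, under stated assumptions. The
      slide names only the laws, so the functional forms are the usual textbook
      ones: a Pareto tail as one example of a heavy-tailed size distribution,
      V(n) = K * n^beta for Heaps' law, and f(r) proportional to 1 / r^theta for
      the generalized Zipf's law. The function names and all default parameter
      values (x_min, alpha, k, beta, theta) are illustrative placeholders, not
      values fitted to any Web collection.

```python
def pareto_tail(size, x_min=1.0, alpha=1.5):
    """Heavy-tail example (assumed Pareto): P(X > size) = (x_min / size)**alpha."""
    if size < x_min:
        return 1.0
    return (x_min / size) ** alpha


def heaps_vocabulary(n_words, k=30.0, beta=0.5):
    """Heaps' law: vocabulary size V(n) = k * n**beta, with 0 < beta < 1."""
    return k * n_words ** beta


def zipf_frequencies(vocab_size, theta=1.0):
    """Generalized Zipf's law: relative frequency of rank r is proportional to 1 / r**theta."""
    weights = [1.0 / rank ** theta for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Sublinear growth: with beta = 0.5, four times the text only doubles the vocabulary.
    print(heaps_vocabulary(4_000_000) / heaps_vocabulary(1_000_000))
    # With theta = 1, the most frequent word is about twice as frequent as the second.
    freqs = zipf_frequencies(1000)
    print(freqs[0] / freqs[1])
    # Tail probability shrinks polynomially, not exponentially, as size grows.
    print(pareto_tail(10.0), pareto_tail(100.0))
```

      With beta below 1 the vocabulary keeps growing but ever more slowly, and
      with a polynomial tail the large documents stay rare without becoming
      negligible, which is the point of the first item in the list.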

       
  
 