Technical Issues
Text is normalized: letter case, spacing, stop-words, etc.
Text can be compressed to about 30% of its original size while still allowing random access and direct search on the compressed text
Approximate search can be performed by scanning the vocabulary sequentially, since it is much smaller than the text
Structure should also be indexed
Using logical blocks reduces the index size
Pointers are smaller, and the word distribution can be exploited
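The normalization item can be illustrated with a minimal Python sketch. It assumes Latin-script text, folds letter case and accents, treats any run of non-alphanumeric characters as a single separator, and removes words found in a small stop-word list. The names `STOP_WORDS` and `normalize` are hypothetical, chosen only for illustration.

```python
import re
import unicodedata

# Hypothetical, illustrative stop-word list; real systems use larger,
# language-specific lists.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def normalize(text: str) -> list[str]:
    """Return the index terms of `text` after a simple normalization pass.
    Assumes Latin-script text: other characters are treated as separators."""
    # Fold letters: lowercase, then decompose and drop accent marks
    # (e.g. "Índice" -> "indice").
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Any run of spaces or punctuation acts as one word separator.
    words = re.findall(r"[a-z0-9]+", text)
    # Drop stop-words.
    return [w for w in words if w not in STOP_WORDS]

print(normalize("The  Índice of the TEXT, compressed!"))
# -> ['indice', 'text', 'compressed']
```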
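Compression that still permits searching can be sketched with End-Tagged Dense Code (ETDC), a word-based, byte-oriented code swapped in here as a simpler relative of tagged Huffman coding; the sketch does not attempt to reproduce the 30% figure. Words are ranked by frequency, and only the last byte of each codeword has its high bit set. A query word can therefore be encoded once and matched byte-by-byte against the compressed text, accepting a match only at a codeword boundary, with no decompression. The functions `etdc_codeword`, `compress`, and `find_word` are hypothetical names.

```python
from collections import Counter

def etdc_codeword(rank: int) -> bytes:
    """End-Tagged Dense Code for a frequency rank (0 = most frequent word).
    Each byte carries 7 payload bits; only the last byte of a codeword has
    its high bit set, so codeword boundaries are visible anywhere in the text."""
    out = [0x80 | (rank % 128)]
    rank //= 128
    while rank > 0:
        rank -= 1
        out.append(rank % 128)
        rank //= 128
    return bytes(reversed(out))

def compress(words: list[str]) -> tuple[bytes, dict[str, bytes]]:
    """Assign the shortest codewords to the most frequent words."""
    ranked = [w for w, _ in Counter(words).most_common()]
    codebook = {w: etdc_codeword(i) for i, w in enumerate(ranked)}
    return b"".join(codebook[w] for w in words), codebook

def find_word(data: bytes, codeword: bytes) -> list[int]:
    """Byte offsets where `codeword` occurs as a whole codeword in `data`.
    A candidate is valid only at the start of the text or right after a
    tagged byte, i.e. at a codeword boundary."""
    hits, start = [], data.find(codeword)
    while start != -1:
        if start == 0 or data[start - 1] & 0x80:
            hits.append(start)
        start = data.find(codeword, start + 1)
    return hits
```

For example, `compress("to be or not to be".split())` yields one byte per word, and searching for the codeword of `"be"` returns offsets `[1, 5]`. With more than 128 distinct words, two-byte codewords appear, and the boundary check is what keeps a one-byte codeword from matching the tail of a longer one.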
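The approximate-search item can be sketched as follows: every vocabulary word is compared against the pattern, and the posting lists of all words within `k` errors are merged. As an assumption, a plain dynamic-programming edit distance stands in for the faster matchers a real engine would likely use, and `index` is a hypothetical in-memory mapping from each word to its occurrence list.

```python
def within_distance(pattern: str, word: str, k: int) -> bool:
    """True if the Levenshtein distance between pattern and word is <= k."""
    if abs(len(pattern) - len(word)) > k:
        return False  # the length difference alone already exceeds k
    prev = list(range(len(word) + 1))
    for i, pc in enumerate(pattern, 1):
        cur = [i]
        for j, wc in enumerate(word, 1):
            cur.append(min(prev[j] + 1,                # delete pc
                           cur[j - 1] + 1,             # insert wc
                           prev[j - 1] + (pc != wc)))  # match or substitute
        prev = cur
    return prev[-1] <= k

def approximate_lookup(pattern, vocabulary, index, k=1):
    """Scan the (small) vocabulary sequentially, then merge the posting
    lists of every word that matches with at most k errors."""
    matches = [w for w in vocabulary if within_distance(pattern, w, k)]
    return matches, sorted({p for w in matches for p in index[w]})

index = {"color": [3, 17], "colour": [8], "cooler": [21], "dolor": [40]}
print(approximate_lookup("color", index.keys(), index, k=1))
# -> (['color', 'colour', 'dolor'], [3, 8, 17, 40])
```

Only the vocabulary is scanned; the text itself is never read, which is why this stays cheap even for large collections.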
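The last two items can be combined in one sketch, under stated assumptions. Logical blocks are taken to be fixed runs of `block_size` words, and each posting records a block number instead of an exact word position, so repeated occurrences within one block collapse. The posting lists are then stored as gaps between consecutive block numbers using variable-byte codes (an assumed choice of integer code). Frequent words occur in many blocks, so their gaps are small and usually fit in one byte. The function names are hypothetical.

```python
from collections import defaultdict

def build_block_index(words: list[str], block_size: int) -> dict[str, list[int]]:
    """Block addressing: each posting is the number of the logical block
    (here a fixed run of `block_size` words) containing the word, not its
    exact position. Repeats inside one block collapse into one posting."""
    index = defaultdict(list)
    for pos, w in enumerate(words):
        block = pos // block_size
        if not index[w] or index[w][-1] != block:
            index[w].append(block)
    return dict(index)

def vbyte_encode_gaps(postings: list[int]) -> bytes:
    """Encode gaps between consecutive block numbers with variable-byte
    codes: 7 payload bits per byte, high bit set on the last byte of each gap."""
    out, prev = bytearray(), 0
    for p in postings:
        gap, prev = p - prev, p
        chunk = [0x80 | (gap & 0x7F)]
        gap >>= 7
        while gap:
            chunk.append(gap & 0x7F)
            gap >>= 7
        out.extend(reversed(chunk))  # most significant 7-bit group first
    return bytes(out)

def vbyte_decode_gaps(data: bytes) -> list[int]:
    """Invert vbyte_encode_gaps, turning gaps back into block numbers."""
    postings, n, prev = [], 0, 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:  # a tagged byte ends the current gap
            prev += n
            postings.append(prev)
            n = 0
    return postings

idx = build_block_index("a b a c a b".split(), block_size=2)
print(idx)                          # -> {'a': [0, 1, 2], 'b': [0, 2], 'c': [1]}
print(vbyte_encode_gaps(idx["a"]))  # -> b'\x80\x81\x81'
```

A larger block size shortens the lists further, at the price of scanning the candidate blocks to locate exact occurrences.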