
E-book
Title:
Search Engines: Information Retrieval in Practice
Authors:
W. Bruce Croft, Donald Metzler, Trevor Strohman
Category:
Technology > Data
Donor:
Raffaello D. N.
Synopsis:
Search is everywhere, but the internals of search engines remain opaque to most of the engineers who rely on them. This textbook from UMass Amherst's Center for Intelligent Information Retrieval addresses that gap: it walks from raw document acquisition to final ranking in a single coherent arc. The last chapter, Beyond Bag of Words, pushes into term dependence models, XML retrieval, entity search, and multimodal content, topics that introductory IR courses often skip.
The chapters follow the architecture of a real engine: crawling and document feeds; text processing with statistical laws (Zipf, Heaps); inverted index construction, including MapReduce-scale indexing; query processing and relevance feedback; retrieval models from Boolean through vector space to language models and inference networks; evaluation with TREC test collections and clickthrough data; and classification with Naive Bayes and Support Vector Machines, followed by clustering. The Galago search engine accompanies the book, so readers can run experiments against real document collections rather than toy examples.
Designed for undergraduate and graduate students in computer science and information science, and freely released by the authors in 2015, this textbook remains one of the clearest paths from first principles to practical search engineering knowledge.