
Digital book
Title:
Mining of Massive Datasets
Authors:
Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman
Category:
Technology > Data
Donor:
Raffaello D. N.
Synopsis:
A reader confronting data that no longer fits in memory needs more than theory, and the table of contents makes that clear from the start: distributed file systems and MapReduce come first, before similarity search, stream processing, and search-engine ranking. The book opens by framing mining as an algorithmic problem at massive scale, then moves immediately into the software stack that makes such scale practical.
From there, the chapters build a full toolkit: minhashing and locality-sensitive hashing, data-stream algorithms, PageRank and link-spam detection, frequent-itemset mining, clustering, advertising and recommendation, large-graph analysis, dimensionality reduction, and machine-learning methods that scale to huge datasets. The front matter also points to a collaborative academic origin, with material drawn from Stanford courses and shaped by three authors working across web mining, network analysis, and large-scale data-mining projects.
What emerges is a textbook for readers who already have some background in databases, algorithms, and software systems and want the methods that matter when data is too large for ordinary approaches. It offers not just concepts but also the algorithmic patterns behind modern large-scale analysis, making it useful both as a course text and as a reference for practical data-mining work.