
Modeling the Internet and the Web : probabilistic methods and algorithms / by Pierre Baldi

By: Baldi, Pierre

Contributor(s): Frasconi, Paolo | Smyth, Padhraic

Publisher: Hoboken : Wiley, 2003
Description: 285 p.; 24 cm
001: 11398
ISBN: 0470849061
Subject(s): Internet | Telecommunications | Mathematics | Cyberspace
DDC classification: 004.678015 BAL

Holdings
Item type	Current library	Collection	Call number	Copy number	Status	Date due	Barcode
Book	MAIN LIBRARY Book	PRINT	004.678015 BAL	1	Available		082308

Enhanced descriptions from Syndetics:

Modeling the Internet and the Web covers the most important aspects of modeling the Web using a modern mathematical and probabilistic treatment. It focuses on the information and application layers, as well as some of the emerging properties of the Internet.

 Provides a comprehensive introduction to the modeling of the Internet and the Web at the information level.
 Takes a modern approach based on mathematical, probabilistic, and graphical modeling.
 Provides an integrated presentation of theory, examples, exercises and applications.
 Covers key topics such as text analysis, link analysis, crawling techniques, human behaviour, and commerce on the Web.

Interdisciplinary in nature, Modeling the Internet and the Web will be of interest to students and researchers from a variety of disciplines including computer science, machine learning, engineering, statistics, economics, business, and the social sciences.

"This book is fascinating!" - David Hand (Imperial College, UK)

"This book provides an extremely useful introduction to the intellectually stimulating problems of data mining electronic business." - Andreas S. Weigend (Chief Scientist, Amazon.com)

Includes diagrams, charts, tables

Includes index

Table of contents provided by Syndetics

Preface (p. xiii)
1 Mathematical Background (p. 1)
1.1 Probability and Learning from a Bayesian Perspective (p. 1)
1.2 Parameter Estimation from Data (p. 4)
1.2.1 Basic principles (p. 4)
1.2.2 A simple die example (p. 6)
1.3 Mixture Models and the Expectation Maximization Algorithm (p. 10)
1.4 Graphical Models (p. 13)
1.4.1 Bayesian networks (p. 13)
1.4.2 Belief propagation (p. 15)
1.4.3 Learning directed graphical models from data (p. 16)
1.5 Classification (p. 17)
1.6 Clustering (p. 20)
1.7 Power-Law Distributions (p. 22)
1.7.1 Definition (p. 22)
1.7.2 Scale-free properties (80/20 rule) (p. 24)
1.7.3 Applications to Languages: Zipf's and Heaps' Laws (p. 24)
1.7.4 Origin of power-law distributions and Fermi's model (p. 26)
1.8 Exercises (p. 27)
2 Basic WWW Technologies (p. 29)
2.1 Web Documents (p. 30)
2.1.1 SGML and HTML (p. 30)
2.1.2 General structure of an HTML document (p. 31)
2.1.3 Links (p. 32)
2.2 Resource Identifiers: URI, URL, and URN (p. 33)
2.3 Protocols (p. 36)
2.3.1 Reference models and TCP/IP (p. 36)
2.3.2 The domain name system (p. 37)
2.3.3 The Hypertext Transfer Protocol (p. 38)
2.3.4 Programming examples (p. 40)
2.4 Log Files (p. 41)
2.5 Search Engines (p. 44)
2.5.1 Overview (p. 44)
2.5.2 Coverage (p. 45)
2.5.3 Basic crawling (p. 46)
2.6 Exercises (p. 49)
3 Web Graphs (p. 51)
3.1 Internet and Web Graphs (p. 51)
3.1.1 Power-law size (p. 53)
3.1.2 Power-law connectivity (p. 53)
3.1.3 Small-world networks (p. 56)
3.1.4 Power law of PageRank (p. 57)
3.1.5 The bow-tie structure (p. 58)
3.2 Generative Models for the Web Graph and Other Networks (p. 60)
3.2.1 Web page growth (p. 60)
3.2.2 Lattice perturbation models: between order and disorder (p. 61)
3.2.3 Preferential attachment models, or the rich get richer (p. 63)
3.2.4 Copy models (p. 66)
3.2.5 PageRank models (p. 67)
3.3 Applications (p. 68)
3.3.1 Distributed search algorithms (p. 68)
3.3.2 Subgraph patterns and communities (p. 70)
3.3.3 Robustness and vulnerability (p. 72)
3.4 Notes and Additional Technical References (p. 73)
3.5 Exercises (p. 74)
4 Text Analysis (p. 77)
4.1 Indexing (p. 77)
4.1.1 Basic concepts (p. 77)
4.1.2 Compression techniques (p. 79)
4.2 Lexical Processing (p. 80)
4.2.1 Tokenization (p. 80)
4.2.2 Text conflation and vocabulary reduction (p. 82)
4.3 Content-Based Ranking (p. 82)
4.3.1 The vector-space model (p. 82)
4.3.2 Document similarity (p. 83)
4.3.3 Retrieval and evaluation measures (p. 85)
4.4 Probabilistic Retrieval (p. 86)
4.5 Latent Semantic Analysis (p. 88)
4.5.1 LSI and text documents (p. 89)
4.5.2 Probabilistic LSA (p. 89)
4.6 Text Categorization (p. 93)
4.6.1 k nearest neighbors (p. 93)
4.6.2 The Naive Bayes classifier (p. 94)
4.6.3 Support vector classifiers (p. 97)
4.6.4 Feature selection (p. 102)
4.6.5 Measures of performance (p. 104)
4.6.6 Applications (p. 106)
4.6.7 Supervised learning with unlabeled data (p. 111)
4.7 Exploiting Hyperlinks (p. 114)
4.7.1 Co-training (p. 114)
4.7.2 Relational learning (p. 115)
4.8 Document Clustering (p. 116)
4.8.1 Background and examples (p. 116)
4.8.2 Clustering algorithms for documents (p. 117)
4.8.3 Related approaches (p. 119)
4.9 Information Extraction (p. 120)
4.10 Exercises (p. 122)
5 Link Analysis (p. 125)
5.1 Early Approaches to Link Analysis (p. 126)
5.2 Nonnegative Matrices and Dominant Eigenvectors (p. 128)
5.3 Hubs and Authorities: HITS (p. 131)
5.4 PageRank (p. 134)
5.5 Stability (p. 138)
5.5.1 Stability of HITS (p. 139)
5.5.2 Stability of PageRank (p. 139)
5.6 Probabilistic Link Analysis (p. 140)
5.6.1 SALSA (p. 140)
5.6.2 PHITS (p. 142)
5.7 Limitations of Link Analysis (p. 143)
6 Advanced Crawling Techniques (p. 149)
6.1 Selective Crawling (p. 149)
6.2 Focused Crawling (p. 152)
6.2.1 Focused crawling by relevance prediction (p. 152)
6.2.2 Context graphs (p. 154)
6.2.3 Reinforcement learning (p. 155)
6.2.4 Related intelligent Web agents (p. 157)
6.3 Distributed Crawling (p. 158)
6.4 Web Dynamics (p. 160)
6.4.1 Lifetime and aging of documents (p. 161)
6.4.2 Other measures of recency (p. 167)
6.4.3 Recency and synchronization policies (p. 167)
7 Modeling and Understanding Human Behavior on the Web (p. 171)
7.1 Introduction (p. 171)
7.2 Web Data and Measurement Issues (p. 172)
7.2.1 Background (p. 172)
7.2.2 Server-side data (p. 174)
7.2.3 Client-side data (p. 177)
7.3 Empirical Client-Side Studies of Browsing Behavior (p. 179)
7.3.1 Early studies from 1995 to 1997 (p. 180)
7.3.2 The Cockburn and McKenzie study from 2002 (p. 181)
7.4 Probabilistic Models of Browsing Behavior (p. 184)
7.4.1 Markov models for page prediction (p. 184)
7.4.2 Fitting Markov models to observed page-request data (p. 186)
7.4.3 Bayesian parameter estimation for Markov models (p. 187)
7.4.4 Predicting page requests with Markov models (p. 189)
7.4.5 Modeling runlengths within states (p. 193)
7.4.6 Modeling session lengths (p. 194)
7.4.7 A decision-theoretic surfing model (p. 198)
7.4.8 Predicting page requests using additional variables (p. 199)
7.5 Modeling and Understanding Search Engine Querying (p. 201)
7.5.1 Empirical studies of search behavior (p. 202)
7.5.2 Models for search strategies (p. 207)
7.6 Exercises (p. 208)
8 Commerce on the Web: Models and Applications (p. 211)
8.1 Introduction (p. 211)
8.2 Customer Data on the Web (p. 212)
8.3 Automated Recommender Systems (p. 212)
8.3.1 Evaluating recommender systems (p. 214)
8.3.2 Nearest-neighbor collaborative filtering (p. 215)
8.3.3 Model-based collaborative filtering (p. 218)
8.3.4 Model-based combining of votes and content (p. 223)
8.4 Networks and Recommendations (p. 224)
8.4.1 Email-based product recommendations (p. 224)
8.4.2 A diffusion model (p. 226)
8.5 Web Path Analysis for Purchase Prediction (p. 228)
8.6 Exercises (p. 232)
Appendix A Mathematical Complements (p. 235)
A.1 Graph Theory (p. 235)
A.1.1 Basic definitions (p. 235)
A.1.2 Connectivity (p. 236)
A.1.3 Random graphs (p. 236)
A.2 Distributions (p. 237)
A.2.1 Expectation, variance, and covariance (p. 237)
A.2.2 Discrete distributions (p. 237)
A.2.3 Continuous distributions (p. 238)
A.2.4 Weibull distribution (p. 240)
A.2.5 Exponential family (p. 240)
A.2.6 Extreme value distribution (p. 241)
A.3 Singular Value Decomposition (p. 241)
A.4 Markov Chains (p. 243)
A.5 Information Theory (p. 243)
A.5.1 Mathematical background (p. 244)
A.5.2 Information, surprise, and relevance (p. 247)
Appendix B List of Main Symbols and Abbreviations (p. 253)
References (p. 257)
Index (p. 277)
