AItools

Synopsis

AItools is a software suite for mining and retrieval tasks, developed and maintained by the Webis Group. The suite comprises basic and advanced algorithms, data structures, and design patterns for modeling real-world retrieval processes. The following figure illustrates the typical steps of an AItools processing pipeline.

[Figure: UML activity diagram of an AItools process]

Solving information retrieval problems requires advanced skills from several fields of computer science. AItools aims to simplify the development of new information retrieval technology and to shorten its time to market by providing proven solutions, a unified interface to algorithms, rapid prototyping, and experiment automation.

The AItools suite is organized in the table below: gray cells correspond to software components, and colored cells name component classes. The components are arranged from left to right according to five functional areas, which the colors indicate: Acquisition [aq], Information Extraction [ie], Information Retrieval [ir], Data Mining [dm], and Information Visualization [iv]. Each cell is hyperlinked; clicking its text opens the documentation for the chosen component or class. The drop-down "Filter..." box highlights classes by implementation language. The letter codes [aq, ie, ir, dm, iv] are also used in the Java package naming scheme of AItools.
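
The mapping between letter codes and functional areas can be written down as a small lookup type. The following is only a sketch: the codes and area names come from the text above, but the enum, its methods, the package root `org.example.aitools`, and the component name `clustering` are hypothetical placeholders, since the actual package layout of AItools is not given here.

```java
import java.util.Arrays;
import java.util.Optional;

/** The five functional areas and their letter codes, as listed in the text. */
enum FunctionalArea {
    ACQUISITION("aq", "Acquisition"),
    INFORMATION_EXTRACTION("ie", "Information Extraction"),
    INFORMATION_RETRIEVAL("ir", "Information Retrieval"),
    DATA_MINING("dm", "Data Mining"),
    INFORMATION_VISUALIZATION("iv", "Information Visualization");

    final String code;
    final String label;

    FunctionalArea(String code, String label) {
        this.code = code;
        this.label = label;
    }

    /** Looks up an area by its two-letter code; empty if the code is unknown. */
    static Optional<FunctionalArea> fromCode(String code) {
        return Arrays.stream(values())
                .filter(area -> area.code.equals(code))
                .findFirst();
    }

    /**
     * Builds a package name of the form root.code.component.
     * ASSUMPTION: this layout is illustrative, not the documented AItools scheme.
     */
    String packageName(String root, String component) {
        return root + "." + code + "." + component;
    }
}

class FunctionalAreaDemo {
    public static void main(String[] args) {
        // "org.example.aitools" and "clustering" are placeholder names.
        for (FunctionalArea area : FunctionalArea.values()) {
            System.out.println(area.code + " = " + area.label);
        }
        System.out.println(
                FunctionalArea.DATA_MINING.packageName("org.example.aitools", "clustering"));
    }
}
```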




  • Data Structure
      • Feature
      • Feature (JavaScript)
      • Document
      • Document (JavaScript)
      • Big Hashmap
      • Big Hashmap (C++)
      • Inverted Index
      • Inverted Index (C++)
      • Vector
      • Vector (JavaScript)
      • Dense Matrix
      • Sparse Matrix
      • Feature Set
      • Suffix Tree
      • Undirected Graph
      • Graph (JavaScript)
      • k-Nearest Neighbor Graph (JavaScript)
      • Undirected Int Hash Graph
      • Undirected Long Hash Graph
      • Vocabulary (JavaScript)
  • Web Search
      • Bing
      • ChatNoir
      • Etools
      • Google
      • Wikipedia
      • YaCy
      • Yahoo
  • Web Download
      • Wikidump
      • Wget Wrapper
  • Parser
      • Wikidump
  • Text Preprocessing
      • Plain Text Extractor
      • PS to Plain Text Converter
      • POS Tagging
      • Least Frequent Substring Stemmer
      • Porter Stemmer
      • Snowball Stemmer
      • Smart Spell
      • Stopword Filter
  • Decomposition
      • Character Chunk
      • Character N-gram
      • Word Chunking
      • Word ICU4J
      • Word N-gram
      • Word OpenNLP
      • Word Tokenization
      • Sentence ICU4J
      • Sentence OpenNLP
  • Keyphrase Extraction
      • Frequent Phrase
      • Head Noun Phrases
      • Repeated String
      • Stanford Noun Phrase Wrapper
      • TextRank
  • Language Detection
      • Character Trigrams
  • Retrieval Model
      • Retrieval Model (JavaScript)
      • Divergence from Randomness
      • ESA
      • Fuzzy Fingerprinting
      • LSI
      • OkapiBM25
      • Language Model
      • Suffix Tree
      • Tf
      • Tf (JavaScript)
      • TfIdf
      • TfIdf (JavaScript)
      • TfPdf
      • Vector Space Model
  • Relevance Measures
      • Cosine Similarity
      • Cosine Similarity (JavaScript)
      • Dot Product Similarity
      • Jaccard Similarity
      • Jensen-Shannon Divergence
      • Kullback-Leibler Divergence
      • Pointwise Kullback-Leibler Divergence
  • Cluster Algorithm (CA)
      • Complete Link
      • DbScan
      • Graph Clustering
      • High Recall Clustering
      • KNNHAC
      • k-Means
      • MajorClust
      • MajorClust (JavaScript)
      • Randomized Clustering (JavaScript)
      • Single Link
      • Soft Clustering
      • Suffix Tree Clustering
      • Unweighted Average Link
      • Ward's Method
      • Weighted Average Link
  • CA Validation
      • Davies-Bouldin Index
      • Dunn Index
      • Expected Density
      • Expected Density (JavaScript)
      • Silhouette Coefficient
      • BCubed Precision
      • BCubed Recall
      • Completeness
      • F-Measure
      • Fowlkes-Mallows Index
      • Homogeneity
      • Inverse Purity
      • Jaccard Coefficient
      • Misclassification Index
      • Normalized Entropy
      • Purity
      • Q-Measure
      • Adjusted Rand Index
      • Rand Index
      • V-Measure
      • Variation of Information
  • Cluster Labeling (CL)
      • BiSecting k-Means
      • Frequent Predictive Words
      • Keyphrase Labeling
      • Lingo
      • Long Tail Labeling
      • No Shadowing Labeling
      • Soft Labeling
      • SuffixTree Labeling
      • Topic Identifier
      • Weighted Centroid Covering
      • Weighted Centroid Covering (JavaScript)
  • CL Validation
      • MAP
      • Match@N
      • MRR
      • NDCG@R
      • Precision@N
  • Graph Drawing
      • Distortion Visualization
      • JTree Visualization
      • Reingold Tilford Graph
      • Walker Linear Graph
  • Multidimensional Scaling
      • Chalmers 1996
      • Chalmers 2003
      • Jourdan 2004
      • Jourdan 2004 Multiscale
      • Spring Model
      • Stein 2006
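
To give a rough idea of how components of the kinds listed above could be chained into a processing pipeline, the sketch below combines word tokenization, a stopword filter, a Tf retrieval model, and cosine similarity. It is written in plain Java and does not use the AItools API: the class `PipelineSketch`, all of its method names, the tokenization rule, and the tiny stopword list are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class PipelineSketch {

    // Illustrative stopword list (an assumption, not the list shipped with AItools).
    static final Set<String> STOPWORDS = Set.of("a", "an", "the", "of", "and", "is", "to", "in");

    /** Word tokenization: lowercases the text and splits on runs of non-letters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Stopword filter: drops every token contained in STOPWORDS. */
    static List<String> removeStopwords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOPWORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    /** Tf model: maps each term to its raw number of occurrences. */
    static Map<String, Double> termFrequencies(List<String> tokens) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1.0, Double::sum);
        }
        return tf;
    }

    /** Cosine similarity of two sparse term vectors; 0.0 if either vector is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0.0);
            normA += entry.getValue() * entry.getValue();
        }
        for (double value : b.values()) {
            normB += value * value;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Runs the three representation steps in sequence. */
    static Map<String, Double> represent(String text) {
        return termFrequencies(removeStopwords(tokenize(text)));
    }

    public static void main(String[] args) {
        Map<String, Double> d1 = represent("The retrieval of documents");
        Map<String, Double> d2 = represent("Document retrieval and clustering");
        System.out.println("cosine similarity: " + cosine(d1, d2));
    }
}
```

In this sketch "documents" and "document" are treated as different terms because no stemming step is included; adding a stemmer between the stopword filter and the Tf step would be the natural place to address that.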

People