Data Mining
Data Mining:
A process of analysing business data (often stored in a data warehouse) to uncover hidden trends and patterns and establish relationships. Data mining is normally performed by expert analysts who use specialist software tools.
From: www.oranz.co.uk/glossary_text.htm
The process of using statistical techniques to discover subtle relationships between data items, and the construction of predictive models based on them. The process is not the same as just using an OLAP tool to find exceptional items. Generally, data mining is a very different and more specialist application than OLAP, and uses different tools from different vendors. Normally the users are different, too. OLAP vendors have had little success with their data mining efforts.
From: www.olapreport.com/glossary.htm
Orange
Orange is component-based data mining software. It includes a range of preprocessing, modelling and data exploration techniques. It is built on C++ components, which can be accessed directly (not very common), through Python scripts (easier and better), or through GUI objects called Orange Widgets.
Orange is distributed free under the GPL and can be downloaded from the download page; if you run it on MS Windows and are impatient, download the latest snapshot (recommended).
Some Features of Orange
Orange is a component-based framework, which means you can use existing components and build your own. You can even prototype your own components in Python and use them in place of the standard C++-based Orange components. For instance, you may craft your own function for attribute quality estimation and use it within Orange's classification tree induction algorithm. Orange provides some elementary components and more complex components built from the elementary ones, and uses Python as a glue language. Some of the readily available features of Orange include:
Data input/output: Orange can read from and write to tab-delimited files and C4.5 files, and also supports some more exotic formats.
Preprocessing: feature subset selection, categorization, feature utility estimation for predictive tasks.
Predictive modelling: classification trees, naive Bayes, k-NN, majority classifier, support vector machines, logistic regression. Ensemble methods like boosting and bagging are also included.
Model validation: different data sampling and validation techniques (like cross-validation, random sampling, etc.), and various statistics for model validation (classification accuracy, AUC, sensitivity, specificity, ...) are included. Orange evaluation schemas support caching: validation results (class probabilities) are stored, and rerunning the validation will only validate new classifiers.
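The caching idea in the model validation item can be illustrated with a minimal sketch in plain Python. This is not Orange's API: the class CachedCrossValidation, the two toy learners, and the DATA set are all hypothetical names invented for this illustration, and the sketch caches predicted labels rather than class probabilities to keep it short. It only shows the principle that results are stored per learner, so a rerun evaluates only learners that have not been seen before.

```python
from collections import Counter

# Toy data set (hypothetical): each example is (features, class label).
DATA = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"),
        ((1.0,), "b"), ((1.1,), "b"), ((1.2,), "b")]


def majority_learner(train):
    """Return a classifier that always predicts the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common


def one_nn_learner(train):
    """Return a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    def classify(features):
        nearest = min(
            train,
            key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], features)),
        )
        return nearest[1]
    return classify


class CachedCrossValidation:
    """k-fold cross-validation that stores results per learner name."""

    def __init__(self, data, folds=3):
        self.data = data
        self.folds = folds
        self.cache = {}      # learner name -> list of (predicted, actual)
        self.evaluated = []  # names of learners that were actually run

    def run(self, learners):
        """Validate only learners missing from the cache; return accuracies."""
        for name, learner in learners.items():
            if name in self.cache:
                continue  # results already stored, skip the expensive part
            results = []
            for fold in range(self.folds):
                # Every folds-th example (offset by fold) is held out for testing.
                test = self.data[fold::self.folds]
                train = [ex for i, ex in enumerate(self.data)
                         if i % self.folds != fold]
                classifier = learner(train)
                results.extend((classifier(x), y) for x, y in test)
            self.cache[name] = results
            self.evaluated.append(name)
        return {name: self.accuracy(self.cache[name]) for name in learners}

    @staticmethod
    def accuracy(results):
        """Classification accuracy: share of predictions equal to the true label."""
        return sum(pred == actual for pred, actual in results) / len(results)


if __name__ == "__main__":
    cv = CachedCrossValidation(DATA, folds=3)
    print(cv.run({"majority": majority_learner}))
    # The second call reuses the stored majority results and runs only 1-NN.
    print(cv.run({"majority": majority_learner, "1-nn": one_nn_learner}))
    print(cv.evaluated)
```

Keying the cache by learner name is a simplification chosen for readability; a real tool would also need to invalidate stored results when the data or the sampling changes.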