Machine Learning

Repeated k-Fold Cross-Validation for Model Evaluation in Python

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm or configuration on a dataset. A single run of the k-fold cross-validation procedure may result in a noisy estimate of model performance: different splits of the data can produce very different results. Repeated k-fold cross-validation provides a way to improve the estimate of a machine learning model’s performance. It involves simply repeating the cross-validation procedure multiple times and reporting the mean result across all folds from all runs. This mean result is expected to be a more accurate estimate of the true unknown underlying mean performance of the model on the dataset, and its precision can be quantified using the standard error. In this tutorial, you will discover repeat...
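
As a minimal sketch, assuming scikit-learn: the repeated procedure is available via the RepeatedKFold class, which can be passed to cross_val_score() like any other splitter. The synthetic dataset and logistic regression model here are stand-ins for illustration:

# Repeated k-fold cross-validation sketch (synthetic data assumed)
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

# synthetic binary classification dataset, a stand-in for a real problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# 10-fold cross-validation repeated 3 times
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report the mean and standard deviation across all folds from all runs
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
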
How to Selectively Scale Numerical Input Variables for Machine Learning

Many machine learning models perform better when input variables are carefully transformed or scaled prior to modeling. It is convenient, and therefore common, to apply the same data transforms, such as standardization and normalization, equally to all input variables. This can achieve good results on many problems. Nevertheless, better results may be achieved by carefully selecting which data transform to apply to each input variable prior to modeling. In this tutorial, you will discover how to apply selective scaling of numerical input variables. After completing this tutorial, you will know: How to load and calculate a baseline predictive performance for the diabetes classification dataset. How to evaluate modeling pipelines with data transforms applied blindly to all numerical input v...
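
One way to express selective scaling is sketched below with scikit-learn’s ColumnTransformer; the synthetic dataset is a stand-in for the diabetes data, and the split of columns between the two scalers is hypothetical:

# Selective scaling sketch: different transforms for different columns
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic dataset standing in for the 8-feature diabetes data
X, y = make_classification(n_samples=768, n_features=8, random_state=1)
# hypothetical split: normalize columns 0-3, standardize columns 4-7
transform = ColumnTransformer(transformers=[
    ('norm', MinMaxScaler(), [0, 1, 2, 3]),
    ('std', StandardScaler(), [4, 5, 6, 7]),
])
pipeline = Pipeline(steps=[('t', transform), ('m', LogisticRegression(max_iter=1000))])
scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=10, n_jobs=-1)
print('Accuracy: %.3f' % mean(scores))
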
Train-Test Split for Evaluating Machine Learning Algorithms

The train-test split procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model. It is a fast and easy procedure to perform, and its results allow you to compare the performance of machine learning algorithms for your predictive modeling problem. Although simple to use and interpret, there are times when the procedure should not be used, such as when you have a small dataset, and situations where additional configuration is required, such as classification on an imbalanced dataset. In this tutorial, you will discover how to evaluate machine learning models using the train-test split. After completing this tutorial, you will know: The train-test split procedure is app...
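
A minimal sketch of the procedure with scikit-learn’s train_test_split(), including the stratify argument that keeps class proportions intact for imbalanced classification; the skewed synthetic dataset is an assumption for illustration:

# Stratified train-test split sketch (imbalanced synthetic data assumed)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# synthetic dataset with roughly a 90/10 class skew
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=1)
# 67/33 split, stratified on the class label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1, stratify=y)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
yhat = model.predict(X_test)
print('Accuracy: %.3f' % accuracy_score(y_test, yhat))
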
LOOCV for Evaluating Machine Learning Algorithms

The Leave-One-Out Cross-Validation, or LOOCV, procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model. It is computationally expensive to perform, although it results in a reliable and unbiased estimate of model performance. Although simple to use, with no configuration to specify, there are times when the procedure should not be used, such as when you have a very large dataset or a computationally expensive model to evaluate. In this tutorial, you will discover how to evaluate machine learning models using leave-one-out cross-validation. After completing this tutorial, you will know: The leave-one-out cross-validation procedure is appropriate when you have a small dataset or when an ...
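
Sketched with scikit-learn, LOOCV is just another splitter passed to cross_val_score(); the deliberately small synthetic dataset below is an assumption, chosen because the procedure fits one model per data row:

# LOOCV sketch: one train/test evaluation per sample
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression

# small synthetic dataset (LOOCV cost grows with dataset size)
X, y = make_classification(n_samples=100, n_features=10, random_state=1)
cv = LeaveOneOut()  # 100 samples means 100 model fits here
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Accuracy: %.3f' % mean(scores))
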
Nested Cross-Validation for Machine Learning with Python

The k-fold cross-validation procedure is used to estimate the performance of machine learning models when making predictions on data not used during training. This procedure can be used both when optimizing the hyperparameters of a model on a dataset, and when comparing and selecting a model for the dataset. When the same cross-validation procedure and dataset are used to both tune and select a model, it is likely to lead to an optimistically biased evaluation of the model performance. One approach to overcoming this bias is to nest the hyperparameter optimization procedure under the model selection procedure. This is called double cross-validation or nested cross-validation and is the preferred way to evaluate and compare tuned machine learning models. In thi...
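
A minimal sketch of the nesting, assuming scikit-learn: a GridSearchCV object (with a hypothetical random forest grid) tunes hyperparameters on the inner folds, and is itself scored by the outer cross_val_score() loop, so the performance estimate never reuses the data that drove the tuning:

# Nested cross-validation sketch (grid values are assumptions)
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=1)
# inner loop: hyperparameter search on each outer training split
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
search = GridSearchCV(RandomForestClassifier(random_state=1),
                      {'n_estimators': [10, 100]}, cv=inner_cv)
# outer loop: estimate of the tuned model's performance
outer_cv = KFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(search, X, y, scoring='accuracy', cv=outer_cv, n_jobs=-1)
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
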
How to Configure k-Fold Cross-Validation

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm on a dataset. A common value for k is 10, but how do we know that this configuration is appropriate for our dataset and our algorithms? One approach is to explore the effect of different k values on the estimate of model performance and compare this to an ideal test condition. This can help to choose an appropriate value for k. Once a k value is chosen, it can be used to evaluate a suite of different algorithms on the dataset, and the distribution of results can be compared to an evaluation of the same algorithms using an ideal test condition to see if they are highly correlated or not. If correlated, it confirms the chosen configuration is a robust approximation f...
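
The comparison described above can be sketched in a few lines of scikit-learn: score the same model under several candidate k values and against LOOCV as a stand-in for the ideal test condition (the dataset and model are assumptions for illustration):

# Comparing k values against an ideal test condition (LOOCV)
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=20, random_state=1)
model = LogisticRegression(max_iter=1000)
# LOOCV stands in for the ideal, near-unbiased estimate
ideal = mean(cross_val_score(model, X, y, scoring='accuracy', cv=LeaveOneOut()))
print('Ideal (LOOCV): %.3f' % ideal)
for k in [2, 5, 10, 20]:
    cv = KFold(n_splits=k, shuffle=True, random_state=1)
    score = mean(cross_val_score(model, X, y, scoring='accuracy', cv=cv))
    # a k whose estimate tracks the ideal closely is a reasonable choice
    print('k=%2d: %.3f (diff from ideal %.3f)' % (k, score, score - ideal))
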
How Intelligent Automation Tames the Healthcare Document Beast

Few vertical industries are as document-intensive as healthcare, whether on the provider or insurance side. That makes the healthcare industry ripe for tools that can automate insurance claims processing and other chores for providers and insurers alike. The challenge is heightened by the fact that most of the documents in question are unstructured, consisting of text, numbers, and images that vary in both form and content. That means approaches to intelligent document processing that rely on templates to identify and extract content will be ineffective, because you can’t reasonably create a template for every potential document an insurance company or healthcare provider may have to process. Rather, healthcare providers and insurers need intelligent document processing to...
Not All Intelligent Process Automation Requires Million-dollar Hardware

While the artificial intelligence market is unquestionably enjoying rapid growth, cost is a gating factor that gives some companies pause. It’s understandable given the price tag for software and hardware required to run some complex AI applications, notably those involving deep learning. But when it comes to intelligent process automation (IPA), it doesn’t have to be that way. Looking at the numbers, you’d think absolutely everyone is on the AI bandwagon. Grand View Research expects the global AI market to reach $390 billion by 2025, growing at a compound annual growth rate of more than 46% from 2019. Hardware accounts for a significant chunk of that total. The AI hardware market is expected to reach more than $230 billion by 2025, up from about $42 billion in 2019, according to Statist...
Indico Posts Record Q2 in New Bookings as Automation Wave Continues to Accelerate

Customer Momentum Driving New Executive Hires: Former Blue Prism Exec Joins Indico as VP of Strategic Alliances

Indico, a leading provider of intelligent process automation for document intake and understanding, today announced a record second quarter 2020 for new customer bookings, including its largest contract to date. Indico’s Q2 performance also included strong customer renewals and a 200% year-over-year increase in contracted recurring revenues. Indico has seen continued momentum across its primary insurance and financial services verticals, where large amounts of unstructured data are driving interest in Indico’s award-winning solution for addressing the river of documents, email, text, and images that drive critical processes across these industries. The glo...
Release Notes – Indico IPA v4.0

Thank you for being a valued Indico user! We’re constantly making updates to our app and APIs, working on new features, and garnering feedback to be best in class for intelligent process automation. Have ideas on how to make our product even better? Please let us know here!

Innovations and Updates in v4.0 (our biggest release to date! 🎉):

Review
Our BRAND NEW module in the application introduces human-in-the-loop functionality to Indico IPA. Easy-to-use exception handling queues allow subject matter experts to accept predictions and make character corrections within the Review Queue, and allow process owners to remediate documents that require further triaging before they reach downstream systems. Same intuitive interface (with some changes!) as On Document Labeling to ca...