These are the best posts from Timur Bikmukhametov, PhD.

3 viral posts with 3,870 likes, 174 comments, and 313 shares.
3 image posts, 0 carousel posts, 0 video posts, 0 text posts.

Best Posts by Timur Bikmukhametov, PhD on LinkedIn

Advanced ML Hyperparameter Tuning: Best Method?

Bayesian vs Particle Swarm Optimization 👇

(for more ML tutorials and resources like this, subscribe to my newsletter: https://lnkd.in/gddXakxh)

1๏ธโƒฃ ๐—•๐—ฎ๐˜†๐—ฒ๐˜€๐—ถ๐—ฎ๐—ป ๐—ข๐—ฝ๐˜๐—ถ๐—บ๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป (๐—•๐—ข)

๐—˜๐—ณ๐—ณ๐—ถ๐—ฐ๐—ถ๐—ฒ๐—ป๐˜ ๐—ณ๐—ผ๐—ฟ ๐—–๐—ผ๐˜€๐˜๐—น๐˜† ๐—˜๐˜ƒ๐—ฎ๐—น๐˜‚๐—ฎ๐˜๐—ถ๐—ผ๐—ป๐˜€
BO works well when each hyperparameter evaluation (e.g., training a model) is computationally expensive.

๐—ฆ๐—บ๐—ฎ๐—ฟ๐˜ ๐—ฆ๐—ฒ๐—ฎ๐—ฟ๐—ฐ๐—ต ๐—ถ๐—ป ๐—–๐—ผ๐—บ๐—ฝ๐—ฎ๐—ฐ๐˜ ๐—ฆ๐—ฝ๐—ฎ๐—ฐ๐—ฒ๐˜€
Performs best in low-dimensional hyperparameter spaces (typically fewer than 20 dimensions).

It uses past evaluations to sample promising points, making it sample-efficient (see the gif).

โœ… ๐—•๐—ฒ๐˜€๐˜ ๐—ณ๐—ผ๐—ฟ:
โ†ณ Cases with limited computational budgets.
โ†ณ Cases when model evaluation is time-consuming.
โ†ณ Tuning a small to moderate number of parameters.
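
To make this concrete, here is a minimal sketch of Bayesian-style tuning with Optuna (its default TPE sampler proposes new points based on past trials). The dataset, model, and search ranges are illustrative assumptions, not from the original post:

```python
# Minimal sketch: Bayesian-style hyperparameter tuning with Optuna.
# Dataset, model, and search ranges are illustrative assumptions.
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Each trial proposes hyperparameters informed by past evaluations
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 6),
    }
    model = GradientBoostingClassifier(**params, random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")  # TPE sampler by default
study.optimize(objective, n_trials=30)             # only 30 costly evaluations
print(study.best_params, study.best_value)
```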

2๏ธโƒฃ ๐—ฃ๐—ฎ๐—ฟ๐˜๐—ถ๐—ฐ๐—น๐—ฒ ๐—ฆ๐˜„๐—ฎ๐—ฟ๐—บ ๐—ข๐—ฝ๐˜๐—ถ๐—บ๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป (๐—ฃ๐—ฆ๐—ข)
๐—›๐—ฎ๐—ป๐—ฑ๐—น๐—ฒ๐˜€ ๐—›๐—ถ๐—ด๐—ต ๐——๐—ถ๐—บ๐—ฒ๐—ป๐˜€๐—ถ๐—ผ๐—ป๐—ฎ๐—น๐—ถ๐˜๐˜†
PSO suites for high-dimensional hyperparameter spaces, exploring diverse regions effectively.

๐—ฃ๐—ฎ๐—ฟ๐—ฎ๐—น๐—น๐—ฒ๐—น๐—ถ๐˜‡๐—ฎ๐—ฏ๐—น๐—ฒ
Easily scales across multiple computational nodes, making it efficient for large-scale optimization tasks.

โœ… ๐—•๐—ฒ๐˜€๐˜ ๐—ณ๐—ผ๐—ฟ:
โ†ณ High-dimensional hyperparameter spaces
โ†ณ Scenarios with abundant computational resources allowing for parallel evaluations.
โ†ณ Problems where the evaluation cost is less critical compared to exploration breadth.
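
For intuition, here is a from-scratch sketch of vanilla PSO minimizing a toy objective (a stand-in for validation loss). The swarm size and the inertia/cognitive/social weights are common textbook defaults, not values from the post:

```python
# From-scratch sketch of vanilla Particle Swarm Optimization.
# Toy objective and coefficients are illustrative assumptions.
import numpy as np

def objective(x):
    # Stand-in for a validation loss over hyperparameters
    return np.sum((x - 0.3) ** 2, axis=-1)

rng = np.random.default_rng(0)
n_particles, dim, iters = 20, 2, 50
w, c1, c2 = 0.7, 1.5, 1.5                      # inertia, cognitive, social weights

pos = rng.uniform(-1, 1, (n_particles, dim))   # particle positions
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), objective(pos)  # personal bests
gbest = pbest[pbest_val.argmin()].copy()       # global best

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel                            # all particles can be scored in parallel
    vals = objective(pos)
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print(gbest)  # converges toward [0.3, 0.3]
```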

๐—›๐—ผ๐˜„ ๐˜๐—ผ ๐—–๐—ต๐—ผ๐—ผ๐˜€๐—ฒ ๐˜๐—ต๐—ฒ ๐—ฅ๐—ถ๐—ด๐—ต๐˜ ๐—”๐—น๐—ด๐—ผ๐—ฟ๐—ถ๐˜๐—ต๐—บ
-> ๐—จ๐˜€๐—ฒ ๐—•๐—ฎ๐˜†๐—ฒ๐˜€๐—ถ๐—ฎ๐—ป ๐—ข๐—ฝ๐˜๐—ถ๐—บ๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป when each evaluation is costly and you're dealing with a manageable number of hyperparameters.

-> ๐—จ๐˜€๐—ฒ ๐—ฃ๐—ฎ๐—ฟ๐˜๐—ถ๐—ฐ๐—น๐—ฒ ๐—ฆ๐˜„๐—ฎ๐—ฟ๐—บ ๐—ข๐—ฝ๐˜๐—ถ๐—บ๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป if you have many hyperparameters to tune and the capacity for parallelized evaluations.

โ™ป๏ธ Share with network to show your interest in advanced ML practices!

P.S. Have you tried PSO in your work?
[Post image]

Anomaly Detection in Time Series, in real time.

Use Mahalanobis Distance (steps, pros, cons) 👇

🔥 More of 𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗠𝗟 𝗖𝗼𝗻𝘁𝗲𝗻𝘁 𝗳𝗿𝗼𝗺 𝗺𝗲 here: https://lnkd.in/emb4cFCS

๐Ÿง  ๐—ช๐—ต๐—ฎ๐˜ ๐—ถ๐˜€ ๐— ๐—ฎ๐—ต๐—ฎ๐—น๐—ฎ๐—ป๐—ผ๐—ฏ๐—ถ๐˜€ ๐——๐—ถ๐˜€๐˜๐—ฎ๐—ป๐—ฐ๐—ฒ?
MD measures the distance of a point from the mean of a distribution, accounting for correlations between features in multidimensional space.

For the distributions, distributions of the train data (with no anomalies) are taken.

The bigger the distance of a new point from the distribution means, the more likely it's an anomaly.
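
In symbols, with mean vector μ and covariance matrix Σ estimated from the training data: MD(x) = √((x − μ)ᵀ Σ⁻¹ (x − μ)).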

๐—›๐—ผ๐˜„ ๐˜๐—ผ ๐˜‚๐˜€๐—ฒ ๐— ๐—— ๐—ณ๐—ผ๐—ฟ ๐—ฎ๐—ป๐—ผ๐—บ๐—ฎ๐—น๐˜† ๐—ฑ๐—ฒ๐˜๐—ฒ๐—ฐ๐˜๐—ถ๐—ผ๐—ป:
-> Train a model:
Use clean (anomaly-free) training data to calculate the mean vector and covariance matrix.

-> Compute MD for new points:
For each new data point, compute its MD using the trained model.

-> Set a threshold and flag anomalies:
Choose a statistical threshold (e.g., the 95th percentile of MD values on the training data) and flag any new point whose MD exceeds it.
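
Here is a minimal NumPy sketch of those three steps; the synthetic 2-D data and the 95th-percentile cut-off are illustrative assumptions:

```python
# Minimal sketch of MD-based anomaly detection; synthetic data for illustration.
import numpy as np

rng = np.random.default_rng(0)
train = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=1000)  # clean data

# 1) "Train": estimate the mean vector and (inverse) covariance from clean data
mu = train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train, rowvar=False))

def mahalanobis(x):
    d = x - mu
    return np.sqrt(np.einsum("...i,ij,...j->...", d, cov_inv, d))

# 2) + 3) Threshold from training distances, then flag new points
threshold = np.percentile(mahalanobis(train), 95)
new_points = np.array([[0.1, 0.2], [3.0, -3.0]])  # second point breaks the correlation
print(mahalanobis(new_points) > threshold)        # -> [False  True]
```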

๐ŸŸข ๐—”๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:
โ†ณ Can detect gradual and steep anomalies
โ†ณ Easy to interpret and fast to compute (like PCA)
โ†ณ Detects anomalies considering feature relationships

๐Ÿ”ด ๐——๐—ถ๐˜€๐—ฎ๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:
โ†ณ Sensitive to errors in covariance matrix
โ†ณ Needs prior handling of outliers in training data.
โ†ณ Assumes data has a multivariate normal distribution.

🔥 More of 𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗠𝗟 𝗖𝗼𝗻𝘁𝗲𝗻𝘁 𝗳𝗿𝗼𝗺 𝗺𝗲 here: https://lnkd.in/emb4cFCS

โ™ป๏ธ Repost to show your interest in Anomaly Detection!

P.S. Which methods do you use for anomaly detection?
[Post image]

4 cases when NOT to use Random Forest

(that'll save you 50% of ML modeling time)

🔥 More of 𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗠𝗟 𝗖𝗼𝗻𝘁𝗲𝗻𝘁 𝗳𝗿𝗼𝗺 𝗺𝗲 here: https://lnkd.in/emb4cFCS

๐ŸŸข ๐—ช๐—ต๐—ฒ๐—ป ๐—ณ๐—ฒ๐—ฎ๐˜๐˜‚๐—ฟ๐—ฒ๐˜€ ๐—ฎ๐—ป๐—ฑ ๐˜๐—ฎ๐—ฟ๐—ด๐—ฒ๐˜ ๐—ฟ๐—ฒ๐—น๐—ฎ๐˜๐—ถ๐—ผ๐—ป๐˜€๐—ต๐—ถ๐—ฝ๐˜€ ๐—ฎ๐—ฟ๐—ฒ ๐—บ๐—ผ๐˜€๐˜๐—น๐˜† ๐—น๐—ถ๐—ป๐—ฒ๐—ฎ๐—ฟ
In this case, Random Forest will barely outperform Linear or Logistic Regression.

👉 Linear models:
↳ Train faster
↳ Are easier to tune
↳ Are more interpretable

๐ŸŸข ๐—ช๐—ต๐—ฒ๐—ป ๐—ฑ๐—ฎ๐˜๐—ฎ ๐—ถ๐˜€ ๐—ป๐—ผ๐—ถ๐˜€๐˜†, ๐˜€๐—ฝ๐—ฎ๐—ฟ๐˜€๐—ฒ, ๐—ฎ๐—ป๐—ฑ ๐—ต๐—ฎ๐˜€ ๐—น๐—ผ๐˜„ ๐˜ƒ๐—ฎ๐—ฟ๐—ถ๐—ฎ๐—ฏ๐—ถ๐—น๐—ถ๐˜๐˜†
Random Forest likes when:
โ†ณ Data having many distinct feature values
โ†ณLow noise levels

For noisy and sparse data, simpler models (e.g., Linear Regression) often perform just as well.

๐ŸŸข ๐—ช๐—ต๐—ฒ๐—ป ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น ๐—ฒ๐˜…๐˜๐—ฟ๐—ฎ๐—ฝ๐—ผ๐—น๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐—ถ๐˜€ ๐—ถ๐—บ๐—ฝ๐—ผ๐—ฟ๐˜๐—ฎ๐—ป๐˜
Random Forest doesnโ€™t extrapolate well.

👉 Use smooth non-linear models instead:
↳ Neural Networks
↳ Gaussian Processes
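
A quick sketch of that failure mode (illustrative data, not from the post): fit both models on x in [0, 5] with a linear target, then predict at x = 10:

```python
# Sketch: Random Forest vs Linear Regression outside the training range.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

x = np.linspace(0, 5, 200).reshape(-1, 1)
y = 2 * x.ravel() + 1                        # simple linear trend, max target = 11

rf = RandomForestRegressor(random_state=0).fit(x, y)
lr = LinearRegression().fit(x, y)

print(rf.predict([[10.0]]))  # stuck near the largest training target (~11)
print(lr.predict([[10.0]]))  # extrapolates to ~21
```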

๐ŸŸข ๐—ช๐—ต๐—ฒ๐—ป ๐˜†๐—ผ๐˜‚ ๐—ฝ๐—น๐—ฎ๐—ป ๐˜๐—ผ ๐˜‚๐˜€๐—ฒ ๐˜๐—ต๐—ฒ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น ๐—ณ๐—ผ๐—ฟ ๐—ผ๐—ฝ๐˜๐—ถ๐—บ๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป
Random Forestโ€™s piecewise constant structure makes optimization gradients noisy and unstable.

👉 In this case, use smooth non-linear models:
↳ Neural Networks
↳ Gaussian Processes
↳ Splines
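
A small sketch of why (illustrative setup): a finite-difference "gradient" through a fitted Random Forest is zero almost everywhere, because its prediction surface is piecewise constant:

```python
# Sketch: finite-difference gradient through a Random Forest surrogate.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, (200, 1))
y = np.sin(x).ravel()
rf = RandomForestRegressor(random_state=0).fit(x, y)

eps = 1e-4
x0 = np.array([[2.0]])
grad = (rf.predict(x0 + eps) - rf.predict(x0)) / eps
print(grad)  # 0.0 unless the tiny step happens to cross a split threshold
```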

🔥 More of 𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗠𝗟 𝗖𝗼𝗻𝘁𝗲𝗻𝘁 𝗳𝗿𝗼𝗺 𝗺𝗲 here: https://lnkd.in/emb4cFCS

โ™ป๏ธ Repost to show your interest in Practical Machine Learning!

P.S. What's your go-to alternative when Random Forest isn't the right fit?
[Post image]
