Most of the advancements in Machine Learning over the past 10 years have come from smart rearrangements of the simple units presented here. Obviously, I am omitting activation functions and a few other components, but you get the idea.
A convolution layer is meant to learn local correlations. Multiple successive blocks of convolution and pooling layers let the network learn correlations at multiple scales, and they can be used on image data (Conv2D), text data (text is just a time series of categorical variables), or time series (Conv1D). You can encode text data with an embedding layer followed by a couple of Conv1D layers, and you can encode a time series with a stack of Conv1D and pooling layers (see the sketch below).
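Here is a minimal sketch of that text encoder in Keras. The vocabulary size, sequence length, filter counts, and kernel sizes are illustrative assumptions of mine, not values from the post:

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, seq_len, embed_dim = 20_000, 256, 128  # illustrative values

# Embedding followed by a couple of Conv1D + pooling blocks:
# local correlations first, coarser-scale correlations after pooling.
text_encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,), dtype="int32"),
    layers.Embedding(vocab_size, embed_dim),               # token ids -> dense vectors
    layers.Conv1D(128, kernel_size=5, activation="relu"),  # local n-gram patterns
    layers.MaxPooling1D(pool_size=2),                      # downsample -> larger receptive field
    layers.Conv1D(128, kernel_size=5, activation="relu"),  # correlations at a coarser scale
    layers.GlobalMaxPooling1D(),                           # fixed-size text representation
])
```

The same pattern works for a numeric time series: drop the embedding layer and feed the raw (time steps, features) tensor straight into the Conv1D stack.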
I advise against using LSTM layers when possible. Their iterative computation does not parallelize well, which leads to very slow training (even with the CuDNN-accelerated LSTM). For text and time series, ConvNets are much faster to train because they exploit parallel matrix computation, and they tend to perform on par with LSTM networks (https://lnkd.in/g-6Z6qCN). One reason transformers became the leading building block for text learning tasks is their superior parallelism compared to LSTMs, which realistically allows for much larger training datasets. A rough example of the swap is sketched below.
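For concreteness, here is a hedged sketch of that swap on a time series: a recurrent encoder next to a convolutional one that produces an output of the same size. The shapes and layer sizes are assumptions for illustration only:

```python
import tensorflow as tf
from tensorflow.keras import layers

n_steps, n_features = 128, 8  # e.g. 128 time steps of 8 sensor readings

# Recurrent baseline: steps are processed sequentially, limiting parallelism.
lstm_encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(n_steps, n_features)),
    layers.LSTM(64),
])

# Convolutional alternative: every time step is processed in parallel.
conv_encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(n_steps, n_features)),
    layers.Conv1D(64, kernel_size=7, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.GlobalAveragePooling1D(),   # same (batch, 64) output shape as the LSTM
])
```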
Here are a few key dates to understand the DL timeline:
- (1989) Convolution layer and average pooling: https://lnkd.in/gtv_Q7iv
- (1997) LSTM layer: https://lnkd.in/gCWJjxJv
- (2003) Embedding layer: https://lnkd.in/g3iCBQNf
- (2007) Max Pooling: https://lnkd.in/ge9KKCME
- (2012) Feature dropout: https://lnkd.in/g49Sp6HE
- (2012) Transfer learning: https://lnkd.in/g9yWA86k
- (2013) Word2Vec Embedding: https://lnkd.in/gC62AchR
- (2013) Maxout network: https://lnkd.in/gC_KvJjT
- (2014) GRU layer: https://lnkd.in/g-rRQ6km
- (2014) Dropout layers: https://lnkd.in/gkHUqYDE
- (2014) GloVe Embedding: https://lnkd.in/gA8bnnX2
- (2015) Batch normalization: https://lnkd.in/gmptQTXY
- (2016) Layer normalization: https://lnkd.in/gTad4iHE
- (2016) Instance Normalization: https://lnkd.in/g7SA_Z3q
- (2017) Self Attention layer and transformers: https://lnkd.in/gUts7Sjq
- (2018) Group Normalization: https://lnkd.in/gMv7KehG
----
Don't forget to subscribe to my ML newsletter: TheAiEdge.io
#machinelearning #datascience #artificialintelligence