News

One way to think about this is that training fits data to a curve. Activation functions specify those curves, and the question is which curve gives the best fit? A linear function would fit the data ...