Wow, I’m learning about neural networks, and your post really helped me understand, from a historical perspective, how we got here. Thank you so much for this!

I struggled with two important parts of this post:

  1. Why did we introduce multiple parallel convolutions and then concatenate them? I understood the part about reducing feature depth with bottlenecks, but not the parallelism.
  2. What does the input bypass mechanism introduced with ResNet actually do? I didn’t understand how f(x) + x translates into a better network.
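
To make the two questions concrete, here is roughly how I picture both constructs, as a minimal plain-Python sketch. The names `residual_block` and `parallel_concat` and the toy layers are my own placeholders, not anything from the post, and real networks would use learned convolutions on tensors rather than lists of numbers.

```python
def residual_block(f, x):
    """Return f(x) + x element-wise: the layer's output plus its unchanged input."""
    fx = f(x)
    return [a + b for a, b in zip(fx, x)]


def parallel_concat(branches, x):
    """Run every branch on the same input and concatenate their outputs."""
    out = []
    for branch in branches:
        out.extend(branch(x))
    return out


# Toy stand-ins for learned layers (hypothetical, for illustration only).
def zero_layer(x):
    return [0.0 for _ in x]


def double(x):
    return [2.0 * v for v in x]


def negate(x):
    return [-v for v in x]


x = [1.0, 2.0, 3.0]

# If the layer outputs zeros, the bypass still passes the input through unchanged.
skip_out = residual_block(zero_layer, x)

# Two branches see the same input; the result stacks both sets of features.
concat_out = parallel_concat([double, negate], x)
```

If I have this right, the bypass means a block that learns nothing still hands its input on untouched, and the concatenation just stacks the features from each branch side by side. What I am missing is why those two properties make the network better.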

Just sharing my thoughts coming at this as a beginner – it would be nice if you could elaborate on these two points in your post.

Thank you so much for putting this together!
