Wayne Radinsky
The full text of Simone Scardapane's book Alice's Adventures in a Differentiable Wonderland is available online for free. It isn't available in print yet because it's still being written, and what's online is technically a draft, but Volume 1 looks pretty much done at about 260 pages. It introduces the mathematical fundamentals and then explains automatic differentiation. From there it applies the concept to convolutional layers, graph layers, and transformer models. A Volume 2 is planned covering fine-tuning, density estimation, generative modeling, mixture-of-experts, early exits, self-supervised learning, debugging, and other topics.

"Looking at modern neural networks, their essential characteristic is being composed by differentiable blocks: for this reason, in this book I prefer the term differentiable models when feasible. Viewing neural networks as differentiable models leads directly to the wider topic of differentiable programming, an emerging discipline that blends computer science and optimization to study differentiable computer programs more broadly."

"As we travel through this land of differentiable models, we are also traveling through history: the basic concepts of numerical optimization of linear models by gradient descent (covered in Chapter 4) were known since at least the XIX century; so-called 'fully-connected networks' in the form we use later on can be dated back to the 1980s; convolutional models were known and used already at the end of the 90s. However, it took many decades to have sufficient data and power to realize how well they can perform given enough data and enough parameters."

"Gather round, friends: it's time for our beloved Alice's adventures in a differentiable wonderland!"

Alice's Adventures in a Differentiable Wonderland

#solidstatelife #aieducation #differentiation #neuralnetworks
Will
@natewaddoups Yes, so first, neural nets use weights between neurons (nodes) in successive layers to capture relationships. The weights are continuous, and the network's output is differentiable with respect to them, so as training adjusts the weights iteratively (backprop) they stay continuous. Second, as the input is processed through the series of layers, each layer produces a high-dimensional representation of it. For example, the first few layers may characterize the input in terms of where they find "edges" or "corners", so you can say that the early stages "represent" edges.
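
Very roughly, in code (a toy PyTorch sketch I'm improvising, nothing specific to any particular paper or book), the iterative weight adjustment looks like this, and each layer's output is one of those high-dimensional representations:

```python
# Toy sketch (PyTorch): weights between layers are adjusted iteratively via
# backpropagation; each layer's output is a high-dimensional representation.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(10, 32),  # layer 1: its output is one "representation"
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),   # layer 2: maps that representation to the output
)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, 10)
target = torch.randn(64, 1)

for _ in range(100):
    loss = ((model(x) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()  # backprop: a gradient for every weight
    opt.step()       # small, continuous adjustment of the weights
```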

In an LLM, the network seems to develop a representation of how similar words are in a high-dimensional "semantic space", so to speak.

Thus, for example, you can take the learned word vectors (the embeddings) and compute "king minus man plus woman", and the closest vector to the result is "queen".
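
If you want to see that arithmetic concretely, here's a toy sketch with made-up vectors (real systems use learned embeddings such as word2vec or GloVe and compare against a full vocabulary, but the idea is the same):

```python
# Toy sketch with made-up 4-d "embeddings"; real word vectors are learned and
# have hundreds of dimensions, but the analogy arithmetic works the same way.
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.8, 0.1, 0.2]),
    "woman": np.array([0.1, 0.1, 0.8, 0.2]),
}

query = emb["king"] - emb["man"] + emb["woman"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Pick the word whose vector is most similar to the query vector.
best = max(emb, key=lambda w: cosine(emb[w], query))
print(best)  # -> "queen" with these made-up vectors
```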

For a better explanation, see these links I just grabbed. I haven't read them, but they should answer your question.

https://kawine.github.io/blog/nlp/2019/06/21/word-analogies.html
https://www.technologyreview.com/2015/09/17/166211/king-man-woman-queen-the-marvelous-mathematics-of-computational-linguistics/
natewaddoups
I'm familiar with how neural networks work, and with word embeddings.
I just don't know what you mean by "continuous representations." Or rather, I can think of a couple of ways to interpret that term, and I'm wondering which one you had in mind, especially since my best guess would mean that you must already understand that the answer to your question is "yes."
