Progress in Generative Models in Machine Learning
October 28, 2024
Transcript
- 00:00 So next up we have Dr. Deep Jaitly, who we're lucky to have from Apple machine learning, where he leads a team of researchers working on fundamental techniques for machine learning. I've known Deep for some time, and I know that he was actually working on deep learning before deep learning was cool, a long time ago. He did his PhD under the supervision of Geoffrey Hinton, who just got a Nobel Prize, as you may have heard, in the foundational days of deep learning. Then he joined Google Brain, where he worked on deep learning models for sequences, and he also worked at various places such as NVIDIA and Google Brain Robotics. Then somehow he got into finance for a bit, at D. E. Shaw, and also, before that, national labs. So we're going to hear, I think, from Deep on the latest in generative models.
- 00:56 Thank you, John. Well, I'll be presenting a lot of work with my colleagues at Apple. I have to say, this is a little different from my usual talk, where I get into the nitty-gritty of machine learning and why one slight variation is more important than another. Instead, for today, I thought I would touch on what I think are three essential things that people working in the life sciences should think about regarding advances in machine learning, as they would be relevant to them when they're looking at data.
- 01:32 Okay. So what are these three things that I have in mind? Well, the first one is that recently, neural networks have gotten really good at embedding various kinds of data into a vector representation. If you get some data, you first need to convert it into a form that you can work with in statistical models. That's a requirement. In the past, people had some ways to do it, and recently there's been a lot of progress in this, so I want to touch upon that a little bit. I should also say, we can fit these representations very well. No one really knows why, but we can. And recently, generative models themselves have become really powerful. We now have a really uncanny ability to generate data, in a way that surprises me every day, from generative models of text to models of images.
- 02:33 In that regard, I'll touch on the two main techniques, autoregressive models and diffusion models, and how they work. And I should also highlight that now we can do this across modalities. So it's not only just a model for text or just a model for images; instead, we can build models that work across all of them. I should say everything's going to be at a really high level, but if you want to get into the nitty-gritty, we can touch base after the talk.
- 03:03 And I will end with one little vignette on doing conformer prediction with diffusion models. So you're given a compound, and you want to predict what the structure of that compound is. Everybody's seen AlphaFold. There are specific methodologies the early versions of AlphaFold used, which rely on a lot of information like multiple sequence alignments and so on. But now our techniques are getting powerful enough that you can do things essentially ab initio, without that much information. So I think this is an interesting approach to highlight, and I believe the latest AlphaFold 3 also works with diffusion, so there's some commonality there.
- 03:48 Okay. So how does this embedding of data into a vector space work? Traditionally, we think of this as representation learning. You're given some data. Before you can do anything with it, you first want to convert it into a usable form by embedding it into some vector space with n dimensions. Then you plug that into a statistical model, and you can do things like make predictions on it, or maybe do unsupervised learning like the clustering we just saw previously.
- 04:21 What people used to do in the past was have a specific technique for every modality. If you had images, you would use 2D convolutions, your convolutional models. For text, you would embed each word into a little descriptor. For waveforms, you might convert the waveform into spectral representations and then embed those into a fixed-size vector space. Because of this limitation, everybody had models for different kinds of data, and they were all separate.
- 05:01 Then over time, people decided, well, let's try to embed different modalities into the same space. And once they're in the same space, we'll just combine them with little things like adding the representations from different spaces together, or maybe even putting a small neural net on top of that. But that was really quite inflexible, in that the kinds of changes you could make to the representation were limited, and what you did at the time you wanted to use the model was pretty much how you trained it. If you had two things going in during training, you could only use those two things during evaluation. So if you had images and text, you could just use images and text and no other combinations.
- 05:45 And so in 2017, all this changed with the attention models paper. It was really a breakthrough paper, which has had implications in various forms. The basic idea is that you can take embeddings of different data and then combine these embeddings by choosing what's important. So there's this notion of attention. I won't go into the details, but there's this notion of looking at your data, looking at different parts of it, and choosing those parts if they seem relevant to the model itself. And this is all learned during the training of the model, so the attention is not baked in beforehand; the model just learns how to do it as part of training.
- 06:34 What this also offers is this really interesting ability to change how you embed data into your model. Instead of just using your traditional way of approaching embeddings, where you put your data through some pre-baked model, you can apply these attention models to compress your data into a fixed-size representation. Here's an example for images. You can take an image, split it into patches, and then learn an attention model on top, which compresses the whole image down to a single vector, and now you can use that vector for anything else you want to do with it.
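As a rough sketch of the idea (an illustrative stand-in, not the specific architecture discussed in the talk): a single learned query attends over the patch embeddings and pools them into one vector.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Compress a variable-length set of patch embeddings into one
    vector using a single learned query that attends over the patches."""
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # learned query token
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, dim), e.g. embedded image patches
        q = self.query.expand(patches.size(0), -1, -1)
        pooled, _ = self.attn(q, patches, patches)  # query attends to all patches
        return pooled.squeeze(1)                    # (batch, dim): one vector each

patches = torch.randn(2, 196, 64)  # two images, 196 patches, 64-dim embeddings
vec = AttentionPool(64)(patches)   # (2, 64)
```

Because the pooling is attention-based, the number of patches (or tokens, or frames) can vary from example to example; the output is always one fixed-size vector.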
- 07:16 Furthermore, what's really interesting is that now you can do this for various modalities across time. You can have images, text, videos, sound; they all get embedded into the same space, and you can compress them down to the same format. So you can do things like apply it to sentences of different lengths. You don't have to worry about periodicity, or the fact that everything has to be the same length, with this device, and you can do this across different data types. It's a really powerful tool, and I wanted to highlight it today because I think, if you're dealing with multivariate data, you can, over time, think about clever techniques for how to combine things together. A lot of the ingenuity that's gone into things like AlphaFold is about how you combine the various data that go in there. It requires some experience, but I think with a little tweaking, you get pretty good at it.
- 08:16 Okay. So switching to generative models. Once you have an embedding, you can build generative models of data. What's a generative model? A generative model is a model that allows you, by definition, to generate new data of that modality. Additionally, it can help you quantify whether something you're seeing has high probability or low probability, so that you can do other things with that probability, such as build tools on top of those measures. There's a wide variety of techniques for generative models that the machine learning community has built over time, but I'll basically just be talking about autoregressive models and diffusion models, which are really the mainstay of models today. You're quite familiar with them: for autoregressive models, an example would be ChatGPT; for diffusion models, something like Stable Diffusion for image generation is an example.
- 09:22 Okay. So with autoregressive models, the goal is to build a model that gives you a probability for any data point. The way we do this with autoregressive models is to convert high-dimensional data into a sequence and then measure the probability of the sequence using the chain rule of conditional probability, which basically says you multiply the probabilities of one variable given the rest.
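For reference, the chain rule factorization being described, for a sequence x_1, ..., x_T, is:

```latex
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1})
```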
- 09:52 I think the details are not too important; I'll try to highlight them with an example. Let's say you have the web and you want to build a generative model of text. What you would do is take the entire dataset of text and convert it into input-output pairs of the type x and y: you're given some data x, and you want to predict y. You're familiar with regression or logistic regression; it's the same sort of technique. You're basically just trying to predict some target given some input. And so with the web, what you would do is take all the prefixes. Say you've got a Shakespearean verse here: "to be or not to be." You convert that into data examples: an empty start, with the first word "to" as the target; then x being "to," and the next word being "be"; that's another data example. Then "to be" is the input for another one, and "or" is the target word. So you can convert the entire web into such a database, and you are now learning a model that predicts the next word given whatever context it is in.
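A minimal sketch of that dataset construction (a hypothetical helper, word-level for illustration; real systems operate on subword tokens):

```python
def next_token_pairs(text: str):
    """Turn a string into (context, next-word) training pairs,
    one pair per prefix, as described above."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(len(words))]

for x, y in next_token_pairs("to be or not to be"):
    print(x, "->", y)
# []           -> to
# ['to']       -> be
# ['to', 'be'] -> or
# ...
```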
- 11:06 And so this is the workhorse of how current models like ChatGPT work. You just take the web and do next-token prediction, as it were. Then when you want to run the model, you feed in some context, such as "why is the sky blue," and you let the model generate the next word, which it has already learned. You take that word, in this case "because," and you feed it back in as the next input: "why is the sky blue because," and then you have it predict the next word: "because of Rayleigh scattering," and so on. So you basically run the model during inference, and it just generates text, and that generated output is what you see.
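A minimal sketch of that inference loop, assuming a hypothetical `model` that maps a token sequence to next-token logits:

```python
import torch

def generate(model, tokens: list[int], n_steps: int) -> list[int]:
    """Autoregressive sampling: repeatedly predict the next token
    and feed it back in as context."""
    for _ in range(n_steps):
        logits = model(torch.tensor(tokens).unsqueeze(0))  # (1, vocab_size)
        probs = torch.softmax(logits[0], dim=-1)
        next_token = torch.multinomial(probs, 1).item()    # sample one token
        tokens.append(next_token)                          # feed it back in
    return tokens
```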
- 12:00 So how do you apply this to modalities other than text? It's quite clear for text: it's discrete data, and these models work really well for discrete data. They don't really work so well in continuous regression space. So it's easy to apply to things like protein sequences, amino acids, and so on, which are naturally discrete. It's a little trickier for high-dimensional data that are not like strings. To highlight how people do this: what you end up doing is building a model that first encodes your data into a sequence of discrete tokens, and then you learn another model. Typically, actually, you learn both of these models together. The other model is called a reconstruction model, which takes in the output tokens and converts them back into the data itself. Once you have this, you can convert your entire dataset into sequences of tokens, and then learn the autoregressive model on those sequences by just predicting the next token given the history of tokens. And you can now generate this new kind of data by running the autoregressive model, generating some sequences, and then converting them back into real data.
- 13:23 Here's an example of how you might apply this to a modality like speech. Speech is really just waveforms. If you really wanted to, you could just model speech directly, and people have done that. But it's harder to deal with, because speech happens at a very fast rate, so the data would just be too much. So typically what people will do now is convert speech into a spectral representation, by taking windows of speech and computing a Fourier spectrum in each. The original waveform is converted into a frequency diagram over time, showing how the sound distributes energy over the different frequencies on the y-axis.
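A minimal sketch of that windowing-plus-Fourier step (the window and hop sizes are illustrative choices, roughly typical for 16 kHz speech):

```python
import numpy as np

def spectrogram(waveform: np.ndarray, win: int = 400, hop: int = 160) -> np.ndarray:
    """Slice the waveform into overlapping windows and take the
    magnitude of the Fourier spectrum of each window."""
    frames = [waveform[i:i + win] * np.hanning(win)
              for i in range(0, len(waveform) - win, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))  # (time, freq)

spec = spectrogram(np.random.randn(16000))  # one second of fake 16 kHz audio
print(spec.shape)                           # (98, 201): time steps x frequency bins
```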
- 14:12 And once you've converted the waveform into this format, you can learn an inverse model called a vocoder, which will generate the raw waveform from this coded speech. Now, unfortunately, the spectrum on the right-hand side is still continuous. It's not discrete, and so it's hard to feed it into an autoregressive model.
- 14:37 So what you can do is simply take that data and tokenize it by discretizing it: subtract the minimum, divide by the range, round, and convert it into integers between zero and some maximum bin. Now you have a discretized version. You can convert back to the original as well, by just mapping the discrete values back to continuous codes.
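A minimal sketch of that discretization and its inverse (the bin count is an illustrative choice):

```python
import numpy as np

def quantize(x: np.ndarray, n_bins: int = 256):
    """Map continuous values to integer tokens in [0, n_bins - 1]
    by shifting off the minimum and scaling by the range."""
    lo, hi = x.min(), x.max()
    tokens = np.round((x - lo) / (hi - lo) * (n_bins - 1)).astype(int)
    return tokens, lo, hi

def dequantize(tokens: np.ndarray, lo: float, hi: float, n_bins: int = 256):
    """Map tokens back to (approximate) continuous values."""
    return tokens / (n_bins - 1) * (hi - lo) + lo

spec = np.random.rand(98, 201)        # a stand-in spectrogram
tokens, lo, hi = quantize(spec)
approx = dequantize(tokens, lo, hi)   # close to spec, up to binning error
```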
- 15:06 So now you have this machinery by which you can take continuous data, convert it into tokens, and then convert it back into real data, so you can really just feed it into an autoregressive model. You take the waveform, you have the spectral representation, you convert each spectral frame into a sequence of discrete tokens, and voila, you can do next-step prediction: you feed in your history of tokens, and then you predict the next token. It's basically a recipe that's repeated all over: you learn how to discretize your data into some discrete bins, and then you learn an autoregressive model. It's already applied to various things like speech, videos, images, and so on, and it's a pretty powerful technique that can be applied to other modalities as well.
- 16:03 And now I want to talk a little bit about diffusion models. This is a set of newer techniques that allows you to morph one probability distribution into another. There are methods called optimal transport, flow matching, and diffusion models; they're all trying to map from one distribution to another. It might seem like a very arcane idea, but it's really a powerful methodology when you want to think about how to generate data from noise. In the case of diffusion models, you morph a Gaussian distribution, which is something people know how to handle, into a real data distribution, which is really hard to handle. If you give me images or speech, I don't know what the data distribution itself is or how to model it. Or multi-omics data: what is the actual distribution of the data? One doesn't know. So the ability to generate and sample from that is quite useful, and mapping to a simple distribution allows us to do that.
- 17:12 So how does this actually work in practice? I'll show you an example with images. You have some image on the right-hand side, x_0. (I guess I don't see a mouse pointer here; never mind, it's okay.) So you take the image on the right-hand side, and you can scale it down in magnitude by multiplying by some compression term, and then you add some noise, which expands the data up again. So you started with some data, and you can generate a whole bunch of data at different noise levels. What you really want to do is learn a function that takes data at one noise level and cleans it up slightly, to a slightly less noisy level. You can then apply that model: you start with noisy data, and you clean it up a little bit, and then a little bit more, and you do this over and over again, until you're back at the cleanest level, which is where the data itself lies.
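A toy sketch of that scheme, with an illustrative noise schedule and a hypothetical trained `denoise` network; real diffusion samplers use more careful update rules than this:

```python
import torch

# a noise schedule: as t grows, the data is scaled down and noise dominates
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Forward process: scale the clean data down and add Gaussian noise."""
    a = alphas[t]
    return a.sqrt() * x0 + (1 - a).sqrt() * torch.randn_like(x0)

@torch.no_grad()
def sample(denoise, shape) -> torch.Tensor:
    """Reverse process: start from pure noise and repeatedly clean it up."""
    x = torch.randn(shape)    # pure Gaussian noise
    for t in reversed(range(T)):
        x = denoise(x, t)     # one small denoising step per noise level
    return x                  # should now look like data
```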
- 18:28 So that's a very simplistic explanation of diffusion models. There's a whole range of possibilities within this scheme. How do you add noise? How do you convert the noisy data back into clean data? There's a whole bunch of techniques that factor in different trade-offs in these choices. There are also variants of diffusion that don't treat it as a sequence of discrete steps but deal with it in continuous time, which is almost like a diffusion process in continuous time. And there are also techniques that apply this to discrete data. So I've been showing you continuous data, but even discrete data can work with diffusion models, where you have categorical choices, kind of like mutations during evolution: things mutate from signal down to noise, and then you learn a model that goes backwards, to generate the data for real sequences. And there's even a continuous-time version of this discrete diffusion process, if you can believe me.
- 19:41 Okay. So these models work well, but I don't want to leave you with the impression that everything just works right off the bat. I want to highlight an example, just to leave you with a vignette of the kinds of innovations you need to make things work when you take on a new challenge. Diffusion models work well, but if you get really large data, like high-resolution images, it's a lot trickier to make them work right off the bat.
- 20:11 So what people end up doing is one of two techniques I'm highlighting from the literature. On the left-hand side, you first learn an encoding of your data itself: for high-resolution images, you can learn a compression that compresses them into smaller images or smaller feature vectors, then learn a diffusion model in that smaller space and generate everything in that compressed space. From the compressed space, you can come back to the real data using the model you first learned. On the right-hand side is something called cascaded diffusion. If you want to generate high-resolution images, you generate things at a lower resolution, then use that as a seed for something at a higher resolution, and you expand upwards to the full resolution.
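Schematically, the latent-diffusion recipe on the left looks like this, reusing the `sample` loop from the earlier sketch, with hypothetical stand-ins for the learned pieces:

```python
import torch
import torch.nn as nn

# stand-ins: a learned decoder from latents to pixels, and a placeholder denoiser
decoder = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)  # 32x32 latents -> 256x256 pixels
denoise = lambda x, t: 0.999 * x                             # placeholder denoising network

z = sample(denoise, shape=(1, 4, 32, 32))  # run diffusion entirely in the small latent space
image = decoder(z)                         # decode back to a full-resolution image
print(image.shape)                         # torch.Size([1, 3, 256, 256])
```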
- 21:06 Okay. I think I'm really going to run out of time, so I'm going to skip right to the end of my talk, because I have only one minute, and I think this might be interesting.
- 21:18 Okay. So I want to talk quickly about how you might use this for predicting the structure of molecules. As I mentioned, with diffusion models you have something you've learned, the denoising model, that takes in some noisy data and tries to clean it up, and you're given some features that describe the data, which help in the cleanup process. You can do the same thing with molecules. You can give it a SMILES representation of your molecule, which, for those who don't know, is a way of representing a compound as a sequence that's used in cheminformatics packages.
- 22:02 You can take the SMILES string and convert it into features for the molecule, and then you have a denoising model that takes in noisy coordinates for each of the atoms in the molecule and cleans up the coordinates. So really, no chemistry information is used. You don't bake in any sort of information about bond angles or any of that at all. You just basically train a model: you're given the compound, you noise up its structure, and then you learn how to denoise the structure back.
- 22:37 The way the features are computed is you take a compound and label all the atoms. From the atoms, you can compute a graph. This graph basically represents which atoms are connected to which other atoms. I guess the detail is not important, but you can represent the structure of a molecule as a graph. From that graph, you can compute something called a graph Laplacian, which allows you to compute features for each of the atoms in the graph. And you can then add on some descriptors for each of the atoms: basic things like the atom type, the degree, the valence, and so on.
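A minimal sketch of Laplacian-eigenvector features, a common choice for per-atom positional features (the specific featurization in the work described may differ):

```python
import numpy as np

def laplacian_features(adj: np.ndarray, k: int = 3) -> np.ndarray:
    """Per-atom features from the graph Laplacian: the eigenvectors
    with the k smallest nontrivial eigenvalues."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj                    # graph Laplacian L = D - A
    vals, vecs = np.linalg.eigh(lap)   # eigenvalues sorted ascending
    return vecs[:, 1:k + 1]            # skip the trivial constant eigenvector

# adjacency matrix for a 4-atom chain: 0-1-2-3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = laplacian_features(adj)        # (4, 3): one feature row per atom
```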
- 23:23 And then you just run a diffusion model. I won't go into the details, but essentially you get features for the atoms based on connectivity, some extra descriptors, and the noisy 3D coordinates, and you just learn to predict the coordinates at the next, less noisy, time step. Then, during inference, you just run that model: you start with the structure and some random 3D assignments for the positions of each atom, and as you run the model, it starts to clean up the 3D positions. At the end, it gives you a full structure for the molecule. Same for another one.
- 24:10 And yeah, the TL;DR is that this works quite well. We got state-of-the-art results on predicting structures compared to prior work, even though we didn't use any chemistry information; all of that was just learned by the model on its own.
- 24:27 So to conclude, I hope you'll take away that there are interesting ways to embed all kinds of data into vector representations that you can use for your statistical models, and that generative models will allow you to generate new data, which can be used for infilling or even finding correlations that you may not have expected. And yeah, we're a group at Apple just doing fundamental machine learning research. So, with that.