37c3-talk-12061 Latest text of pad 37c3-talk-12061 Saved Feb 4, 2024

Hey you!

Prior to transcribing, please look at our style guide: https://wiki.c3subtitles.de/en:styleguide. If you have any questions, you can either ask us personally or write to us at https://webirc.hackint.org/#irc://hackint.org/#subtitles or https://rocket.events.ccc.de/channel/subtitles or https://chat.rc3.world/channel/subtitles .

Please don't forget to mark your progress in the progress bar at the talk's website.

Thank you very much for your commitment!

======================================================================

[Music]

Our next speaker, Yan, is a structural biologist. He has more than 14 years of experience in structural biology and a PhD in biochemistry from the University of [unclear]. Why structural biology? Well, it's the most concrete way to look at the mechanisms of how living things actually work. The talk: "How machine learning changed structural biology forever, or not". Let's hear a warm welcome for Yan!

Thank you, thank you. I'm quite amazed how many people showed up, so thanks for that already. I hope I can present something that is interesting to all of you. I tried to adjust this talk so that you can learn something if you are a biologist, and even if you're a structural biologist I hope you learn something. But even if you have no idea about structural biology, proteins or any of these things, I hope I break it down enough that at the end you actually understand what AlphaFold does, how important structural biology is, and why I've been dealing with these topics for years and am still pretty amazed by them.

So the talk is about AlphaFold, and let me go back a couple of years. It was 2018 when the winner of the 13th CASP was announced. CASP is a particular competition in the field of protein structure prediction. What a protein structure is, and what protein structure prediction is, we'll come to a little later. But there is a test, a kind of championship in this field, and it's called CASP. Every two years, people submit their best predictions, their best guesses, to this championship, and at the end the guesses are compared with the real solution and the winner is declared. In 2018 this was quite an amazing thing, because for the first time a company actually entered this field, and this company was Google, specifically Google DeepMind.

Here we have a graph which is very easy to read. On the bottom you see how complicated the predictions are, since you have to give multiple predictions, and on the top how good you are, in terms of how well you actually guessed the right answer. These are all the other contestants, and the blue line here, I think it's easy to see, is just above all the others. So the very first time they showed up in this contest, they obliterated all the other contestants. That was a bit of a shock for the scientific community, because we had been doing this for years. This was CASP13, so the contest is about 25 years old; we had all done it for years, and they just walked in and took us all by surprise.

The second thing, and you shouldn't self-reference, but I'll do it here just once: I was actually pretty worried that now a company was doing science, and there was no mention that they would publish what they found. Science means you do something, you find something, and you share it with everybody else; that is what science is all about. When they came out in 2018, nothing was published, and it took them until 2020 to actually publish a paper on how they did it, and even that was a little vague; nobody really knew.

In 2020 there was already the next CASP, CASP14, and they entered again, this time with a new machine learning algorithm. You don't really need to understand this graph, but what you can see is how all the other contestants did on easy targets, difficult targets and moderately difficult targets. Think of them as riddles: an easy riddle, a moderate riddle, a difficult riddle, and how good all these people were at solving them. The black line is more or less what Google did: they solved practically all the riddles nearly perfectly from the get-go, which was also very amazing. And to see how good they are, there is this slightly difficult
graph, but it gets the point across, I guess. On the bottom you see all the contestants, 400 or so groups or algorithms that submitted their guesses to this CASP championship, and on the left side here is AlphaFold, and you see it's just way above all the others. And for those of you who are a little bit in the field: the second one, let me get my pointer going, this one here, is the Baker lab. Those of you in the field know them; they are from the Institute for Protein Design at the University of Washington, and they had been the leaders of the field for ages. They were also pretty much blown away.

So this was major news, at least in science news outlets. Some people said the problem was now solved. Nature wrote that DeepMind's AI makes a gigantic leap in solving protein structures. It even made it into the New York Times, which said that AI has now solved the protein folding problem, which we had been working on for 50 years: 25 years in this kind of test, and before that we had already been trying. And Der Spiegel, a German news outlet, if you want to call it that, actually asked whether a machine should win the Nobel Prize. I'm not sure they ever read the rules for the Nobel Prize, but nevertheless. Again in the German press, it said that anyone can now fold proteins, and I will show you that this is true: everyone in this room can fold a protein by the end of this talk. I'm not sure you can do anything with it, but at least you'll be able to do it. Der Spiegel, always good for a headline, also said that 2020 will be known to future generations as the year in which machines began to overtake us in research. That is of course a bleak outlook for me, because I'm a researcher; that's my bread-and-butter job, and if an AI can now do it better, I'm not sure I'll have a job next year.

And it was not only Der Spiegel, I have to admit; the Guardian also asked: are we witnessing the dawn of post-theory science? That is quite a bit of ruckus for a single scientific paper in Nature; normally you don't hear about that in the mainstream newspapers.

So let's now look at why this is so dramatic. What is the big deal about AlphaFold, what is the big deal about structural biology in general, and how can we understand what's happening here? To understand this, we first have to take a couple of steps back, so I'll take you on a quick tour through biochemistry studies: you are now an undergrad starting in our research lab. First we have to discuss what proteins actually are. I started by asking ChatGPT to draw a more or less accurate picture of my office, and this is more or less what it looks like. It is quite similar to a nerd office, or a hacker office, I would say: we have computers, we have quite a bit of coffee, and we also drink quite a few energy drinks. But when you look at the ingredients, there is nearly no protein in there: the protein here is 0.1 g, and in your Club-Mate there is actually no protein at all. But pizza always works. This is a pizza for our American friends; I've never tasted it, it's from Walmart, so I have no idea if it tastes good, but it has a lot of protein: 56% of your daily value.

We can already see here that proteins are one of the four major classes of macromolecules in your body: normally proteins, nucleic acids, sugars, and fats or lipids. We don't need the other three; we'll only talk about proteins. So what are proteins, in principle? You all know that we are made of cells. If we simplify a little, we can say your body is more or less like a house, and every cell in your body is like one room of that house. The proteins in this analogy would then be all the tools in your rooms: in your kitchen you have all the kitchen utensils, and each of these would be a different protein. Proteins in your cell are what actually does stuff. If you smell your neighbor (hopefully because he followed the 6-2-1 rule), that's a protein detecting the smell, and it's connected to your brain, so there's a protein that fires an electric signal to your brain. If you eat the pizza outside, there's a protein that cuts this pizza down into small molecules you can actually use. So proteins are the building blocks of your body, and they are what actually does the work: they do stuff, measure stuff, signal stuff; whatever you can think of, it's normally a protein, with very rare exceptions which we'll ignore here.

Obviously, just by looking at this image we can already see what these proteins are doing: this here is probably a microwave, this is probably digesting some pizza, and this is cooking or doing something else. But in general, if we really want to understand what they do, we normally have to look very closely. To make this point clearer, I'll show you three proteins at relatively low resolution, meaning we look from relatively far away and can't really see what's going on. We see three different things, and just by looking at these low-resolution images, which we have been able to do for years already, we can say: there's something here on the top, which might be a motor domain (for the structural biologists among you, potentially an ATPase domain; nobody knows). Then there's some long part whose function we don't know, so it's probably a linker; that's what we call it. And at the bottom we have an effector domain, something that does something. It's somehow driven by
this motor, and then it might spin or whatever it does. With this very rough estimate, we can say this is probably a protein that does some mixing or blending, and we can all agree on that. But if you now go in very closely and really want to get into the details, and that's what structural biology is normally about, going in and looking very, very closely, and by very closely I mean going from this eight-ångström to, let's say, this two-ångström resolution (the ångström here is a measure of how finely you can resolve details; we'll see that in a minute), then you see that this one over here is probably a blender, which you probably all have in your kitchen, and it's very good for making smoothies. Very nice. The other one is also a blender, but it's scientific equipment, from PRO Scientific; I only mention the name because these are their images, and I've never used any of them. This one is actually for making cannabinoid oils, so you probably don't want to mix them up: if you make a smoothie for your kid, don't use the wrong blender, because it will probably give you the wrong product. And the last one is obviously something that makes mortars and glues, and you shouldn't eat what it makes at all; the other one, we can discuss whether you should or not.

So you see, it's very important to look in detail, not just to have an overview but to look very finely at what is actually happening. What we learn is that if we know the structure in great detail, we can often infer the function of the protein. (I say "often" because I'm a biologist, and in biology nothing is ever 100% true; it's always "often" or "mostly". But for you, you can say "always".) This is one of the paradigms of structural biology: structure leads to function. If you have the structure of a protein and it's a blender, it will blend; if you have a cooking pot, it will cook; and if you have a microwave, it will heat stuff up. That is the idea of structure leading to function: as soon as we know the structure, we can derive the function.

But proteins obviously are not blenders; they are not made of metal and electronics, so we have to think a little bit about chemistry. This is the most chemical thing we'll see today, so don't be afraid. This is an amino acid, which builds up our proteins, and the only thing I want to tell you here is that proteins are polymers: they are built up of these parts. The blue part always connects to the red part of the next one, and then another one, like a chain of multiple little pieces. The only difference between them is what stands here at the R. This is the so-called side chain, and there are 21 of them in nature; you've now seen them all, with all their little wiggly arms here. You can forget about them, we don't need them for this talk, but I wanted to show you what they look like in principle.

For this talk we can simplify a little and say we have only six amino acids, which look suspiciously like the beads from my daughter's playroom. So I'll just use these six amino acids, and if we make proteins out of them, we can make different versions. This, for example, is one protein, with a certain number of amino acids in a certain order; it might be the protein that breaks down your pizza in your stomach. And here is a second protein, which might be, for example, an electrical conductor in your brain that tells your brain whether the pizza was nice or not. The important part is that the difference here is not the number of beads; the difference is the order of the beads. Here it's first violet, then yellow, and so on; this one is also violet, yellow, but then it changes. And this primary sequence, as we call it, or
primary structure, then leads to different forms. This protein string will automatically fold itself into this kind of structure. This, for example, is an alpha helix; it's a helix, and we call it the alpha helix because we are fancy. The second one here forms something else: it forms this sheet, which we call the beta sheet, another very typical component of our proteins. What we learn from this is that the kind of structure you get depends on how you order the beads on your string, and we already know that structure defines function. So proteins are strings of beads; we have 20 or 21 different types of beads; they form complicated knots, and these knots are specific to the particular bead order. Whenever you have this chain, it will always fold like that.

That was the easy introduction. Now let's go a bit further into what proteins really look like, because if we want to look at AlphaFold and what it produces, we need at least an idea of what a protein really looks like. This is our simple one. In structural biology we normally like to show it as a so-called cartoon structure; it's not very cartoonish, but it simplifies how we look at proteins. So this is the cartoon of what we've seen below, and we can look at this cartoon in various ways. I want to show you this because what you sometimes see in the news is not really what a protein is. This is a cartoon where we also see the side chains, so we can tell which bead color sits at which position; you probably remember some of them from the big slide. We can also look at all the atoms: here we've gotten rid of the cartoonish helix and see an atom at every corner of these lines, and the colors tell me which atoms they are: blue is nitrogen, red is oxygen, and this one is sulfur.

The rest is carbon, if you're interested; if not, you can forget it. But of course this is still not the real picture, because atoms have mass and volume. If we show the atoms with their actual size, we get a more appropriate depiction of this little protein fragment. This is how it looks, and you can't really see much anymore, because everything is hidden behind something else. The last thing I want to show you is the surface. This is how it would look if you could actually look at it, if you were Ant-Man or whatever and went very small and could run around on this little protein. But most of the time we actually look at the cartoon, because that's where we see the general structure, that it's a helix, that it turns around, and we can show only those residues that are interesting to us.

Interestingly, or sadly, this is still not the real picture, because this is just a very tiny fragment of a protein. So let's look at a real protein. This is where we started, this one, and this is a structure a colleague of mine actually solved experimentally; and this is now the whole protein. So this is the first protein you are seeing, hopefully. This is one protein here, and you see all these little helices around, and then we have these beta sheets here, which gives the general overview of what these things look like. If you're an experienced structural biologist, just by looking at this crystal structure you could say: well, this looks like a blender, doesn't it? What you can also see here is a second part, a second molecule, which is depicted in a different way, but it's also a protein. And if you look very closely, you see, and I don't want to go into detail, just so that you have an idea, that this protein, the left one, our big domain, somehow recognizes the smaller one. There are some interactions between amino acids. I won't go into detail about how they work, because you need a bit more biochemistry for that, but you can appreciate that some amino acids like each other and like to stick together, and that's how, for example, complexes form and how proteins fold.

So now we know that sequence leads to structure, and we know that structure defines function. This is the famous sequence-structure-function relationship in structural biology, which drives all the efforts we make, and it's also what AlphaFold uses: if we know the sequence, and the sequence defines the structure, then we should be able to determine what the function of the protein is.

So how useful is it? I've told you what protein structures are; at least I hope you have an idea. So how useful is structural biology, why should we care? The thing is, if you do structural biology, you understand life in a very particular and very well-defined way. This is a picture by David Goodsell. He's a structural biologist (I was nearly saying former, but I don't think that's fair) who now makes these very nice watercolor images, and they are accurate in the sense that he really uses structural biology information to compose these pictures. What you see here on the left is a cell, connected by an intricate network of collagens, which just by chance are my research interest, to this outer membrane here. So this is skin, and this is a huge collagen fiber here. We learned so much from this: I can name most of the proteins going on in here, and we know all their structures just because we did structural biology.

And the important part, for example, is that just because we did structural biology, we know what collagens look like. Those are the proteins that make up your bones, your skin, and your jelly if you're not vegan; they are all collagens, and they have a very particular form, which looks like this. Just briefly: this is a triple helix. It looks a little bit like DNA, but it's tripled, so a bit more complicated. Interestingly, to form this collagen triple helix you need a helper, just as you would when making ropes: if you make a rope, you also intertwine three or four or however many smaller ropes. Here you need a helper too, to make this rope, and this is actually the protein I showed you a minute ago. This protein sits on the side of the collagen and helps the collagen form, and we can only understand this kind of relationship through structural biology, by really looking at how the atoms are oriented and what they are doing in this particular protein. So it helps us understand basic physiology, and that is actually my main driving force, because I'm really curious about how life actually works.

But it also has some real-life applications. Understanding diseases is the second big thing, and it is of course linked to understanding basic physiology. I'm not sure if you all know what this one is; we've all experienced it for years now: this is the coronavirus, SARS-CoV-2, and all these little parts on the top are the spike proteins we've all heard so much about in all those podcasts. The spike was, of course, immediately crystallized back in the day, sorry, structurally solved, and only because of that could we actually understand what all the mutations you've probably heard about actually do. And this is again a nice depiction by David Goodsell
about it, showing the coronavirus in a more artsy style and how it actually infects a cell in the body, hopefully not in yours. And because seeing is understanding, here is an animation, of which I'll show you only a very short part. It is all based on structural biology. We have the spike proteins; the spike proteins are processed, which I haven't shown you; then parts of the amino acid sequence elongate (you see these alpha helices down here that I showed you earlier); and then it opens up and fuses: the virus, up here on the top, fuses with your body cell, releases all its contents into your cell, and infects it. If you want to see this movie in full length, you'll find it at the Animation Lab at the University of Utah; it's a very nice movie and explains very well how this works. I'll show you the link again at the end, so don't panic.

So how do we determine these structures? We already know they are pretty important, and we need to know them. The thing is, they are very, very, very tiny. The distance from one atom to the next is about one ångström, which is 0.1 nanometers, so a tenth of a billionth of a meter. That's really small, and the problem is that we can't really look at it with a microscope. If we use the best microscope, so to say, which is an electron microscope, we get pictures like this. Luckily, this is the same picture as the one we saw here: we have collagen again. I use collagen just because I work on it; you could use any protein. So this is the collagen here on an EM grid, and these little white blobs are our protein, which is called HSP47, and it would fit there. You don't see much, just a couple of pixels, so that's not enough; a microscope won't do it. Those of you who are experts will say cryo-EM and so on. Yes, I get it, we can talk about that, but this is a traditional microscope, and it doesn't work for this.

So there are three different experimental methods to determine protein structures: one is cryo-EM, one is biological NMR, and one is protein crystallography. I do protein crystallography most days, and sometimes a bit of cryo-EM. If you look at how many structures have been solved so far, most of the structures we know were solved by protein crystallography. This will change in the future, since cryo-EM is on the rise, but let's ignore that. We only have time to look at one of these methods, and it's obvious which one we'll take: crystallography. And here, too, I'll be very brief; it's not the topic of the talk, but I want to convey how much work it is to get crystal structures, so that you appreciate how easy it is now with AlphaFold and how much this might change everything.

So if you want to crystallize a protein, and by crystallization I mean getting the structure at the end, crystallography is just the step in the middle. First you have to identify which protein you want; let's say we take the spike protein. First of all, you have to isolate the spike protein from the virus, because you can't use the whole virus for various reasons, so you have to produce the spike protein on its own, which means cloning. Here is one of my colleagues doing some fancy cloning technique; it's just blue color, but this is how you do it. Then you have to purify the protein. And if you're lucky enough, you mix your purified protein with thousands, and I mean really thousands, of different chemical cocktails. We have robots for that, pipetting a lot of stuff, and you have lots of different trays of combinations, and if you're lucky, in one of these 5,000 combinations you get a protein crystal. And this protein crystal is actually what you imagine. It's a bit complicated to explain, but in principle it's a crystal, like a salt crystal: a crystal is a crystal, whatever it's made of, and if it's a beautiful one it looks more or less like this, so you immediately see it's a crystal, like a diamond. Then you have to pick it, and because it's so tiny that you can normally only see it under a microscope, you pick it under the microscope. Then you go to a fancy synchrotron source, for example the SLS in Villigen; we also have one here in Hamburg, the PETRA III ring, where EMBL is, which we also visit regularly. There you shine X-rays on them. And this is me doing it at our home source; it's not as big as this one, but it more or less does the job.

So you shine X-rays on it, and you get this very nice pattern. You don't really see the protein; it's not like microscopy, where you enlarge your protein. You see a diffraction pattern, and it looks like this. Then you have to back-calculate, and there's some magic involved that I won't explain today, and then you get this kind of electron density, showing where all your electrons are, and into this density you can build your protein. I did this a little bit for you this morning: at some point I find some protein and fit a protein structure into it. So here we start building the model, and this is more or less what we saw before: the wireframe model of where atoms are connected and where they sit.

So how long does it take? It is of course an experiment, so you never know. Target selection and cloning take days. Protein purification depends on how easy the protein is to purify: weeks to months. Crystallization sometimes never works, so even after years you might never find a crystal, but it takes at least weeks to months. Data collection is easy: it takes hours, minutes, or even seconds nowadays, but you have to travel there, so let's say hours. Then you have the data, and you have to build the model, which again takes weeks. So all in all, it takes about half a year before you have a single protein crystal structure. A typical PhD in our research group takes three years, so if you're lucky you get five or six crystal structures, and if you're unlucky you get none. That's the scale you have to think in; it's quite a long way. And now AlphaFold tells you: you can skip all of this and go straight to the end in just a couple of hours. That's the difference: we go from years to hours, which would of course be amazing if it were true, and to some degree it is.

So how does AlphaFold cut all of this short? First, we have to think about what information AlphaFold can use, and for that we have to take a couple of steps back to protein evolution. Let's look at collagens again, since you know them now. Collagens occur in all animals that have bones, and all of these animals here have some kind of bones, except the sharks. The sharks are out of the picture, unfortunately for them; well, they have some collagens, but not the ones I'm interested in. If you have collagen in all of these, that means all of them should have the same helper proteins, like the little one I just showed you that helps the collagen triple helix form. They all have it, because it's very old: the ancestral animal down here, from which all of these descend, also had the protein that folds collagen. Over time this protein has changed, which is called mutation, and that's how evolution works. So you have a version of this protein in all these animals, but they are all slightly different.

I can show you an image; ignore the letters, which are the amino acids, it's just about the colors. Here we have the dog, the cat somewhere, the chicken here, all the amphibians, and the fish down here. You see that most of the positions are the same; for example, this K, whatever that means, is more or less everywhere. But you sometimes see positions with mutations, where the protein has changed in other species.

So what can we learn from that? What we already know is that proteins interact: in the slide where the collagen was recognized by our helper protein, there were these little dashed lines, and I told you that some amino acids just like to talk to each other. That's what I've depicted here. Let's say this is our protein; it's a very short protein, but we can live with that. To form this knot, there is one particular interaction here in the middle: a blue one is recognized by an orange one, and this links the loop together. If this changes for whatever reason during evolution, for example if this one becomes orange, then the other side has to change as well, otherwise you don't get this interaction, and without this interaction the whole protein falls apart and you don't have the same structure anymore. That means parts that interact with each other, like these ones, should co-evolve: if one side changes in an animal, the other side has to change as well, otherwise the protein isn't functional. Positions that change willy-nilly, like this one here, which changes without any partner changing along with it, are not doing anything particularly important. What we can now do is look at our previous alignment with all the different species, see which positions change, and when one changes, which other position changes along with it; then we know that these two are probably close to each other. And it's not only these two; there are lots of ways residues can interact, not just one pair, so this is simplified. That's the first thing. It's actually an old idea, not from AlphaFold or DeepMind: co-evolution is an old idea.

Then we need to understand the second thing. Dealing with three-dimensional structures is difficult, because you can rotate and move them, and then everything is different. So what AlphaFold uses, and this was also invented before they used it, is not a three-dimensional representation but a two-dimensional one, and it looks like this. They put the amino acids along the top and along the left side, and in these little boxes you enter how close these amino acids are to each other. Obviously, the blue one is very close to the blue one, because it's the same one. And the next one: the blue one and this blue one here are also pretty close, because they're connected by the main chain; that's also very clear and nothing new. So this diagonal line is always dark, and there are always interactions there. But it gets interesting if you look, for example, at this interaction: this blue box and this green star are pretty far apart in the primary sequence, one, two, three, four, five, six amino acids away, but they are close in three-dimensional space, and that's why we put a dark spot over here. The same is true for this blue one and the orange one: they are also one, two, three, four apart, but they are pretty close in three-dimensional space, in my three-dimensional sp

ace which is only 2D but and then you get the same over here and obviously as this is catic you have the same side on the other side so they are always looking the same so in this we also need to understand so that is pretty simple that's an easy way to represent a threedimensional structure as 3D protein in this two-dimensional table and this is how it looks in real so this is a protein we have now looked multiple times on this is again this cin folding protein and this is how this two-dimensional I say would look like so in the middle you see of course the name and then you see some interactions somewhere and this is how how it looks okay so this is information Alpha can both use and this is just what Alp always shows it in black and white so I I just switched it here to a black and white version so and the last one is if you use about machine learning and I think we have heard it in many many talks over uh over this conference already if you use machine learning we need data to train our machine with and what data can we use and they alha could for example could luckily use um the combined knowledge of the structural biology Community uh which up to date goes up to 25,000 experimentally determined protein structures so you can just add say 215,000 years of of PhD labor at least PhD student labor at least which go into this database yeah probably more um so you can all access them in the pdb database uh the protein Data Bank is now 50 years old and we have this 215,000 and you see it was pretty shallow at the beginning but actually it rise experimentally over exponentially over the time so this is what they could use and what you have in this database in a very nice and very neat way you have the sequence and you have how it should look like and this is exactly what Alpha wanted to learn by Machine learning so they had very very good data set and they never said anything else so they are very grateful for this database and this is just one entry how it looks like
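The co-evolution idea described in the talk, that two positions which change together across species are probably in contact, can be illustrated with a small sketch. It scores every pair of alignment columns by mutual information, one of the older pre-AlphaFold ways of detecting co-variation; AlphaFold itself learns this signal with attention layers rather than computing it like this. The toy alignment and the function names are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    High values mean the residues at i and j vary together across species,
    which is the co-evolution signal described in the talk.
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a, p_b = freq_i[a] / n, freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def rank_pairs(msa):
    """Score every pair of columns and return (score, i, j), highest first."""
    length = len(msa[0])
    scores = [(mutual_information(msa, i, j), i, j)
              for i, j in combinations(range(length), 2)]
    return sorted(scores, reverse=True)


# Hypothetical toy alignment (one row per species). Columns 1 and 5 always
# change together (K pairs with E, R pairs with D), column 3 varies on its
# own, and every other column is fully conserved.
msa = [
    "AKGLDEW",
    "ARGMDDW",
    "AKGVDEW",
    "ARGLDDW",
]

for score, i, j in rank_pairs(msa)[:3]:
    print(f"columns {i} and {j}: MI = {score:.2f} bits")
```

With this toy alignment the coupled pair (columns 1 and 5) scores 1 bit, pairs involving the independently varying column 3 score 0.5 bits, and conserved columns score 0, because a column that never changes carries no co-variation signal. Real co-evolution methods add corrections, for example for shared ancestry between species and for indirect couplings, which this sketch leaves out.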
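The two-dimensional representation from the talk, a table of how close every amino acid is to every other one, can be sketched in a few lines. This assumes one made-up point per residue (roughly a C-alpha position, in Å) and an 8 Å cutoff for calling two residues "in contact", a common convention chosen here for illustration. It also shows why the table is convenient: rotating or moving the whole molecule does not change it.

```python
import math


def distance_map(coords):
    """N x N table of Euclidean distances between residue positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(coords, cutoff=8.0):
    """N x N table of booleans: True where two residues are within `cutoff`."""
    return [[d < cutoff for d in row] for row in distance_map(coords)]


def rotate_and_shift(coords, degrees, shift):
    """Rotate points about the z axis, then translate them by `shift`."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    dx, dy, dz = shift
    return [(c * x - s * y + dx, s * x + c * y + dy, z + dz) for x, y, z in coords]


# Hypothetical, made-up coordinates (in Å) for a 7-residue hairpin:
# residue 0 and residue 6 are six positions apart in sequence but end up
# next to each other in space.
hairpin = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (9.5, 3.0, 0.0),
    (7.6, 5.5, 0.0), (3.8, 5.5, 0.0), (0.0, 5.5, 0.0),
]

# "#" marks a contact, "." no contact; the diagonal is always filled.
for row in contact_map(hairpin):
    print("".join("#" if in_contact else "." for in_contact in row))

# The 2D map does not change when the whole molecule is rotated or moved.
moved = rotate_and_shift(hairpin, degrees=40, shift=(10.0, -3.0, 2.0))
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(distance_map(hairpin), distance_map(moved))
    for a, b in zip(row_a, row_b)
)
print("unchanged after rotation:", same)
```

The printed map is symmetric and has a filled diagonal, as described in the talk, and the entry for residues 0 and 6 is a contact even though they are far apart in the sequence.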

You see the structure, then you see the sequence somewhere, and all the other information in there, just so you see it. And there are 215,000 of these entries. It's a database actually created by structural biologists themselves.

Okay, so now let's have a look at AlphaFold itself. This is from the 2021 paper on AlphaFold 2, which was the latest version. I'm not a machine learning expert, so some of you probably understand this better than me, but I'll try to guide you through it as best I can, and I'm happy to get corrections at the end if there's anything to say about it.

The first thing, here on the left side, is data gathering. You put an input sequence in and then look for this multiple sequence alignment. That's exactly what I told you: we look through all the species to see whether we have similar proteins, so that we can learn something about these co-evolving residues. The second thing they do is make these two-dimensional plots, where they don't know the positions yet, but they want to find them out. The last thing, which they also use but which isn't necessary, so we mostly ignore it for this talk: they look for templates, proteins that are already solved and probably look the same or similar. But you can get rid of that, you don't need it, so let's ignore it.

So the first thing is gathering the data. I again use a dog protein, because I work with dogs; it just happened to be that way. You take a protein and put it into these databases, and these databases are huge. This one alone has 2.5 billion protein entries, and we don't even know where they come from. They are just from soil samples: you sequence them, some species, some soil bacteria, just express proteins, and we look at what proteins are there. It's a huge database; sometimes we know what something is, sometimes it's just random protein sequences lying around everywhere. The same is true for other databases, so we have a couple of databases with billions of entries, and we get a huge multiple sequence alignment. This is how it looks if you really do it. At the top are the very good ones and at the bottom the less good ones; this is a quality score from blue to red. You even get some where only parts align, so you just have fragments, but it's still helpful. It's not always the same protein; it just looks so similar that it probably has the same fold.

The second step is searching for templates. I did this here for the protein we have looked at all the time. These are all proteins I could find in the Protein Data Bank that look rather similar, if you can see that. These can all be used by AlphaFold as guidance: it should look roughly like this. Those are the templates, we make these two-dimensional contact maps out of them, and then we have this part done as well.

So we have our multiple sequence alignment, our MSA, and our amino acid distance plots, the two-dimensional representation of our three-dimensional structure. Then we actually go into two neural networks. The first is called the Evoformer, which has 48 blocks, and the second one is the structure module, with eight blocks. What they try to do in the Evoformer is get all the information they can from the multiple sequence alignment and from the templates to make this pair representation, the distances between all the amino acids, as good as possible. So that's the idea: predict how close each amino acid is to every other one. The last part does the folding, and I don't want to talk so much about the folding.

So how does it work? Here we have our multiple sequence alignment; it goes into a representation, and we get our distance plot. The first thing they do is analyze all the sequences and all this co-evolutionary variation with attention, with a bias towards what they believe are already the right distances between all the amino acids. They get those either by guessing, which is just random, more or less like a diffusion model, or, if they have templates, they use the templates to guide the initial values. So in this first part, the multiple sequence alignment guides the re-evaluation of the multiple sequence alignment representation. In the second part, after they have extracted all the information from the multiple sequence alignment, they update their pair representation, the distances between all the amino acids, and try to get a better estimate of those. So far, this is all pretty standard.

What they did then is actually a little bit nice, because they added a triangle inequality check, and let me briefly explain that. If you look at a three-dimensional structure like this one, let's say you have found that this one here, from four to one... no, sorry, the numbers are all messed up a little bit. So this is one to five and this is one to four. These two you know, or you believe, are very close to each other. Then you immediately know that the next one, between four and five, also needs to be in close proximity, because the triangle has to close: its angles have to add up to 180°. This is the triangle inequality. Whenever you change this one, you have to change this one, and if you change both of them, you have to change this one as well. This is built into the second tower of the Evoformer, and what they achieve with it is to teach the neural network that it works in three-dimensional space, because otherwise it would just change all of those distances independently, and they wouldn't fit together into a three-dimensional model later on. That is the triangle inequality check they built in there.

They do this 48 times, and at the end they get out a pair representation, which defines the distances between all amino acids, and a single representation, which is the sequence. Then they put that into the structure module. I don't want to explain the structure module too much, first for time reasons, and second because it is interesting but not as nice to explain. What they do is model their restraints, all these pair restraints they got, treating the protein like a gas of residues. We can have a look at what that looks like when we watch the modeling. This is AlphaFold modeling a protein; normally you don't see that, this is from the DeepMind homepage. You see it forms, gets bigger and bigger, samples all these things, and at the end it decides: come on, this one is now the final structure. So this is the final structure.

Okay, so this is very, very nice. And what they also did really, really well is that they embedded an error estimate into the neural network. They don't only give you their solution, they also tell you how good the solution is, and they do this with this color code. It's called the predicted local distance difference test; I forgot the word for a second, but pLDDT is the right abbreviation. If it's blue, it's a good prediction, and if it's red, it's a bad prediction; that's the way it is. Here we see a picture again, colored exactly the same way. The bottom part is very blue, so it is very nicely defined by AlphaFold. And then we see this loop here, about which, as a structural biologist, I can tell you: this is [ __ ]. I see that immediately, but judging by the color, AlphaFold knows it as well. It doesn't say it's good; it says nobody knows what it is, so it's the red one.

It also gives you these PAE plots, which are similar to what we have already looked at. If you look at two positions, say this one and this one, residue 300 against residue 250, it tells you how well it believes these two are positioned relative to each other. That's the predicted aligned error. So it says this one is pretty good, but if you compare this one to this one, residue 300 against, whatever, 600, it tells you: I'm not sure this is really oriented correctly. So it gives you the local accuracy in this blue-to-red color code, and in this green color, as I call it here, it tells you how sure it is about the distances themselves. If you now color this part, this is the folding domain, as we call it, and this one is the other one. What AlphaFold tells you here is that it knows both rigid bodies, these globules, very well, but it doesn't really know how they are linked to each other; it could be any way. It doesn't tell you how it is, and if you need to know that for the function, you need to determine it.

This came out in 2021, if I'm not mistaken. After the second CASP they were actually very fast in publishing, and for the first time they also gave us the source code and all the means to actually run it. Before that, in 2020, they only gave us a paper, which is very bad. Thank you.

This went viral on Twitter, and it was the first time science and Twitter really worked together. People immediately found out that you can hack the network very nicely to do things that were not intended: you can not only predict individual domains, you can also predict protein complexes, just by hacking it. I won't go into detail now, but here M. did some very simple programming to change how AlphaFold sees the sequences, and could then also predict complexes.

The next thing: Google didn't think about this, but Sergey Ovchinnikov did, and implemented the whole thing in Colab, so you don't need to download anything, you can do it all in Google Colab. That's ColabFold, "making protein folding accessible". You can go to GitHub, sokrypton/ColabFold, and just make your own predictions. I actually wanted to demonstrate that, but I saw the time, so we'll skip it now, and you can do it on your own. It's very easy: you just put in your protein sequence, and it goes very fast.

So how did it change structural biology? Mostly, when I do structural biology now, I still solve structures experimentally, because it is an experiment, it's the real world, not a prediction. And when I then compare my structures with the predictions, like this one, which I solved relatively recently, I have to admit that very often they are pretty accurate. You see the overlay is almost 100%. There's a little difference here; if that's the part you're interested in, it matters, but most of the time it doesn't. But sometimes the prediction is more wrong. This is another structure I solved last year with a colleague; I won't tell you their names, because it's still a little bit secret, and that way I don't have to make all of you sign NDAs. This is the predicted structure, and then I solved it, and it looks quite a bit different. This is the real structure, the blue one. It is recognizable, but it's different, and if you morph between the two (the videos are too small, too slow, come on, turn around) you see that there's quite a bit of movement going on in the structure.

And sometimes... this is my collagen that we talked about, you've seen it. This is what AlphaFold thinks the collagen looks like. It's a little bit mean, because I didn't tell it the whole story, but this is how AlphaFold thinks this collagen should look. It knows it's [ __ ]: it colors it red, so it tells you it's not correct. But to me it looks like it needs some divine intervention; it's not there yet.

So how can it be helpful for experimentalists? It's very helpful, because, and I didn't tell you this part of the story, in crystallography we have to use existing knowledge to actually solve our structures, and we can now do that using AlphaFold. This is also how I solved this structure, the one I just showed you. Oh, sorry, it's not working. I solved it by using the red part and the blue part individually and putting them together differently, more or less, to make this long part fit. So I can use it, but I always have to think about what the real problem is.

Then we have integrative model building. People have very low-resolution maps of huge structures; this is a nuclear pore complex at seven ångströms. That's more or less the resolution we saw earlier with my blenders, where you couldn't really tell whether it was making CBD oil or mortar. If you have those maps and all these high-resolution predictions, you can fit the predictions in, because from the outside you can probably tell the parts apart. This is how it looks in reality. So one of the major outcomes is that we can now build huge structures we couldn't build before.

So what can it not do? These are the last couple of slides, so I'm probably going three minutes over, but I think we can live with that, hopefully. What it unfortunately can't do is predict the effects of single mutations, and this is very important. This is again our dog. We didn't have this dog in our lab, but a veterinarian came to us because this dog always had broken bones, and there was a single mutation in this protein. I've indicated the mutation here, and it obviously changed the protein quite a bit. But if I run it through AlphaFold, this one mutation is lost in the whole MSA; this one change just gets lost in the neural network, which ignores it. So it gives me the same structure again, with just the amino acid changed. But if I solved it experimentally, it would obviously look completely different, because it is a deleterious mutation. So at the moment we cannot predict the effects of single mutations in proteins.

Then, what we normally want to use this for is docking. We have proteins whose activity we want to change, and then we dock small molecules, antiviral compounds, whatever. This is also not good enough at the moment, so AlphaFold cannot be used for that, and we still have to do experimental work.

And finally, you can never be sure, and this is my major problem as a structural biologist, because I want to know the truth, and I only know the truth when I have experimental evidence for it. AlphaFold is very good, it's very nice, and it's a major breakthrough, but it is just a theory and should be treated as a theory, and you have to do the experiments anyway to test whether this theory is correct or not, as you always do in science. This is the major thing that runs through all of this, and I'll stop with this quote from D. Lover, who is a columnist at Nature: "Structural biology has been greatly advanced by these new tools, but it has not been outmoded, replaced, or rendered irrelevant." It's more relevant than ever, and now we can go on to even bigger questions with it. With that, I hope you enjoyed the talk, and I'm open for questions.

[Applause]
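The triangle argument from the Evoformer part of the talk can be made concrete with a check that every triple of residues in a distance table satisfies d(i, j) ≤ d(i, k) + d(k, j). This is only the geometric rule itself: in AlphaFold 2 the constraint is not enforced by an explicit check like this but encouraged by learned "triangle" update and attention layers acting on the pair representation. The distances are made up, and the residue labels 1, 4 and 5 are borrowed from the example on the slide.

```python
from itertools import permutations


def triangle_violations(dist, tol=1e-9):
    """Return every (i, j, k) where the direct distance i-j is longer than
    the detour i-k-j, i.e. where d(i, j) > d(i, k) + d(k, j)."""
    n = len(dist)
    return [
        (i, j, k)
        for i, j, k in permutations(range(n), 3)
        if dist[i][j] > dist[i][k] + dist[k][j] + tol
    ]


labels = [1, 4, 5]  # residue numbers from the example in the talk

# Hypothetical distances in Å: 1-4 and 1-5 are close, so 4-5 must be close too.
consistent = [
    [0.0, 5.0, 6.0],
    [5.0, 0.0, 4.0],
    [6.0, 4.0, 0.0],
]

# Same table, but the 4-5 distance was changed on its own, without adjusting
# 1-4 or 1-5: no arrangement of three points in space can produce this.
inconsistent = [row[:] for row in consistent]
inconsistent[1][2] = inconsistent[2][1] = 20.0

for name, table in [("consistent", consistent), ("inconsistent", inconsistent)]:
    bad = [(labels[i], labels[j], labels[k]) for i, j, k in triangle_violations(table)]
    print(name, bad)
```

The consistent table reports no violations, while the edited one is flagged because residues 4 and 5 are supposedly 20 Å apart even though both sit within 6 Å of residue 1. This is the kind of independent, non-three-dimensional change the speaker describes the network learning to avoid.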
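The blue-to-red confidence coloring described in the talk can also be reproduced by hand. AlphaFold writes the per-residue pLDDT (0 to 100) into the B-factor column of the PDB files it produces, and the AlphaFold Protein Structure Database colors structures using the bands very high (above 90), confident (70 to 90), low (50 to 70) and very low (below 50). The sketch below reads the value from the C-alpha atoms and assigns those bands. The tiny "PDB file" is generated inside the script with made-up coordinates and scores, the helper names are hypothetical, and how values exactly at 90, 70 or 50 are assigned is a choice of this sketch.

```python
def atom_line(serial, name, res_name, res_seq, xyz, b_factor, chain="A"):
    """Build one fixed-column PDB ATOM record (test data for this sketch)."""
    x, y, z = xyz
    return (f"ATOM  {serial:5d} {name:^4s} {res_name:3s} {chain:1s}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{b_factor:6.2f}")


def plddt_per_residue(pdb_text):
    """Map residue number -> pLDDT, read from the B-factor column
    (columns 61-66) of the C-alpha atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """AlphaFold DB-style confidence bands (boundary handling is a choice)."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Made-up example: four glycines, each with an N and a CA atom.
scores_in = {1: 95.1, 2: 82.0, 3: 63.5, 4: 31.2}
lines = []
serial = 1
for res_seq, score in scores_in.items():
    for atom in ("N", "CA"):
        lines.append(atom_line(serial, atom, "GLY", res_seq,
                               (float(res_seq), 0.0, 0.0), score))
        serial += 1
pdb_text = "\n".join(lines)

for res_seq, plddt in plddt_per_residue(pdb_text).items():
    print(res_seq, plddt, confidence_band(plddt))
```

Reading fixed columns rather than splitting on whitespace follows the PDB format, where neighboring fields can run into each other; for real work a structure-parsing library is the safer choice.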

Thank you so very much, Yan. We have microphones in the room: there's one there, there's one there, I'm a little bit blind, but I think there's also one there and one there. Of course, our folded-protein signal angels are monitoring for translations and mutations and all sorts of questions on the internets, in Matrix, in IRC and in the fediverse. We are in Zuse, so the hashtag is #37C3Zuse, or the IRC room is 37C3 Hall Zuse. So let's start here in the front.

Hi, I'd like to thank you for your wonderful talk, and I'd like to ask whether there is any possibility to correct the predictions of AlphaFold. You said a lot of times that AlphaFold produces [ __ ] where you see at first glance that it cannot be the truth. So is there any possibility to use our knowledge of structural biology and combine it with AlphaFold, to have the ultimate tool with the best of both?

Yeah, it's a good question, and I always need to be careful: AlphaFold is not making [ __ ]. In 99% of the cases it is doing a very, very brilliant job, just to make that clear; say 95%. It's only a couple of times that it gets pretty bad. I've probably overemphasized this point, because it's so close to my heart that structural biology is not dead yet. But yes, you can use multiple other things: you can use NMR structures, you can use restraints from other sources, you can change the multiple sequence alignment. There's a whole field. Just recently, over St. Nicholas Day, I was at a four-day conference that dealt only with structural biology. So we are not at the end of computational structure prediction; this is just the beginning. AlphaFold made a huge step, and since then we have evolved quite a bit. At the last CASP, for example, AlphaFold didn't even take part, but many, many other programs did, working in a similar way. So yes, definitely, and it gets better every day.

I can see that the internet has crystallized out a question. The internet would like to know your personal thoughts on where this progress could be used in the field of medicine. Is this still far off, or is progress already being made?

No, there's quite a bit of progress being made. It depends a little bit on what you need your structures for. I told you about docking: the typical idea that you have a target, like an insulin receptor, and you generate a new insulin that fits perfectly into its binding pocket. That's probably not there at the moment. But if you need broader, let's say lower-resolution structures, this is already happening. At the moment I support or supervise many structural biology projects, and also some medical ones, and there is none that isn't using AlphaFold. It's always there. And sometimes you need more data, and then you have to do experimental work as well.

I see microphone 3 has been staring at the X-ray machine. Is there a question?

Yes, thank you for the presentation. My question is whether AlphaFold might one day be so good that it would be possible to combine it with omics technologies, with interactomics, to simulate whole organisms and answer scientific questions that way.

Yeah, very good question. We are on the way already. There are groups working on a minimal viable cell, which is not artificial; it's a cell that is boiled down so far that it has very few proteins, and they are already trying to simulate the whole cell, all of its proteins, in a physical way, and they use AlphaFold to parameterize their model. So we are on this way. Whether we'll ever get there, how should I know? If you had asked me four or five years ago whether there would be a neural network predicting protein structures, I would have said no. So I don't know, probably. But we have been doing that for years, and it has now taken multiple steps forward.

The collagens at microphone one?

Yeah, so most of the proteins whose structure we know can be crystallized. What's your estimate of AlphaFold's performance on proteins that cannot be crystallized, since they don't turn up in the training data as much?

I'm not sure how much you know about crystallization, but in theory all proteins that have a defined three-dimensional structure (which is not all of them, but let's forget about those for the moment, because that needs another talk) can be crystallized; we just haven't found the right conditions yet. That's at least the idea behind it. So I guess that as long as a protein has a three-dimensional structure, a prediction method should at some point be able to predict it. Whether AlphaFold will do the trick, or some other method, probably language models in the future, which also exist, nobody knows. But yeah.

Thank you. Could it be that the internet has distilled down the structure of another question? The internet wants to know whether the structure of a protein can change during crystallization.

This is a mean question. Yes, of course it can. Some people say a crystal is so artificial that we shouldn't actually consider it a protein structure, but I beg to differ. It can happen, and we see these crystal artifacts, as we call them in the crystallography community. But we have 200-whatever thousand crystal structures, and most of them have been tested in many biochemical ways to show that they are correct, because you can make hypotheses based on the structure and test them in real-life scenarios, in cells, in vivo, in vitro, whatever. So yes, but most of the time it's not a problem. Sometimes it is; as always in biology, nothing is ever 100% true.

There's time left in our PhD for two small crystallizations or one big one. Microphone 4.

Yeah, thank you for the talk. I was wondering how you get the initial amino acids that you need to put into AlphaFold or need to crystallize, because if you don't know what's there, how do you know how to purify it?

I get this question, and it's a good one. In principle, you normally go the other way around. You have a question: what is your scientific idea, what do you want to do? You have a hypothesis: I want to deal with cancer, or, let's say, with the coronavirus. Then you know the coronavirus genome, and you know all the proteins of the coronavirus, because we have sequenced it. So we know the DNA, and I didn't put it here, sorry, but the DNA is directly translated into amino acids, so as long as you know the DNA, you know all the proteins. I didn't put this link in, sorry, I forgot; I should have put it somewhere.

A very short question from microphone 2.

Okay, just to problematize this idea of a single folded structure again. I know it emerged from the history of X-ray crystallography, but in the human body a protein can, say, misfold, or it can have multiple metastable configurations. So could we imagine AlphaFold and other technologies evolving into something more dynamic, where we see the folding evolve from the moment the protein is synthesized? And could we imagine new optical techniques that let us double-check that these crystallized proteins are the same as the living proteins?

This is a very important point. First of all, AlphaFold does not know how proteins fold. It just knows: this is the sequence, and this is how it should end up. It has no real idea how to fold it; the physics behind the folding is completely unknown to AlphaFold, which is my major concern, because it only gives you the beginning and the end. In crystallography we very often have different crystal forms of the same protein, so we sometimes see the variability. And my hope, and now we come to the future, the outlook, is that at some point we will have something like a large language model of proteins that actually understands how protein folding happens: not just that we go from A to B, but what the different steps are. Meta, the Facebook company, has done this, for example: ESMFold is a large language model, but it's much, much less good than AlphaFold at the moment, and as far as I know they actually dropped the development, so it probably will never get better. But someone is probably working on a large language model for proteins, and then we will really understand the folding, the biophysics behind it, and not only what it looks like at the beginning and at the end, with a black box in the middle.

Perfect.

So I have a Mastodon account and I have a DECT phone; you can find me. If you have further questions, I'm on the conference tool.

And Yan, thank you so much. Let's put our hands together one final time!

[Music]
