Hallo Du!
Bitte vergiss nicht deinen Fortschritt im Fortschrittsbalken auf der Seite des Talks einzutragen.
Vielen Dank für dein Engagement!
Hey you!
Please don't forget to mark your progress in the progress bar at the talk's website.
Thank you very much for your commitment!
======================================================================
Has any of you ever been using, like, formal methods or automated proof softwares? Oh, many of you. I find this topic super interesting because when I first read it, I was like, oh, my God, no, I'm going to prove the security of everything I've written so far. Then I realize it takes a lot of time and it's very difficult. And most of this stuff doesn't have Othell any kind of documentation. So but I mean, Kornelius. Or some formal method as, um, for for security network management. So I believe he will be able to tell me more about this. And in particular for this talk of today, he will be telling us about verifying properties of a portable firewall rules. So please give him a big round of applause. OK, thank you very much. Thank you all for getting up early and joining my talk. This talk is called Verified Firewall Rules and verification. So I think it's best to first refresh our knowledge about firewalls in the first few seconds. So he has a firewall rules that I found on a network attached storage. It's a Linux IP tables firewall. Let's just walk through that router to see what it does. So let's say we have a packet for this box. The packet is given to the input chain and the box will now walk through all these firewall sequentially until it finds a material to know what to do with this packet. Let's just do that for the sake of example so we all know what firewalls do. So this is the first rule. The box will look at what you see there on your right hand side is what we call the match condition. It tells you if this rule matches for a packet, this one matches for all protocols or types, all destination IP. So essentially this rule matches everything and then it will execute this action, which is called us protect, which means we jump to the user defined chain does protect, which is down there. And we will continue our evaluation down the first rule as we damage condition. First looks at ICMP protocol. Any source, any destination IP, ISP Type eight, which is an ec
ho request, as we all know, and check if a certain limit is not exceeded. If this is the case, we are we will return. We return to where we came from. Otherwise we will go to the next rule. It looks at ICMP request again and drops them. So we see a pattern here. Those two rule essentially implement rate limiting for us. If the limit is not exceeded, we can go out of the tech giant. Otherwise things are top here. The same pattern we see here for TCP Pickett's TCP Flex. Obviously with that flex that there if a certain limit is exceeded, they will be dropped. Otherwise we can return. So let's say we get through the dust protection without being dropped. We end up where it came from. The next rule accepts all tickets which belong to an established or related connection. Accepting everything that belongs to an established connection is usually considered best practice opinions differ on the related match. Well, the interesting question is how do we get a connection in an established state? Well, this is what most of the rules of a firewall are about. So how do we get in and establish state for this firewall? Well, not for us as packets. They are blocked. Also, a lot more TCP ports are blocked. Also a lot more. UDP ports are blocked. Finally, the firewall is accepting something. It's accepting all packets with a source IP address in the local 192 one sixty eight zero range and it's dropping everything else. OK, I hope by now we have refreshed our knowledge about firewalls. So what's the problem with firewalls? Well, I guess everyone in this room loves open source solutions, so we don't have to care about pectus and proprietary software. We also don't have to care about global infrastructure security made in Germany. We love our open source firewalls to Linux, our firewalls. They are quite good. So what's the main problem? Well, there was a study that looked at real world enterprise firewall out there in the wild and the study found, well, there are no good high complexity
rule sets. So the main problem of firewalls is administrating them, setting them up, which all comes down to configuring the ruleset for the firewalls. And, well, a few if you were laughing that was that we saw on the slide before was actually quite simple. Just imagine those sets get larger. Study repeated this finding a few years later and it still finds real firewalls are still poorly configured. So it's all about the firewall rules, sets that well, if the rules get complicated, we make errors. And by no means do I mean that administrators are incompetent. No, thank you very much to all those masters of complexity who get our network running. Thank you to the NOC for everything. Thank you for not filing us here so we can have more fun. So it's all the configuration of the firewall. So people had idea, well, let's just write a few tools that check our firewall rules and tell if the rules and is good. People had that idea and we looked at the tools that were out there. So we fed those tools, some real world firewall routers and we found, well, there's essentially no tool out there that understands our real world firewalls, because if you ever open men IP davits extension's, well, then you understand why things can get very complicated there. So and even if we had a tool that would check our firewall, said, would we trust the tool? Well, probably not. So let's do where others fail. Let's try again. Let's try to write a tool to check firewall ruleset. But because we found our tools, we have so far do not understand our ruleset. Let's start from the very beginning. Let's first talk about specification and implementation. When we mean implementation, we are usually talking about the code, about our tool. We talk about low level hacks to increase the performance of our tool all the on the right hand side. And this is usually the stuff you don't show to your users, to your users. You show them the specif. Or the documentation or just a question, what does your tool do?
So here we are to right at what we have to answer the question, what is the correct rule said and for the sake of example, for this talk, we will stick to spoofing protection. So our ultimate goal will be a tool that checks if our firewall rules, that has proven protection. We only have 30 minutes. So let's stick to spoofing protection. But before we can specify what spoofing protection actually means, well, we need to specify what is a firewall and therefore we need a model of the packet filtering of IP tables. And that model is the firewall in the beginning should really, really be expressive so that we can really say, well, if this model mirrors reality and we can get all our complex firewalls with all the fancy IP tables matching extensions into that model so we can in the end run our tool that checks. If we are spoofing protection, we will implement everything in the eyes of a theorem improver. Why are we using a tier improver? Well, you know, the good old problem, you look up the documentation of some library you are using, then you look to the implementation and you found, well, the documentation was just plain lying to you because documentation are usually horribly out of date. So in this talk, we want to write an implementation and improve that implementation corresponds to the specification. That's why we are using a theorem proven to have a proof in the very end that our code really does what it is specified to do. So the most important thing here is to specify what it should do. So to summarize what we're talking about, we are verifier for ruleset and we will verify the verifier itself. So let's get started with step one. We need a model for IP tables before we can specify anything. OK, we so we need to match expression's. Let's start simple. Let's write down the syntax of match expressions. The syntax is all about how do I write things down there. Let's define a data type LECG expression and this data type should be polymorphic over the type apostrophe
A, which means this apostrophe A can be any type who will call this the primitive, which will be the features IP tables can match on. Let's keep that generic for a moment. So how can I match expression then be constructed where we can match on such a primitive we can have a match expression that just matches playing anything, we can negate a match expression or we can combine to match expression to one larger match expression. As an example, we are combining to match expression with this match and and there we see in the thing. These are the primitives. So those things we keep completely arbitrary. Here we are meeting on destination IP and protocol. Tsipi And again, we will keep all these primitives. The features we can match aren't completely generic. This was the syntax how we can write things down. But what does this mean? So we have to specify the semantic semantics is all about what to match. Expressions mean. Well, match expressions are matching on packets, so we can't specify the semantics without a picket. Let's look at the type signature of the matching semantics first. There it is. Looks a bit intimidating for us. Let's look at what we have to. The first parameter of this Metrowest function is a function itself. We call it Gummo and we also call it a primitive measure because look at the type of the function. The first parameter is the primitive, some primitive match thing. We can mention, for example, two types of protocol. The second apostrophe P is A is the packet it should match on by the apostrophe. You see, we also have a generic packet model. You can plug in anything and it returns a boolean, true or false and well, dysfunction. Gamma should return true if and only if this primitive match condition matches for our packet. The second parameter for our semantics is a match expression as we defined it before. Then we give it the packet. We want to know if it matches and returns true if and only if the packet matches for the match expression. So let's l
ook at the individual words here. First rule and straightforward. If we match on the primitive A for a packet P well we just ask our primitive Metcher if there's a match for packet P, then the next very straightforward match. Any major expression matches everything if we have this match not, which means we negate our image, expression and image. And you might already have guessed it as just a conjunction of the two match expressions. So now we have defined syntax and semantics of match expressions so we can match on packets. No, we only need to specify the filtering behavior of IP tables there. It is quite simple, first of all. First of all, what the fuck? So the good news about this is, well, this is really everything we need to know about IP tables, packet filtering. It's on one slide. If we have enough time, we could really read it. It's much more concise than the main pages. Well, it just has a few funny symbols on it. Let's try to read it. Everything we have to look at the following. Maybe we can recognize something. P should be the packet the firewall is examining there. We have to gamma again or a primitive measure which has encoded all the features IP tables can match on. Then at this position we have to set the rules. It is just a list of rules the firewall is currently examining. As should be the start state. The firewall starts looking at the packet. Usually the firewall in the beginning is undecided about what to do with the packet and at the end we have the final stage. So in the end, the firewall usually makes a decision either to accept or to drop a packet. So let's read the rules. Let's read a simple rule, skip rule, they are all written with this line, which means everything above the line is to preconditioning everything below the line is the conclusion here we don't have any preconditions or this rule holds unconditionally. So let's read what this means. There we have it. First of all, this rule looks at the empty rules that the empty list. So wha
t does a firewall of the empty rules that to the rule says the start date and the final state are the same. So essentially, this rule says for the empty rules that a firewall does nothing, not really that hard, but we need to state we need to start at some point. Let's look at a quite more complicated rule here. We also have a precondition. And what we are looking at is a ruleset which only consists of one single rule, which has some match condition and direction is except our precondition is that we assume that the match condition matches for the packet. Well, what would then happen to Actionis except for two to five, will do well if we don't have a decision for the packet yet. The action was accept, then we are going to accept it. OK, also not that hard. And I guess we all agree that this is the behavior of our common firewalled. So all the other routes they actually read pretty similar, and we only have 11 of them, so if we have enough time, it's not that hard to read like you were first laughing at. So let's directly jump to the most complicated route we have to. And our view is that it's critical return. Well, looks a bit complicated. It is. So first again, we have a route with a single rule, with some policy there. And our first precondition is this one rule matches otherwise to five. I would not do anything at all. Then there is the complicated part. The action of the rule recorded are called SI, which means we want to call or jump minus Jakobson and IP tables to some user defined chain. The name of the chain here is generically expressed as C in the example from the beginning, for example, the C was the name does protect. And we also have as a precondition that chain and the background rules, it looks as follows. So we have the capital gang out there of C, which basically means look up in the background of all the user defined chain and how the chain of this rule, how the chain food chain with the name C looks like. But so essentially we get what is in the d
ust protect chain for the sorry for the example from the beginning. And here we say, well it looks the following first years Rs1 then there's no return and then there is ours too, which means well this user defined chain can look as follows. First, there is an arbitrary part called Rs1. An arbitrary amount of goods could, for example, be the first ICMP rules we saw in the chain. Then there is a rule which has an action of return and then there can be more arbitrary rules called RS2 can be an arbitrary long list of other rules. OK, then we have the next precondition, it says we can process this first part, ours one without getting a decision. So so far we have called to a user defined chain, have processed our first part of this chain without getting any decision yet. Then there's the next assumption. We have a matching return. So what does happen there in IP tables? So we are called to a user defined chain. We have processed something, didn't get a result. Then we got a matching return route. So we came back to where we started from without any result. And this is what the rule tells us. Well, I know I'm going over those formalism a bit very fast. I hope a few can follow. First of all, the cool thing about this, this is really a mathematical specification. It's not an implementation. And we can see that. We can specify the behavior, for example, of crawling toward returning from user defined chain without a college stick. So this is really just a specification, not an implementation. It's not executable, but hopefully it tells quite clear what the firewall does. Well, anyway, if I already lost you, I hope you can join now, because there's the question, why are we doing all this formalism? Well, let's get a bit more applied now with every and every slide we have, we will get more applied and we will now use this formalism to actually do something useful with it. Now we just have specified the filtering behavior of a firewall. OK, let's specify more, let let's read th
is formula down your left hand side here of behavior of a firewall, then we have if and only if. And on the right hand side, again, behavior for five. So the left hand side and right hand side are equal. OK, where's the difference? The difference here is this f f is a function which takes a router and transforms it to another ruleset. So what could this function f possibly do? Well, we said the behavior of those fireboats are equal, so it means that this function F takes a ruleset, transforms it to a different ruleset. But it didn't change the behavior of the firewall. So for example, F could be a function that improves the performance of your roots. It and you could safely run this function on your ruleset because. Well, they are the formula says F doesn't change the behavior of your firewall. You can safely deploy to production. And we have implemented several such F functions. A very simple example. We can remove all the logging rules or semantics only cares about packet filtering. Logging doesn't influence that in IP tables. So if we remove all the logging rules, the behavior of the firewall stays the same. OK, simple example. You also can do more. We can unfold all calls to and returns from user defined chains. So when after we have run this function well, the firewall will be a long list of rules which will already process sequentially. There's no more jumping around, which is a cruel thing. Unfortunately, the match conditions will get quite complicated for the toolset. But again, we can normalize these match conditions to simpler ones. So in the end we will have a firewall, which is just a long list of simple rules. There is no jumping around and the actions are either accept or drop in the end. So we can essentially simplify our firewall and to formulas as this simplification doesn't change the packet filtering behavior at all. So we can safely run this. OK, let's look at more things we implemented on semantics in ternary logic, in Boolean logic, we either h
ave true or false internal logic. We have true, false and unknown. Well, what can we do with it? Well, cool stuff. Let's first read this from the let's see what we have there in the middle. In the middle. We have a set we have a set of pickets and the conditions for this picket as they are accepted by the fire where you start an undecided state and in the end, you accepting it. It's all there in the middle. It says we have to set off all pickets accepted by the firewall. Well, we can specify that a theoretical computer scientist, we believe in specification. We have specified a set of all pickets accepted by the firewall. Quite cool. Well. I guess as Hecker's, we don't believe in just such a set specification, which is standing just out there in the void, having no connection to reality, you can't even execute. That's it. No, that's take us. We also believe in running code. And this is what we have there in the states above and below. Above, we have an approximation, which is essentially a stricter version of the firewall. And that's important that we embedded the whole thing in ternary logic because it's quite impossible to implement all the matching features you have in IP tables in an analysis tool. And I'm not aware of any tool that implements to full set because the net filter team also adds new features with many releases. So here we have an approximation embedded in ternary logic and well, we have proven that this is a solid approximation, which makes the firewall stricter. So basically this firewall may accept less packets than the original firewall. And on the set, on the bottom, we get an approximation, which essentially makes the firewall more permissive. This firewall may accept more packets than original firewall. And the important thing here is, well, those are executable and well. The reality or the specification, what we want is in the middle and we have executable code around the reality, so to say, and while we will use these things all in the back
ground now, because now we can execute the whole thing and we can safely approximate something if we run into some features we don't understand because we have embedded internal logic. So essentially we have no sound approximation where we can say, well, there's a match expression in somewhere in the firewall. We do not understand the result. There is unknown and the whole thing will be safely approximated according to what you want to. You want to be stricter or more permissive. So I said, I will explain how we get spoofing protection in the end. Let's look at spoofing protection. Let's first specify what spoofing protection is. Let's say we have an IP assignment. Zero belongs to the IP range, 190 to 168 zero slash twenty four. So what does spoofing protection mean? It means for any IP packet that enters zero, the source IP must be in this 190 to 160 age IP range. So let's specify that again, we are looking at a set and we have seen that before. We require that in the set. These are the packets accepted by the firewall. There's an additional constraint about the packet. The packet should come from the interface zero. And just like before, we are looking at the packet but not the complete package, we only look at the source IP. So this big set expression on your left is essentially the set of all packets accepted by the firewall coming from zero, but only the source IP. Well, and what should the set fulfill that the whole firewall implement spoofing protection. Well all this IP should be in the range of that interface. I guess we can all agree that they specify spoofing protection for that particular example. Let's generalize that. Let's say we require this for all the interfaces we have in our system and now we have specified spoofing protection. OK, again, we have specified it. I guess we all agree that is the spoofing protection. What can we do with the specification? Well, we wrote an algorithm, an algorithm. You only give this IP assignment as above and to rule
set. And if the algorithm returns true, you definitely have spoofing protection in your firewall. And let me point out the cool thing about it. There you see this arbitrary Gummo, arbitrary in mathematics always means, well, it's all quantified. What does this all quantified gamma mean? Gamma was the set of all matching features our firewall model supports here. It's arbitrary. So our algorithm can tell if you are spoofing protection for any magic IP tables match condition you may have in your firewall. Even if the net filter team decides to implement a new cool matching feature in the future. Maybe they don't know yet that they will implement that feature here. The specifications as well. This algorithm will correctly understand it and check if you have spoofing protection anyway, even if there is some unknown cool future feature in it. And I think that's a pretty, pretty cool specification there that we have about the algorithm. So it essentially says we understand all possible match conditions you can ever use in your firewall. Well, so what about the specification? Well, you know, with normal specifications, well, they are quite imprecise and not really tell you what the algorithm does. This is a mathematical specification. It's concise, it's really precise. And it tells you exactly what the algorithm gives to you. What else about your specification? Well, very often they lie to you. They have some implicit assumptions. They don't tell you you have to call in a special way, but they didn't document it. Well, not in this case. We have a formal proof that our algorithm really fulfills this condition. So there is no implicit assumption somewhere this specification will never night will never lie to you and all the other for some other documentations. You always know that the documentation is usually quite outdated, especially for a particular project. Well, here we have a machine verifiable proof. You can run that proof on your computer. You don't have to reconstru
ct it by hand. You can give the proof to your computer and tell your computer to reject the proof at any time so you can check it all the time if this specification is still what the algorithm provides. So this specification will never be outdated. And I claim, well, this is probably the best way to document your code to prove it correctly. It will never be outdated. It's really precise what it does. It doesn't have any hidden assumptions and it will definitely not lie to you. So I said we have executable code. Well, if this looked like a lot of them just exited with executable code look like, well, we wrote the whole thing and approved it and once we got executable code, we can export it to different languages. Here we use Haskell and you can really run that algorithm on your machine. You don't need Izabel or anything to improve Aslambek. And it's a standalone program here. You can see how you can call it. You just feed your IP table safe timba into the tool. Daniels applied IP assignment, as we saw on the slide before, or probably generated by Conficker or IP address and the toolbar just automatically run. It doesn't need any additional input. You don't have to prove anything by hand. It will just check if you have spoofing protection on your firewall. So there we have it. We have a verified tool to verify that you are spoofing protection on your firewall. And because I'm running out of time, I just want to show you a short outlook what else we have in all verified toolset there? Well, probably you've seen on the slide before. How big was the firewall? Four thousand eight hundred was quite large. I guess this firewall was that now hit the 5000 words. And we had the question. Well, by the way, who is allowed to set up SSL connections with whom in a 5000 words firewall? We said, well, there is the answer. All of these ranges, you see their internal servers, ayna, they essentially correspond to the complete IP for address range. I just gave it symbolic names here bec
ause, well, those ranges are really split up, maybe noncontiguous because there are so many exceptions in there, but they are all clustered together. And we have to name a few things about this graph. It's the complete IP before Edwards range. All these IP addresses that are true together, for example, in the service or internal group really have the same access. Right. And this is the best possible solution you can get for this question, because you cannot compress this any further. You can't get a clearer or better answer to that. Well, you may ask, why do we have to on the left, iiNet and iiNet Prime? Well, because of a typo in the firewall, which is now fixed. Well, it's pretty clear in the graph that you don't want to Internets there and you can really see the box there. And what was really interesting essentially for us, we can we looked at a long list of servers there and Internet and really verified that all these IPS for the servers are the machines which should be accessible by the internal ones are the ones which should not be accessible by association. And one of our 5000 rules, as you can see, that there are more, if not more funny exceptions, which are funny access rights, which you probably also should look at. OK, that's it. Thank you for your attention. Well, it would be really, really great if you have some firewall rules that's lying around and would want to donate them for research, for research would be really, really happy. And of course, everything we are doing here is not only fully verified, but also completely open source. You can get everything. Thank you for your attention. I guess we have a few minutes left for questions. So, yeah, just line up in front of the microphones right here and here. I think there is one. There you go. So have you done any work or are you aware of any work that involves studying final sets of multiple cooperating firewalls, short answer possible with the law at the moment? No, but there is other work and researc
h which already claims to have solved this problem, and they did. But they have a very limited firewall model. What we can do with our tool, you can give them you can give us an IP Tabors safe done with all the complex features you have in the firewall. We can simplify it either to an overall approximation, which will create a much more simplify it and simplify what rules it is, then actually understood by some tools which have been developed in academia. OK, thank you. We can preprocessed, but we don't have yet incorporated further analysis in a fully verified manner. I thank you for the talk is very, very well done, and I would like to get more involved in formal methods and proof theories and stuff like that, but I have trouble knowing where to start. I was wondering if you had some insight into what resources would be good to get started with programing and Isabella Colker even address things like that. So first you should look at theer improver at your choice. I prefer Isabella. If you're more from the programing community, you will probably enjoy COK more. It depends a lot on personal choice and everything. If you're a student at UT Munich, you should definitely go to Tobias Gnip course great courses because there we have the Isabella Group directly. And if you want to get started with Isabella, there's the book Concrete Semantics by Tobias Nick and Kevin Kline. And I hope I didn't miss another also, which is, I guess, freely available. You can also buy it. And all the examples in the book should be included in the default is about distribution. So that's the easiest way to get started. Download the complete Ysabel package. Look at the book it runs. Usually on your system, you don't have to compile any additional packets and get started with the book. OK, thank you very much. We're running out of time, but hey, we're running out of time. So there is time for one last question from the signal angel. Yes. On the IRS here, they would like to know was the time, co
mplexity of the algorithm, the runtime complexity and, well, bad news. This preprocessing can blow up to exponentially many rules in theory. In practice, we looked at a lot of firewalls to. Well, and this exponential complexity is sort of exponential in how fucked up you managed to call your user defined chains. So usually you do that by hand or see me automated so it doesn't blow up that much. The worse things to tell you, real numbers I saw as it was about a 500 word firewall based on a Schwarzwald setup, which was really, really quite well. It triggered that a lot. And also when we revised IP address ranges to non-negative zero range, which can also blow up a lot. The worst block we ever saw was from 5000 to 20000 words, which are still very good lendable by our computer, because afterwards, basically all our reasons are mostly polynomial time, if not even linear. To tell you real numbers, this took about one minute of prepossessing for 5000 words and then about one second per interface. And this took less than two minutes. Or the four or five thousand girls, so you can really run those things on your computer. Cool, thanks.