/299$31c3-talk-6499


Because in the keynote it was said that we should go to talks where we have no clue what the title or the description of the talk means, I thought I could use that as an excuse to do the herald job for this one. This guy will talk about reverse engineering of chips, of ICs, integrated circuits, in a nondestructive but rather complex way, doing electrical stuff that's not in the datasheets, rather randomly at times. The priest who will give us this introduction into this type of voodoo magic is Exide. Let's be excited about Exide.

All right, so thanks, everyone, for coming out here. The title of this talk is "Glitching for n00bs: A Journey to Coax Out Chips' Inner Secrets". Over the last couple of years I've got interested in the topic of glitching, and I have been trying a whole bunch of different experiments and trying to learn for myself what it was all about. So this will be a chronological summary of what I've been up to in the last couple of years and what my findings have been.

So just the quick agenda for the talk: a quick intro; background, which is kind of the classroom learning about what glitching is; platforms, which is some of the various hardware platforms I've come up with in the last couple of years. Some of them were epic failures. Some of them actually seemed to work. So it'll be an explanation of the pros and cons. The example will be a real-world example of a secure microcontroller that I was able to get some glitching results out of, and maybe some food for thought, some thoughts that you guys could carry forward on how you could approach some of your own chips. And then finally, a Q&A section.

So, intro about me. I'm an IT monkey or a consultant by day, and I consider myself a hardware hacker by night. Some of my interests are designing and reversing embedded systems, IC security and failure analysis, arcade platforms and automotive stuff.
Anything electrical or mechanical or whatever is pretty cool to me. And my contact info you can see there; it's just my email, exide31337 at yahoo.com.

So let's go into the background section, the classroom section. So what is glitching? A glitch, and this is not necessarily electrical, by definition would be a transient which can induce an alteration in a device's operation. So a glitch is something that can mess up a device's normal operation. For this talk, we'll talk about electrical glitches, and specifically clock glitching and voltage or power glitching. There are other variants, like laser, thermal and radiation, but I'm not enough of an expert in those topics to give them a good speech.

So if we focus on the right-hand side there, on the noninvasive, semi-invasive and invasive types: electrical glitching would be considered a form of noninvasive attack on a device. This doesn't permanently alter the device's package, the physical epoxy block part of the chip. It doesn't permanently alter the operation of the device, so when you remove the glitching stimulus, or you stop glitching, it should work normally again. And it's repeatable, which means you can start glitching, stop, go away for a little while, come back and do it again, and it's not going to harm the device, and you can keep repeating it. It's also surreptitious, which means there's no milling or drilling or etching or things of that nature. So it shouldn't look like you actually did anything to the chip physically. It should just look normal. And another characteristic that's fairly important is that noninvasive attacks are usually cheap. You don't need an expensive lab, and you usually don't need things like specialized microscopes or other expensive tools. The drawback to the noninvasive attack is that any background details you have beforehand are very helpful, because they'll help to narrow the scope and what strategy
you want to use when you're trying to glitch, rather than a completely black-box device where you have no idea where to start. You could take many wrong turns. So any information you have beforehand would be quite helpful.

So, some examples of noninvasive attacks. There are three umbrellas. Under the first umbrella there's fault injection, which would include clock glitching and voltage glitching. You can do thermal glitching, which is where you're trying to affect the junction temperature of transistors. But really, from a noninvasive standpoint, you're either trying to heat up an individual pin or trying to heat up the whole chip package all at once, and it's not really precise. So I'm not sure if a lot of beneficial effects could come from the thermal side. There's also radiation glitching. So if you just happen to have a source of X-rays, gamma rays, alpha particles or neutrons walking around in your pocket, you may be able to set those near the chip and get it to flip bits of memory or cause the CPU to latch an invalid instruction or something like that.

The next umbrella is side channels. There's power analysis, where you're basically studying the current consumption or power consumption of the chip, which can leak the operations being performed. It can reveal things like crypto round keys or intermediate keys that could be used to derive a full break of the encryption. And it can also provide an indicator of where the CPU is in its instruction execution of the overall program. Then there are timing attacks, which simply try to exploit conditional branches: when you're checking a password or something else, you might stop when you find the first incorrect character, and that'll stop a lot faster than if it went through all the correct characters. So you'd be able to exploit the difference in timing to know if
your guess at a secret password is correct or not. Data remanence: that's pretty much your cold-boot type attacks. If you do a reset or power up the device and it doesn't wipe its memory, then there might be secrets still in memory.

And then finally, the third umbrella is software. This could be simple code vulnerabilities. The authors of the secure device may not actually be that versed in secure coding practices, so there may be vulnerabilities sitting around, like buffer overflows, stack overflows and things like that. Brute forcing: you could simply try brute forcing, if the key strength is small enough, the secret that gains you access to restricted memory areas or code protection. You might try brute forcing a crypto key, but if it's a relatively modern implementation, it's probably not going to work for you. And then finally, backdoors, which could be undocumented instructions in the CPU core, or debug interfaces: JTAG, a UART hanging off the device somewhere, I2C, SPI, stuff like that. Those can be some of the more low-hanging fruit, but they may or may not be present.

So the second major class of attack is semi-invasive. This is where you are altering the package of the device. You might decapsulate, so you might etch away the epoxy packaging of the chip, or you might mill the chip from the top or the bottom, to gain better access to the actual die sitting inside the chip package. It doesn't permanently alter the device's operation. So again, you'll be able to apply or remove some sort of glitching stimulus to the chip, and when you're done glitching, it should operate normally again. It's repeatable, unless you're doing laser glitching where you leave the laser on too long and you burn up something that you didn't want to. It's more expensive: now you may need things like lasers, microscopes, chemicals, an SEM. This class of attack
may be beyond a single person's budget, so it depends how well funded you are or not. And then this kind of attack can provide background details rather than require them. So you'll be able to help narrow the scope and strategy for your noninvasive glitching attack and get a basic floor plan of the chip, for example, if you've got an optical microscope or something like that.

So, some semi-invasive examples. Glitching: you can still glitch semi-invasively. Now you've got access to the chip's surface in some way, so you can use things like a laser, a flash, like a camera flash, high-intensity light, and thermal glitching, where now you might be able to direct a source of heat at a more precise area. But it's still probably going to be fairly unreliable all around; you'll end up altering bits or transistor gates in a larger area.

Another type of example is laser scanning. You can do it with the device unpowered or powered. When it's unpowered, you basically have an optical beam inducing a current flow in the chip, which will change the current signature, the power consumption signature. And if the device is powered on, your optical beam can cause a measurable voltage change in the output of the transistor or the bus that the transistor is connected to. So it may be possible to do things like read out memory a bit at a time by watching the current consumption and sweeping the beam across the different rows or columns of a memory, for example.

And then finally, you can do imaging attacks, either from the front of the chip or from the back of the chip, where you mill away the back material. You can do visible wavelengths versus infrared, and you can use optical microscopes versus electron or ion beam based workstations. This will allow you to get the floor plan of the structures and features of the chip a lot more precisely. So things like ROM, RAM, flash, E²,
configuration and security fuses, things like that.

So now the highest notch, the most complicated type of attack, is the invasive attack. This is where you not only have the decapsulation and milling of the semi-invasive attack, but now you also have alteration of the die itself, the actual little chip part of the microchip. And you can render the device nonfunctional with this process. For example, if you're trying to image the device layer by layer, obviously you're etching away material, so once that layer's gone, it's gone for good. So you'll want to have many samples available so that you can image the device like that. However, if you don't want to do that, but you want to have access to the surface of the chip, for example, and if you've got access to a FIB workstation where the device input pins like voltage, ground, clock, et cetera, are outside the vacuum chamber or outside the chuck, you can actually power the device up and run it while you're making modifications to it. So like I said, most of these techniques are one-time, especially the delayering process, whereas the FIB workstation can allow you to create edits and undo edits, so you can go back and forth.

This class of attack can be very costly. Whereas the decapsulation and the imaging of the chip can be somewhat reasonable, actually being able to edit the chip can be very prohibitive, depending on whether you have access to the equipment or pay an hourly rate to get on the equipment. And then finally, this class of attack will pretty much provide you with complete background details. You can use all the floor plan data. You can actually force certain transistors or buses or circuit nets on and off and get a good idea of how the device operates. And then you can feed this information back into the semi-invasive and noninvasive attacks to make them a lot easier, because you know where on the chip to target.

So, as I mentioned previously, some examples of invasive attacks are decapsulation, so taking the chip out of the package, and delayering the actual die. You could do a memory readout: if the circuit has ROM, for example, you'd need to get through all the metal layers down to the very first metal layer, and that's where the actual ROM transistors are formed. You can do circuit edits: etching, where you're removing material from the die in certain areas; deposition, where you're using something like platinum or tungsten to deposit conductive material on the surface, so you actually create a conductive path; and wire bonding, where you're taking a wire bonding machine and putting gold bonding wires from the die of the chip, or from areas of the chip, out to a larger, more human-friendly package, like a very large DIP package with two rows of, you know, 10 or 20 pins or whatever. And then you can easily work with that. You could also purposely destroy traces or transistors at this point if they're causing some sort of functionality you don't want. And finally, you can do microprobing. When you've got really, really tiny needles, tungsten needles for example, you can stick them down on the surface of the chip and either listen to what's going on or drive a signal back into the core of the chip somewhere.

So that concludes the different classes of attacks, from the cheapest to the most expensive. So, back to electrical glitching. Where do you get started? How do you generate glitches? When you're making these glitch pulses that you're sending into the chip, either on the clock lines or the power lines of the chip, here are four methods that I came up with. Some of you guys are probably really smart and can think
of other ideas, but these are the ones that I could think of: a simple clock divider; a phase-locked loop, so if your device, in this example an FPGA, has a PLL, use that; poly-PWM, pulse-width modulation, where you've got multiple PWM'd signals further apart from each other; and then polyphase, where you've got three signals that differ from each other in their phase.

So the first one, the divider example. This one's the simplest: you literally take as many flip-flops as you want, and every time you go through a D flip-flop, you divide the original input signal by two, when you feed the output of the flip-flop back to the input. So you go from 48, divided by two, down to 24, and divided by two again, down to 12. And now you have this multiplexer where you feed it the slow 12 megahertz signal and the original system clock, the 48 megahertz signal for example. You run the device through most of its lifetime on the slow signal, and then you toggle the glitch select line down here at the moment you want a glitch. Then all of a sudden you'll get a 48 megahertz pulse train instead of the slower 12 megahertz in this case. You can use this directly as the clock signal to the input of the device, or you can use it to gate the switching of the voltage from a high value to a value that's known to cause the device issues. So it's flexible and can be used either way.

This is what the waveform would look like. Let's say here's a single-speed clock and here's a double-speed clock. When you bring your select line high, which is the select line on this multiplexer down here, then all of a sudden you switch the actual waveform that goes to the chip from the slow to the fast. So it simply switches between slow and fast.

The second method is the phase-locked loop, the PLL. The PLL uses integer multipliers and dividers to create an integer fraction, something over something, that can be used to multiply up and then divide down, to get you many more different clock speeds than simply dividing by two each time. So this way you feed the PLL with the normal fast clock, and then instead of just 24 and 12, you also get 16 and 4, or whatever combination of speeds you want in between. Combine all those with the fast system clock, add a couple more select lines, and now you've just given yourself more choices in terms of what speeds you want to play around with. If you do this work up front, then you don't have to keep changing your circuitry down the road. It's more flexible.

The third method is poly-PWM. This is where you use multiple pulse-width modulation blocks to generate clock signals with successively longer and longer duty cycles. So in this case, instead of changing the frequency, we just keep our system clock at 12, 12, 12, 12 all the way through these blocks. But now we have a 50 percent duty cycle, which means there are equal parts on and off in the cycle of the waveform. As you'll see in the picture, 70 percent means that the waveform is on 70 percent of the time and off 30 percent of the time, so the remainder between 70 and 100 percent. And then the third one is 85 percent. What you do is feed these into an XOR gate, and then couple the output of that with one more XOR. So basically you're XORing the two signals and then XORing in the third signal with it, and that'll get you a glitch pulse. And then again you just use your select line to go between the original clock and the shorter pulse.

And here's how the short pulse gets generated. Again, the frequency is the same. The phase is fixed, so these things are locked. If you look here right in the middle, those lines all start at the same time, so in phase they're all locked with each other. However, when you change the duty cycle, you'll see that the 60 percent wave is on a little bit longer than the 50 percent, and the 70 percent is longer than both of them. It's like a staircase effect. And so what happens is, when you run these through those two XOR gates, the difference between the first and the second pulse gives you when you want the pulse to start. So this left side of it right here is when you want it to start, in horizontal relation to the end part of the pulse. And then the difference between the third and the second, so the 70 percent and the 60 percent, gives you how long you want the actual pulse to last, how long you want it on for. So this is actually a pretty flexible method, and you don't need PLL hardware in your device to be able to generate these waveforms. And basically you get one pulse out of it.

Whereas the fourth method that I could think of was polyphase, so multiphase. This is where you generate multiple waveforms, but each one is phase-shifted from the previous waveform by so many degrees. Again, the frequency is the same: 12 megahertz, 12, 12, 12, so it's all 12. But now you're shifting the relation of the second and third waves to the first wave, and again selecting when you want the normal clock versus the glitch clock. The only real difference now is that the on-time duration, for example right here, is the same in all three waves. So it's not different like last time, but the waves are offset: their beginning, when they start, is further and further ahead from each other. And effectively what it does is give you a glitch pulse on the leading edge, the beginning edge, and on the trailing edge, the end edge, of the waveform. So you get twice as many pulses
as you did with the poly-PWM. So do you need it or not? It all depends on your application. It may help you to generate them more quickly or more often, but it's just another way to do that.

A quick aside: I'm using Altera FPGAs. You don't have to worry about reading all of this big paragraph. This was just an excerpt out of Altera's manual, for example, and Xilinx will be similar if the FPGA has a hardware PLL. These are the steps for how to instruct the PLL to create phase shifts on its different PLL outputs, which is basically how I took those different phase-shifted outputs and created these waveforms. And it's got this specific timing diagram where you're supposed to give it these signals: whether you want to step one or more degrees, whether you want to step forwards or backwards, and "phase done" is just what the module outputs back to you when it's done shifting.

This was looking kind of complicated, to get all these timings right, because the FPGA I was using had a soft CPU in it. I ended up having to make a state machine, because my soft CPU was so slow in relation to the PLL's ability to shift its phase that I would say "go shift by what I think is one degree" and it would come back with seven or eight degrees of shift. To be scientific about it, I wanted to go one degree at a time so I could see the effects with each degree of shift. So I just made a simple state machine: when the CPU says "I want you to shift one degree", it goes off, programs the PLL, and then it exits the state machine and waits. The CPU will still have said "I want you to shift one degree", so it'll be stuck in the "start equals one" position. Then finally, many, many clock cycles later, when the CPU responds, because it's much slower than this PLL, you can tell it to put the start bit to zero, and then it'll bring you back to the initial state. So this allows you to shift one degree at a time: it'll get trapped in this loop until your CPU is actually able to respond.

So, back to the corner of the classroom. What is glitching actually doing? Basically it's a momentary burst in frequency, as you could see with those little pulses compared to the normal clock pulse. It's a quick one, usually greater than the max frequency of the device. So in the simple case, if you've got a datasheet, look it up, see what the device is rated to run at, and go even faster, or many multiples faster.

Glitching is timing-critical. First, you need to know where the program is, where the CPU is, in its overall execution. If at a specific point you think it's going to be doing a compare, you want the glitch to land there. Then, now that you know it's doing a compare, where in the actual compare instruction do you want the glitch to hit? That's your offset of the glitch within a single instruction. And then finally, how long do you want that pulse to last? That was the third duty cycle or the third phase-shifted wave in those diagrams, which determined how long the actual pulse lasted.

Basically what this does is cause registers inside the device, or flip-flops, to latch invalid data, because signals are still propagating through combinatorial logic in the device when you suddenly clock it. So as the signal is propagating from the source to the destination flip-flop, you clock it ahead of schedule, and the device will latch invalid data because the correct signal hasn't propagated its way to the destination flip-flop yet.

So what will actually be happening is that instructions in the CPU core will either be duplicated or mutated. What would happen with a duplication is, let's say the real program had a compare, followed by
a jump. So it's checking some condition in an if statement and then jumping. What you'll actually get is that the compare will become compare, compare, so that jump will actually go away. That's usually caused by a fault in the fetch stage of instruction processing. You've usually got fetch, decode, execute, memory operations and register write-back, typically the four or five stages of your RISC CPU, for example. So it'll mess up the very first stage. The next is mutation. This is where you actually turn, say, a jump instruction into an add, which is probably harmless in this case. It gets you what you want: it bypasses an error check, for example, but just turns it into an add. And usually, of the fetch, decode and execute steps, it's the execution stage that gets messed up when an instruction is mutated. The core is about to execute an instruction and then it gets mutated into something else. So this is the hardware equivalent of patching a software binary, where you go into your hex editor and edit an instruction to become something harmless. And technically the instruction is not actually skipped. The program counter, the instruction pointer of the CPU, doesn't just skip ahead two memory locations. The next instruction is still executed; it's just going to become a duplication or a mutation. It'll feel like it's being skipped, though, in those cases.

Sometimes this quick burst of clock frequency can affect your config or security fuses. They'll either fail to set in some cases or they're set incorrectly. So this could actually be helpful to wipe out certain code-protect fuses or things like that, but it's a lot more particular in how it works, depending on the device.

Here's an overview of that phenomenon, where you have the source flip-flop and the destination flip-flop, and then you've got a bunch of combinatorial logic, like individual AND, OR, etc. gates, through the middle. You have a clock event, and now you also have your glitch pulse, and this pulse occurs down here, well before it was expected over on the right-hand side, which coincides with where the actual destination flip-flop is in time and propagation distance. So you clock it way ahead of schedule, and now it'll clock in some garbage data here rather than the proper signal making its way all the way through.

So that was clock glitching. What's the mechanism of voltage glitching? This is a momentary reduction in supply voltage to the device. What you do is drop the voltage to or below the transistors' switching threshold. A rule of thumb is to try the supply voltage divided by two and start from there. It could be lower, it could be higher, but it's a good starting point. What this does is increase the propagation delay, which is literally the delay of the signal propagating through the device, so it gives you the same end effect. Why that happens is that when you decrease the supply voltage, it decreases the drive strength of the transistors, and this lower drive strength causes a slower rise time. So instead of a sharp edge rate, a sharp signal transition, it'll take a long time to plateau and a long time to discharge. That gives you the effect where it slows the propagation of the signal down. And again, just like clock glitching, you want to be accurate about where the instruction is in the overall program, where inside the particular instruction you want the offset of the glitch, and then how long you want it to be active for.

What it's doing is also altering the values at the memory sense amplifiers, for flash, E², RAM, et cetera. And so this has the effect
of corrupting data latched onto the address or data bus. So you can actually have the program swing off wildly to an invalid location because you latch a bad value onto the address bus. In most cases it'll crash the chip, but in some cases it might jump you into an area that the program was never supposed to reach. And again, security fuse logic in the voltage glitching mode can also latch corrupt values due to that effect, where you're right at the switching threshold of the transistors.

So, just to dispel a few misconceptions. I don't recommend throwing random voltage sags and surges at the IC and seeing what happens. I would recommend respecting the absolute maximum VCC and VSS and I/O pin ratings on the datasheet, if you have one. Otherwise you can have latch-up occur, which is basically a short between the power rails of a device or between two pins of an IC. This can cause the device to overheat or basically self-destruct due to overcurrent. So you want to avoid latch-up. With some 74-series logic, you can give it very high and low voltage swings on the input pins, but usually there's a current-limited condition in the datasheet. Specific Fairchild chips, for example, but not National Semiconductor, will say you can do this, and that's because you have to put a giant current-limiting resistor in front of the pin. So don't throw these crazy high or low voltages at a chip unless you've got a bunch of them, basically. And you're not randomly jarring the clock frequency; you're specifically targeting that pulse at a certain point. And you're not technically skipping instructions; as I said, you're duplicating or mutating them. Again, it's timing-critical. And finally, just randomly glitching, with random voltages or with the clock at random offsets, is usually going to be counterproductive, unless the device is stuck in a loop, a tight loop with a few instructions. Then obviously the search space that your glitch has to hit is very constrained. It's very small, and it's more likely you can pop out of the loop.

So what are some of the outcomes in general that voltage or clock glitching, or potentially any glitching, can provide for you? You can make the CPU replace impeding instructions, so you turn that jump into a compare, compare, which doesn't jump anymore. You can truncate cryptographic operations or keys, so reduce the number of rounds in an encryption or decryption process. You can do linear code extraction, where you walk the address space of the device, address location one, two, three, four, all the way until the memory map loops, dumping out the data from the device byte by byte. However, you do usually need an I/O channel to actually get the data out, so a UART pin or something of that nature. You can do things like bypass bootloader-enforced checks. You can stop the memory management unit or page tables from initializing: if they're mapping in sections of memory over top of the bootloader to hide it or conceal it, or just to provide more space, you can stop that from happening. In some cases, you can prevent lockout counters from rolling. So if it's a secure crypto memory or something like that, where you only have so many tries before you lock the device, if you glitch when it's recording or decrementing your try counter, the number of tries will never change. So you can just keep doing your malicious activity over and over again without the device finally reaching zero and then erasing itself or halting or something of that nature. And then finally, in some cases, you can erase security fuses or lock bits. What this will do is keep the flash and E² intact, so then you can just take the device off the board, for example, plug it into a parallel programmer or similar, and read out the device that way.

So if you're looking at chips to try some of this stuff on, there's pretty much your general-purpose and your security-enhanced categories. General purpose is things like CPUs, microcontrollers, memories and digital signal processors. Then on the security-enhanced side, you've got things like SIM cards, smart meters, military devices, chip and PIN, pay TV, transit, metro, and then automotive devices. However, on the security-enhanced side, I'm not saying stuff's necessarily going to work there. It depends on the age of the device and how smart the developers were when they made it. A lot of these security-enhanced devices are actually really, really good. Something I don't recommend is trying to attack FPGAs or ASICs, simply because there are so many unknown variables. Unless you know there's a certain CPU core inside that ASIC, or that they've programmed a certain block of logic in the FPGA, you'd be fishing around in the dark, basically.

So what are some countermeasures? For example, as a manufacturer, or if you're writing embedded code for a device, what could you do as a countermeasure? You could use a CPU which halts or traps on invalid instructions. However, in the case of instruction mutation, where your jump became an add, the add instruction is still a valid instruction in the table of instructions of that device, so that may not even trigger an invalid instruction fault. Better than nothing, though. You could erase volatile memory on startup or reset: no matter how the device comes up, it's just a good best practice to wipe the memory. You also want to minimize the number of copies of important secrets or primitives. So for RSA, for example, P and Q, or any combination of those intermediate values that could be derived back to your private key: keep as few of those as possible, and obviously wipe them between iterations of a routine in certain parts
of the program. If you don't need them, get rid of them.

For clocking, you could use a device that runs off an internal oscillator, so it just ignores the clock pin from the outside world. That pretty much would cut off any clock glitching attacks. You could use asynchronous logic wherever you can, so if something doesn't need to be clocked by a clock signal, then don't clock it. And you could use aperiodic or random clock period generation, which is where the actual clock period changes between cycles, so it's unpredictable in terms of the timing. You could also use obscurity, which is another kind of last layer of defense. It's not a prime defense, but if you use a really complicated 48-bit very long instruction word DSP core with poor documentation in your product, it'd probably make it harder for you to write code for as the developer as well. So it's not that great of a countermeasure.

Finally, for supply voltage, use glitch or brownout detection. This can be very complicated, with fast transition detection that actually detects and responds. You could use a simple low-pass filter, which simply erases that quick transient as far as the chip is concerned, so it doesn't even see it. Or you could be more aggressive and reset, halt or wipe the device if you detect someone is trying to mess with your chip this way.

Many general-purpose devices have little or no designed-in protection, so these are chips that you guys could look at as interesting targets. AVRs and PICs, for example, do have code protection, so they may not be your first choice if you're just starting out learning. And then at the extreme level, modern smart cards, chip cards, have extensive protections. They've got glitch detectors. They've got the random, aperiodic internal clock, which is changing in its length between cycles. And you'll have two CPU cores in lockstep that are sanity-checking one another. So if some instruction goes wrong in one core, the other core will be offset and catch it and do something like either reset the device or erase the memory, etc.

So this section will detail some of the actual hardware platforms that I made over the last couple of years and that I use for voltage and clock glitching. The summary of this guy is that it's an off-the-shelf Arrow low-power reference platform board that I found really cheap on eBay. I think it was between 20 and 50 US dollars, so not crazy expensive. It has an Altera Cyclone III FPGA, in which I put a MIPS 32-bit soft CPU and a polyphase clock generator. It's got a regular 16550 UART. It's got some driver functionality for RAM and flash control, and then some output multiplexers to switch between your low-speed normal signal and your high-speed glitch signal. And then this massive breadboard at the bottom is just doing voltage level shifting, signal conditioning and buffering, so that the FPGA, for example, doesn't get blown up if the target device is running at higher voltages than the FPGA.

So this is a close-up where you've got kind of general-purpose I/Os up here. You've got a 3.3 volt supply, a 5 volt supply, an FT232 USB-to-UART chip, an SD card, which I'm not using at the moment but which could probably be used for data logging, a CPLD, which is just used to program the FPGA from a PC, the actual FPGA, Intel flash, and then some Micron DRAM, which is actually DRAM wrapped with an SRAM interface. So it's just easier to work with. You don't need to have all the crazy DRAM timing signals exactly right. And then on the solderless breadboard, there are just some 74LV125s, and the LV is important. The 125 is just a
buffer that takes an input signal and provides an output, and beside each pair of input and output pins there's an output enable pin that lets you float the signal or drive it. The LV is important because it's 5 volt tolerant. You can power the device with 3.3 volts, but it'll accept 5 volts as input, so any outputs from it will be 3.3. Let's say your device runs at 5 volts. It can output its signals to the input of this at 5, and then this will output them at 3.3 to the FPGA, where you won't blow it up, whereas if you drive the FPGA at 5, it won't last very long. The other parts are just your 5 and 3.3 volt power rails, and then a pull-up pot I was playing with to increase the drive strength of one of the signals.

So here is another iteration, a soldered breadboard, which is kind of a beefed-up version of this that I thought would work at higher frequencies because it's soldered. But clearly you can see from this mess of wires that my routing is amazing. I used this board for both voltage and clock glitching. It has what I call a ghetto DAC, which is basically this: you feed a signal with a varying duty cycle, so how much on time to off time, into a low-pass filter, and the output of the low-pass filter will be a steady DC voltage in the zero to 5 volt range, based on the pulse width of the signal coming in. It's just an easy way for a microcontroller or FPGA to send a signal and end up with a varying voltage, rather than having a real digital-to-analog chip that does that in its own way.

So for a while, I seriously considered Arduino, for about seven minutes, because why not? The problem with Arduino is that the crystal is fixed on board, wherever it is. I can't see it at the moment, but anyway, I think it's 16 megahertz. And as soon as you use any of the timer output compare registers to divide that clock down, you can't actually take the 16 megahertz and provide it directly at 16 megahertz on an output pin. It automatically goes through a divide-by-two as soon as you turn any of those timer or compare features on. So before I even started, all I could get out of this thing was 8 megahertz signals. If your device is running at 2 or 4 megahertz, that might be enough. But if your device is running at 32 megahertz, then obviously this isn't going to help you. So it just wasn't flexible enough.

So then I thought I'd make an even more feature-rich board, and because I'm thrifty, I decided to etch it myself, which was pretty much an epic failure. What happened was that my transparency, the mask of the layout that you shine the fluorescent light through, was slightly off the surface of the board, which had the effect of the artwork being out of focus. So basically I had blurred pads and traces that didn't develop properly. You can see stuff like this probably isn't conducting too much current through it. Sections right here were wiped out entirely. And you can see the ground plane is starting to get attacked, because you leave the board in the etchant to try and eat through some areas, and meanwhile it's starting to eat through areas you want to stay there. Here's another example of that. More failure.

So then what I decided to do was break down and go to OSH Park and make a professional PCB, and as you can see, it already required a few edits after the fact. This board was primarily designed for voltage glitching. It has an ATtiny2313 CPU that I just had lying around. Why it's not great for clock glitching is that there's obviously a fixed crystal on there, so you only have a certain range of divisions you can do with that. And like the other board, it uses the ghetto DAC, and this buffer simply strengthens the output drive current so that you can actually power the device up through this buffer
rather than having a weak, you know, 5, 10 or 20 milliamp signal coming right off of one of these 74-series logic chips.

Here is another device, my sniffer board, which is basically used as a man in the middle. This would plug into the FPGA, and this AHC125 is just like the 74LV125. It's a 5 volt tolerant chip that can drive signals at 3.3 volts to correct the voltage mismatch between the FPGA and the target. Basically, that allows for data logging. What you can also do for cheap and dirty data logging and logic analysis is use a logic block called Altera SignalTap II in the FPGA. This is kind of a soft logic analyzer block that can analyze almost any signal: buses, external pins, whatever you want. And you can save more and more samples by using up more logic elements, or slices in Xilinx terminology, of the FPGA. There are plenty of trigger options, from simple low, high or edge triggering to chained events and multiple segments of capture. So it's got all the sorts of triggering and storing that a full hardware logic analyzer would have. Then you can export the data as plain text, images or other formats. The plain text would be a comma-separated list of the signals over time, like one, zero, one, zero, and then you can parse that back into an actual protocol. The equivalent is called Xilinx ChipScope if you're using the Xilinx product.

So here's a quick summary of what you do. You basically pick the clock that you want to clock the logic analyzer at, pick which signals of interest you want, and pick what logic levels or triggering you want the recording to kick in at. And then you get a nice waveform view where it shows you what those signals did after the trigger, and before the trigger point if you want.

So let's roll into the last section, which is the example device I was playing with. I had a victim IC. I knew it was a secure microcontroller, but I wasn't sure what the internal architecture of the CPU core was. I knew that it paired with a partner device, so a reader and then the target chip. The reader would send data to the chip, the chip would encrypt or decrypt it with a key that was inside of it and then send the data back to the reader, which would go off to the rest of the device. So I was basically starting with a black box, and I wasn't sure what datasheets to look for. Even if the device was known, the datasheets might not have been public anyway.

So what I did was basically start probing the pads of the victim chip. I did an initial sweep with the multimeter. I've got a little Fluke meter, and the little bar graph part of the meter responds a lot faster than the actual numeric digits. Then I'd come back with an actual oscilloscope for any pads that showed interesting, quick-moving activity. I found that one pad appeared to speak a slowish serial protocol. All I did was capture and transcribe the beginning of that waveform, because my scope had a really small amount of onboard memory. It was only one pin doing that, so my guess was that it was some sort of half-duplex communication going back and forth, because I knew the victim talked to the reader. Then I used that sniffer board to basically man-in-the-middle the conversation. I used the SignalTap logic analyzer in the FPGA to export the waveforms as plain text, packed those individual bits back into bytes, and read the byte string. After Googling that, I found out that what I had was an ISO 7816 APDU header. So at that point this was good. Then what I was able to do was add a UART to the FPGA, that 16550. What this did was allow for hardware framing of
the transmit and receive data with the victim. Otherwise you don't need a UART, you can do it with bit banging, but then you potentially have to waste two or three days to get the timing perfect. So it's just easier. And then with the Altera you can use a logic block called JTAG UART to talk to the MIPS 32-bit soft CPU running in the FPGA, and then the CPU can talk to the victim. That way you just need one programming cable from the board to a USB port on your computer. You don't need a cable to the victim and a cable to the FPGA.

So now that I had that kind of intermediate speaking going on, I had the PC speak the ISO 7816 smart card protocol with the victim. The 7816 header has a length field, so I proposed the theory that the victim is probably comparing the length that the reader sends it in the length field to the maximum it'll allow as its input buffer. Like, I'm not going to allow you any more than this, because I've only set aside 64 bytes of RAM to store commands, for example. My hunch was that if the length was too long, it would issue an error. So the next step was to issue a whole bunch of too-long commands to the victim, but otherwise fix up the checksum so that it was correct, and then observe the error response from the CPU.

At this point is when you get ready to glitch. So this is what I call the sucker punch, which is a clock glitch where you see a quick extra pulse in time versus the normal clock. This pulse wouldn't be here at the normal speed of the device. You can also do a one-two punch, which is simply two pulses, one after another, and you can try any variation of this: one, two, three, four, five pulses with different periods. So this is clock glitching during the suspected victim's command handler.
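As a rough illustration of the sucker punch and one-two punch waveforms just described, here is a minimal Python sketch that lists the edges of a clock with extra short pulses squeezed into one cycle. The function name, the nanosecond units and the choice to place the extra pulses in the low half of the cycle are assumptions made for this example; the talk does not specify how the FPGA generated the glitch.

```python
def glitched_clock(n_cycles, period_ns, glitch_cycle, offset_ns, width_ns,
                   n_pulses=1, gap_ns=0):
    """Return (time_ns, level) edges of a 50% duty clock with extra pulses.

    The extra pulses are inserted into the low half of cycle `glitch_cycle`,
    starting `offset_ns` after the normal falling edge (an assumption for
    this sketch, not a detail taken from the talk).
    """
    half = period_ns // 2
    span = n_pulses * width_ns + (n_pulses - 1) * gap_ns
    if not 0 <= glitch_cycle < n_cycles:
        raise ValueError("glitch_cycle is outside the generated window")
    if n_pulses < 1 or width_ns <= 0 or offset_ns <= 0:
        raise ValueError("need at least one pulse with positive offset and width")
    if n_pulses > 1 and gap_ns <= 0:
        raise ValueError("multiple pulses need a positive gap between them")
    if offset_ns + span >= half:
        raise ValueError("glitch pulses must fit inside the low half-cycle")

    edges = []
    for cycle in range(n_cycles):
        start = cycle * period_ns
        edges.append((start, 1))          # normal rising edge
        edges.append((start + half, 0))   # normal falling edge
        if cycle == glitch_cycle:
            t = start + half + offset_ns
            for _ in range(n_pulses):
                edges.append((t, 1))              # extra rising edge
                edges.append((t + width_ns, 0))   # extra falling edge
                t += width_ns + gap_ns
    return edges
```

Calling `glitched_clock(4, 100, 2, 10, 5)` gives a single extra pulse (the sucker punch), and adding `n_pulses=2, gap_ns=5` gives the one-two punch. Sweeping `glitch_cycle`, `offset_ns` and `width_ns` corresponds to trying different pulse offsets and durations against the target.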
That handler is where the victim would be accepting commands and checking the length of the packet. So what I did was try different pulse offsets and durations to narrow down when it was executing, for example, the compare instruction that checks the length of the packet you're sending it. You know you've hit a milestone when you give the victim these packets that are very long but have correct checksums, where it normally errors out, and all of a sudden it doesn't error. It actually processes the command, even though it's got all these garbage bytes at the end of it to make it way longer. So now you know that you've probably hit the compare or the jump instruction with your glitch, and you've stopped the device from issuing an error.

At this point, if you're not already sure of the architecture: Motorola 6805-based cores or Intel 8051 cores are the majority of smaller 8 or 16-bit embedded devices, so use that as your guess. Then, as I said, you add more and more data to the end of the command and wait until the victim crashes or does something weird. So now you've sent in a too-long value, but you make it even longer. Eventually you'll smash the stack, but it could be hard to notice if there's a hardware watchdog that notices that all of a sudden the CPU flew off into nowhere land and then resets it. But basically, I was able to get to the point where I knew where the stack pointer was, went beyond it, and overwrote the return address. Now that you know where the return address is, you can actually start writing programs for this device, because you control the return address. So you can write a minimal, tiny little program that tries to write to the low-address special registers, like on the Motorola 6800 family, for example: PORT, which is the output pin value, PIN, which is the input pin, and DDR, which is your data direction register. Start playing with those and see if you can get your I/O pin, or one of the important pins on your victim, to toggle, because then you know that those pins exist and how they're mapped into memory. So here is a typical layout of the victim's memory space.

Your next milestone is where you actually have an output pin change: one of the pins on the device, either the I/O pin that you're talking to it on or a different pin that might be bonded out of the chip, changes value. Now you've confirmed code execution. Your architecture guess is probably pretty good, because you wrote a little program of a few bytes in that target architecture to write to that low area. And it's probably von Neumann or modified Harvard, which lets you do that.

So now you're getting really close. The next thing I did was write more code in that architecture, the 6805, that loads a dummy ASCII byte, like 5 or F or A or some value with alternating bits, into a register, so A on the 6805, for example, and then sweeps jumps across the address space. What that's doing is searching for the address of the serial transmit routine in software, because this victim bit-banged the output. It didn't have a hardware UART, so it had to jump to a software routine when it wanted to send a byte back to me. So I just kept sweeping addresses in that return address with the smashed stack until I got back the byte that I sent in. Now I knew I had found the serial transmit handler in the software of the microcontroller.

So now all you have to do is make a code loop that starts wherever you want, wherever the current execution is, or maybe at 0000. It loads the data from address zero into a register, jumps to the serial transmit routine, which will echo that data byte out the serial port, increments the address pointer, and then keeps going over and over again, moving to the next memory location. You have to be prepared to empty the FPGA's receive buffer quickly and regularly, because basically the entire code and data space in this particular chip will be dumped out in an endless loop. It'll just keep mirroring and wrapping over the address space. This is what's known as linear code extraction.

So the summary: now that you've got this whole dump of the code and data space, you can try and figure out the memory map if you're still not sure of it. Analyze the dump for any mirroring of the address space, so you know where the overall dump starts repeating, because it's going to be in an endless loop and the memory map is going to have finite bounds. Try poking values into certain memory locations and see if they change. If they do, you're probably dealing with RAM, or maybe E-squared or flash, but usually E-squared and flash have more complicated write routines. And now you're back in familiar territory. You can disassemble that code dump you have, or write a disassembler if you don't have one on hand. You can search for crypto secrets in that dump: serial numbers, keys, whatever. And you can discover any code vulnerabilities, where it was just poor craftsmanship on the part of the creator of the code, where you can just find vulns.

So, conclusions: electrical glitching can be a viable attack vector against a variety of ICs, except for security-hardened, purpose-built security ICs. It can be cheap to perform. You don't need a big or expensive lab. It's usually nondestructive in nature, so it doesn't damage the device. And it's another tool in your arsenal when other approaches have failed. So that is everything. And I guess I'm not sure if we can get a couple of questions.

Yeah, I think we have time for maybe three or four questions. If you have questions, please line up at the microphones down here. Up there, there are no microphones for questions. And while you line up, we'll hear a question from the Internet angel.

Thank you. First question: how many chips do you destroy on average until you successfully break in?

On some devices, where you only have one device, you have to be very careful with how you proceed. So in those ones, like I said, you do not exceed the absolute maximum ratings of the device. You play it very safe. With other devices, where it's a more general-purpose microcontroller or whatever and you've got a whole tube of them, you can throw 17 volts at a 5 volt chip or whatever you want. In some cases you'll blow up 10 percent of the devices very quickly. In other cases you'll blow up 90 percent of your devices very quickly, but about 10 percent of them might actually latch something advantageous and do something you want before they blow up.

Thanks. There is somebody at microphone one. Please ask a short question.

I was just wondering how reproducible the glitches are, like if you find a particular offset and length.

Once you find the offset, the limitation is pretty much the timing drift of your own hardware. You will be able to hit that instruction every single time.

And it nearly always does the same thing every time?

Usually, yeah. If it is a compare, or a jump right after it from a conditional branch, it will stop the branch from happening or cause the branch to happen with very good repeatability, other than the drift in your own clocking hardware.

Are there any more questions? Yes, microphone number four, please.

Yes. Is it possible to work a glitch through a PLL?

It's almost impossible to glitch an actual PLL device or one that's clocked behind a PLL, but
I haven't actually tried. I would assume it would be a good defense, but I can't comment too much more. I haven't actually tried specific hardened devices like that.

No more questions for now.
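As a footnote to the dump analysis mentioned in the talk's summary (looking for where the extracted dump starts repeating), here is a minimal Python sketch of one way to find the mirroring period. The function name and the brute-force comparison are assumptions made for illustration, not the speaker's actual tooling.

```python
def find_mirror_period(dump):
    """Return the smallest period at which `dump` repeats itself, or None.

    `dump` is a bytes object captured from an endless linear extraction.
    Only periods that fit at least twice into the capture are considered,
    so a capture shorter than two passes over the memory map yields None.
    """
    for period in range(1, len(dump) // 2 + 1):
        # The dump is periodic with this period if it equals itself
        # shifted by `period` bytes over the whole overlapping region.
        if dump[period:] == dump[:-period]:
            return period
    return None
```

If the capture covers at least two full passes, the returned period is a candidate size for the mirrored address space. A capture consisting of one constant value, such as erased memory read back as all 0xFF, would report a period of 1, so the result still needs a sanity check against the dump contents.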