04/15/2021

I noticed in the news that there is a problem with the Mars helicopter. It was supposed to fly on Sunday, April 11, but during a test on Friday, April 9, they had a problem and postponed the flight until Wednesday, and now again for at least another week. I don't normally comment on news stories, as I'm usually pretty clueless about what's going on :-), but on this story I think I can give a little extra insight. I only know what I'm reading in second-hand news reports, so I could be all wrong.

**********

As some of you know, I'm an Embedded System Software Engineer. Tiny little computers are "embedded" in almost everything these days, from "smart" thermostats (or "smart" anything), to medical devices like pacemakers, to anti-lock brakes on cars. I like to use a microwave oven as an example. It has a tiny computer in it, which has a keyboard (the keypad), a display (usually just one line of text), timers and other basic computer stuff like a CPU, main memory, I/O ports, etc. It has a complete computer inside but it only runs one custom program - a microwave oven control program - whatever that needs to be. This type of computer system is sometimes referred to as an "embedded system" or a "dedicated device". You can't surf the web with it or do your taxes on it or play a computer game - it just does microwave oven stuff. Anyway, that's the kind of computer programming I do. I don't work on microwave ovens - but I could - it's similar to the kind of programming I do do.

**********

Getting back to Mars - they described their problem as a "watchdog timer expiration". A "watchdog timer" is basically like a "dead man's switch". On a railroad train, the engineer has to keep pushing a button to indicate he is alert and driving the train. If for any reason he stops pressing that button (i.e. he dies, or more likely falls asleep :-), a timer runs out and the train is brought to a complete stop so it doesn't become a runaway train.
A "watchdog timer" is sort of like that. It is usually built right into the microcontroller's hardware. In the main loop of the program, the watchdog timer is constantly being restarted. If the program "gets lost" or "hangs" and doesn't return to restart the watchdog, the timer will count down to zero and then reboot the microcontroller. (It is also called a "computer operating properly" or "COP" timer.) Generally speaking you NEVER, NEVER, NEVER, NEVER want to see this happen. (Did I mention NEVER enough?) It is a last-ditch, meat-axe fix - a final safety net - ONLY! It's like the airbags in your car - you NEVER want to see them deploy (unless of course you're about to run head-on into an oncoming semi-trailer truck at 70 MPH :-)

I don't want to seem critical of other programmers, especially without knowing all the details, but I'm surprised a bug like this is showing up so far along in the project. It raises a much scarier possibility - that the hardware itself may have been damaged during landing and is causing the problem. If that's true it's a ten times bigger headache and may not be fixable at all. Let's hope that's not the case.

**********

The big problem with fixing this kind of bug is that generally there is no information as to WHY the program failed or WHERE it was in the code when it did fail. Just that it failed. I'm sure they've tried to reproduce the situation on a parallel system here on Earth, but there's no guarantee they can recreate the EXACT scenario that failed. And if they can't do that, it will be a real nightmare to fix. I don't want to sound too negative; it's POSSIBLE they may just spot a flaw in the program, fix it and be on their merry way. Unfortunately, in the real world, and I think that includes Mars now too, there's about a four out of five chance this is going to be a really nasty problem to figure out and fix.
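For the programmers in the audience, here is a toy sketch of the watchdog idea described above. It is written in Python only so it is easy to read; a real watchdog is a hardware counter on the microcontroller, and real firmware would more likely be written in C. The names (Watchdog, run_firmware, hang_at) and the three-tick timeout are made up for illustration and have nothing to do with the actual helicopter code.

```python
# A toy, single-threaded simulation of a hardware watchdog timer.
# All names here (Watchdog, run_firmware, hang_at) are hypothetical.

class Watchdog:
    """Counts down once per tick; the program must kick it before it hits zero."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks

    def kick(self):
        """Restart the countdown (what a healthy main loop does on every pass)."""
        self.remaining = self.timeout_ticks

    def tick(self):
        """Advance the clock by one tick; return True if the timer has expired."""
        self.remaining -= 1
        return self.remaining <= 0


def run_firmware(total_ticks, hang_at=None, timeout_ticks=3):
    """Simulate total_ticks of clock time.

    If hang_at is given, the main loop "gets lost" from that tick onward
    and never kicks the watchdog again.
    Returns (reset_cause, tick_of_reset).
    """
    dog = Watchdog(timeout_ticks)
    for now in range(total_ticks):
        hung = hang_at is not None and now >= hang_at
        if not hung:
            dog.kick()      # healthy pass through the main loop
        if dog.tick():      # the hardware counts down regardless
            # All we learn here is THAT the watchdog fired,
            # not where the program was stuck or why.
            return ("WATCHDOG_RESET", now)
    return ("NONE", None)


print(run_firmware(total_ticks=20))             # ('NONE', None)
print(run_firmware(total_ticks=20, hang_at=5))  # ('WATCHDOG_RESET', 6)
```

In the second call the program stops kicking the watchdog at tick 5 and the reset follows at tick 6. Notice that the result says nothing about what the program was doing when it hung, which is the debugging problem in a nutshell.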
When they said on Saturday that they might fly on Wednesday, I figured they were dreaming and would be lucky to even identify what had actually gone wrong by then.

**********

Getting back to Mars again. Their statement that "the team considered and tested multiple potential solutions to this issue" actually tells me that they still don't even know what went wrong in the first place but are just beefing up parts of the program, hoping that will fix it. They mentioned that other parts of the unit are working fine, but I think that's just "happy talk" and may be misleading. I suspect there are a bunch of totally separate microcontrollers performing different functions on the helicopter. All this says is that those aren't failing, which tells us nothing about the current problem. (If you had a flat tire, the fact that the other three are okay doesn't really tell you anything :-)

**********

So what do I expect to happen? Well, the trouble with any bug is that it could take ten minutes or ten weeks to figure out and fix (possibly NEVER) - it just depends on the breaks. I assume they will try out their fixes here and then upload them to Mars and try to repeat the failed test and see how it goes. I suspect a reset during an actual flight could be catastrophic.
Who knows, they could have two firsts - first flight on another planet AND the first aviation crash report from another planet :-) I hope not - I think space stuff is even cooler than "BattleBots" and I'm a big fan of both :-) NASA hasn't called me yet for help - so they must think they have it under control :-)

**********

Dead man's switch
From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Dead_man%27s_switch

Saturday April 10, 2021
Mars Helicopter Flight Delayed to No Earlier than April 14
https://mars.nasa.gov/technology/helicopter/status/291/mars-helicopter-flight-delayed-to-no-earlier-than-april-14/

Monday April 12, 2021
NASA's Mars helicopter Ingenuity won't fly until next week at the earliest
Work Progresses Toward Ingenuity’s First Flight on Mars
https://mars.nasa.gov/technology/helicopter/status/290/work-progresses-toward-ingenuity-s-first-flight-on-mars/

Top ten best 2019 Battlebots fights
https://www.youtube.com/watch?v=_JRWVqwf_rU
...