On-Board Software Self-recovery Technique for Defending MBU Effect
Abstract: A system self-recovery technique is proposed to reduce the harm that multi-bit upsets (MBUs) cause to on-board computers. The technique uses the multi-bit error detection capability of hardware EDAC (error detection and correction), combined with fault-tolerant system self-recovery, to capture MBUs and selectively start system self-recovery, preventing the system safety problems that MBUs would otherwise cause. A key data index is built so that unnecessary self-recoveries are avoided, and division hashing together with a moderate recovery strategy shortens interrupt processing time. SEU (single event upset) hazard analysis and on-orbit SEU observation data from a satellite show that the proposed technique can reduce the probability of SEU-induced satellite failures by more than 90%. The technique has been successfully applied to China's XX02 satellite.
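The abstract names a key data index and division hashing but gives no implementation detail, so the following is only a minimal sketch of how such a lookup might work, not the authors' design. It assumes memory is tracked in fixed-size blocks, that the EDAC interrupt reports the faulting address, and that errors outside key data can be left alone; `BLOCK_SIZE`, `TABLE_SIZE`, `KeyDataIndex`, and `on_multibit_error` are all hypothetical names introduced for illustration.

```python
BLOCK_SIZE = 256   # assumed block granularity in bytes (not from the paper)
TABLE_SIZE = 97    # assumed prime table size for the division method


def bucket_of(block: int) -> int:
    """Division-method hash: the bucket is the remainder modulo the table size."""
    return block % TABLE_SIZE


class KeyDataIndex:
    """Hypothetical index of memory blocks that hold key data."""

    def __init__(self) -> None:
        # Collisions are resolved by chaining inside each bucket.
        self.buckets = [[] for _ in range(TABLE_SIZE)]

    def register(self, start: int, length: int, name: str) -> None:
        """Record every block touched by the region [start, start + length)."""
        first = start // BLOCK_SIZE
        last = (start + length - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            self.buckets[bucket_of(block)].append((block, name))

    def lookup(self, addr: int):
        """Return the key-data name covering addr's block, or None."""
        block = addr // BLOCK_SIZE
        for stored_block, name in self.buckets[bucket_of(block)]:
            if stored_block == block:
                return name
        return None


def on_multibit_error(index: KeyDataIndex, addr: int) -> str:
    """Assumed interrupt-handler decision: recover only when key data is hit."""
    name = index.lookup(addr)
    if name is None:
        return "ignore"           # assumption: non-key data needs no self-recovery
    return f"recover:{name}"      # assumption: recovery is scoped to the hit region


index = KeyDataIndex()
index.register(0x4000, 512, "attitude_state")   # hypothetical key-data region
print(on_multibit_error(index, 0x4010))
print(on_multibit_error(index, 0x9000))
```

Because the index works at block granularity, an address that shares a block with key data is treated as a hit, which errs on the side of recovering. A lookup scans a single bucket, so the work done inside the interrupt stays short and bounded by the chain length.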