This paper proposes a star sensor tracking method without a star library, based on the angular distance chain algorithm, to address the problem that traditional star sensors depend on a fixed star library and must be configured to operate with multiple units in tracking mode. The method performs star map matching by dynamically generating angular distance chains, thereby avoiding dependence on a global star library. Experiments show that the recognition time of the algorithm in tracking mode is reduced to the millisecond level, and the maximum orientation determination error does not exceed 0.035°, demonstrating the method's effectiveness and reliability. The study provides key technical support for the development of low-cost, lightweight star sensors suitable for scenarios such as deep space exploration and near-Earth satellite clusters.
Keywords: angular distance chain algorithm, star sensor without star library, star map recognition, tracking mode, orientation, dynamic matching, deep space exploration
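The abstract does not describe the algorithm's internals; the Python sketch below illustrates one plausible reading of an "angular distance chain" used for catalog-free, frame-to-frame tracking: star centroids are converted to unit direction vectors, chained by angular distance to an anchor star, and matched across consecutive frames by chain similarity. The helper names, the anchor-star chaining scheme, the focal length, and the 0.02° tolerance are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def pixel_to_unit_vector(cx, cy, focal_px, cx0, cy0):
    """Convert a star centroid (pixels) to a unit direction vector in the sensor frame.
    Assumes a simple pinhole model with focal length given in pixels."""
    v = np.array([cx - cx0, cy - cy0, focal_px], dtype=float)
    return v / np.linalg.norm(v)

def angular_distance_chain(centroids, focal_px, principal_point):
    """Build a chain of inter-star angular distances (radians), anchored on the
    first-listed star -- one possible interpretation of an 'angular distance chain'."""
    dirs = [pixel_to_unit_vector(x, y, focal_px, *principal_point) for x, y in centroids]
    anchor = dirs[0]
    return np.array([np.arccos(np.clip(np.dot(anchor, d), -1.0, 1.0)) for d in dirs[1:]])

def match_chains(chain_prev, chain_curr, tol_rad=np.deg2rad(0.02)):
    """Greedily associate each current-frame chain link with the previous-frame link
    whose angular distance agrees within `tol_rad`; returns (prev_idx, curr_idx) pairs."""
    matches, used = [], set()
    for i, a in enumerate(chain_curr):
        j = int(np.argmin(np.abs(chain_prev - a)))
        if j not in used and abs(chain_prev[j] - a) < tol_rad:
            matches.append((j, i))
            used.add(j)
    return matches

# Hypothetical usage: two consecutive frames of star centroids (pixels) from a 1024x1024 detector.
frame_prev = [(512.3, 400.1), (230.8, 610.5), (700.2, 150.9), (820.4, 500.0)]
frame_curr = [(515.1, 402.7), (233.5, 613.0), (703.0, 153.4), (823.1, 502.6)]
chain_prev = angular_distance_chain(frame_prev, focal_px=3000.0, principal_point=(512.0, 512.0))
chain_curr = angular_distance_chain(frame_curr, focal_px=3000.0, principal_point=(512.0, 512.0))
print(match_chains(chain_prev, chain_curr))  # expected: [(0, 0), (1, 1), (2, 2)]
```

Because the chain is rebuilt from the observed stars in every frame, no global star library is consulted during tracking; only the previous frame's chain is needed, which is consistent with the millisecond-level recognition time claimed in the abstract.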
The paper proposes a two-stage method for training a robot from demonstrations, combining a diffusion generative model with online fine-tuning using Proximal Policy Optimization. In the offline phase, the diffusion model uses a limited set of expert demonstrations to generate synthetic "pseudo-demonstrations", expanding the variability and coverage of the original dataset. This removes the narrow specialization of the policy and improves its ability to generalize. In the online phase, the robot, starting from the pre-trained policy, refines its actions in the real environment (or in a high-fidelity simulation), which significantly reduces the risk of unsafe actions and the number of interactions required. In addition, parameter-efficient fine-tuning is introduced to reduce the computational cost of online learning, as well as value guidance, which focuses the generation of new data on state-action regions with high Q-values. Experiments on tasks from the D4RL suite (Hopper, Walker2d, HalfCheetah) show that our approach achieves the highest cumulative reward at lower computational cost than the alternatives. A t-SNE analysis indicates a shift of the synthetic data toward regions with high Q-values, which contributes to accelerated learning. The results confirm the promise of the proposed method for robotic applications that must combine a limited number of demonstrations with safety and efficiency in the online phase.
Keywords: robot learning from demonstrations, diffusion generative models, reinforcement learning, Proximal Policy Optimization
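The abstract only outlines the value-guidance step; the sketch below shows one way Q-guided selection of diffusion-generated pseudo-demonstrations might be arranged before the online PPO phase. The names diffusion_sampler, q_fn, and keep_fraction, as well as the top-quantile selection rule, are hypothetical placeholders introduced for illustration and are not taken from the paper.

```python
import numpy as np

def value_guided_filter(synthetic_batch, q_fn, keep_fraction=0.25):
    """Keep only the synthetic (state, action) pairs whose estimated Q-value falls in the
    top `keep_fraction`, steering the augmentation toward high-value regions."""
    states, actions = synthetic_batch
    q_vals = np.array([q_fn(s, a) for s, a in zip(states, actions)])
    cutoff = np.quantile(q_vals, 1.0 - keep_fraction)
    mask = q_vals >= cutoff
    return states[mask], actions[mask]

def augment_demos(expert_states, expert_actions, diffusion_sampler, q_fn,
                  n_synthetic=1000, keep_fraction=0.25):
    """Offline data preparation: sample pseudo-demonstrations from a (pre-trained) diffusion
    model, filter them by Q-value, and append the survivors to the expert set."""
    synth_states, synth_actions = diffusion_sampler(n_synthetic)  # numpy arrays
    s_keep, a_keep = value_guided_filter((synth_states, synth_actions), q_fn, keep_fraction)
    return (np.concatenate([expert_states, s_keep]),
            np.concatenate([expert_actions, a_keep]))

# Toy usage with stand-ins: a random "diffusion sampler" and a dummy critic (Hopper-like dims).
rng = np.random.default_rng(0)
sampler = lambda n: (rng.normal(size=(n, 11)), rng.normal(size=(n, 3)))
q_fn = lambda s, a: float(-np.sum(s**2) - np.sum(a**2))  # placeholder Q estimate
demo_s, demo_a = rng.normal(size=(50, 11)), rng.normal(size=(50, 3))
aug_s, aug_a = augment_demos(demo_s, demo_a, sampler, q_fn, n_synthetic=200)
print(aug_s.shape, aug_a.shape)  # expert demos plus the retained pseudo-demonstrations
```

The augmented dataset would then be used to pre-train the policy offline (e.g., by behavior cloning) before the online PPO fine-tuning stage described in the abstract; the PPO stage itself is not sketched here.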