AGV path planning and task scheduling based on improved proximal policy optimization algorithm (基于改进近端策略优化算法的 AGV 路径规划与任务调度)

Xuan Qi; Tong Zhou; Cunsong Wang; Xiaotian Peng; Hao Peng

doi:10.13196/j.cims.2023.0552


School of Mechanical and Power Engineering

Nanjing Tech University

Research output: Contribution to journal › Article › Peer-review

Abstract

Automated Guided Vehicles (AGVs) are automated material handling equipment with high flexibility and adaptability. Current research on optimal path planning and scheduling algorithms for AGVs still faces problems such as poor generalization, low convergence efficiency, and long routing times. Therefore, an improved Proximal Policy Optimization (PPO) algorithm was proposed. A multi-step action selection strategy was adopted to increase the step length of AGV movement, and the AGV action set was expanded from the original 4 directions to 8 directions to optimize the path. The reward function was made dynamic, adjusting the reward value in real time based on the current state of the AGV to enhance its learning ability. The reward value curves of the different improvement methods were then compared to validate the convergence efficiency of the algorithm and the length of the optimal path. Finally, a novel continuous task scheduling optimization algorithm for a single AGV was developed to improve transportation efficiency. The results showed that, compared with the original PPO algorithm, the improved algorithm shortened the optimal path by 28.6% and improved convergence efficiency by 78.5%. It also performed better on more complex tasks that require high-level policies and exhibited stronger generalization. Compared with Q-Learning, the Deep Q-Network (DQN) algorithm, and the Soft Actor-Critic (SAC) algorithm, the improved algorithm improved efficiency by 84.4%, 83.7%, and 77.9%, respectively. After continuous task scheduling optimization for a single AGV, the average path length was reduced by 47.6%.
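The abstract says the action set was expanded from 4 to 8 directions and that multi-step actions lengthen each AGV move, but it does not give the exact encoding. The following is a minimal, hypothetical grid-world sketch of such an action set, not the authors' implementation. Assumptions: a 0/1 occupancy grid (`GRID`), actions encoded as `(row offset, column offset, k)`, a hypothetical `max_steps` cap on `k`, and a no-corner-cutting rule for diagonal moves. All names (`build_action_set`, `apply_action`, `move_length`) are illustrative.

```python
import math
from typing import List, Optional, Tuple

Action = Tuple[int, int, int]  # (row offset, col offset, number of cells k)

# Baseline: the 4 cardinal moves (up, down, left, right).
CARDINAL: List[Tuple[int, int]] = [(-1, 0), (1, 0), (0, -1), (0, 1)]
# The 4 diagonals; together with CARDINAL they form an 8-direction set.
DIAGONAL: List[Tuple[int, int]] = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
EIGHT_DIRECTIONS = CARDINAL + DIAGONAL

# Hypothetical example map: 0 = free cell, 1 = obstacle.
GRID = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]


def build_action_set(directions: List[Tuple[int, int]], max_steps: int) -> List[Action]:
    """Enumerate discrete actions: move k = 1..max_steps cells along one direction.

    max_steps is a hypothetical parameter standing in for multi-step selection.
    """
    return [(dr, dc, k) for dr, dc in directions for k in range(1, max_steps + 1)]


def apply_action(grid: List[List[int]], pos: Tuple[int, int],
                 action: Action) -> Optional[Tuple[int, int]]:
    """Return the new cell, or None if any cell along the move is blocked or off the map."""
    dr, dc, k = action
    r, c = pos
    rows, cols = len(grid), len(grid[0])
    for _ in range(k):
        nr, nc = r + dr, c + dc
        if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
            return None
        # Assumed rule: a diagonal step may not cut past an obstacle corner.
        if dr != 0 and dc != 0 and (grid[r][nc] == 1 or grid[nr][c] == 1):
            return None
        r, c = nr, nc
    return (r, c)


def move_length(action: Action) -> float:
    """Euclidean distance covered: 1 per straight cell, sqrt(2) per diagonal cell."""
    dr, dc, k = action
    return k * math.hypot(dr, dc)


if __name__ == "__main__":
    print(len(build_action_set(CARDINAL, 1)))          # 4
    print(len(build_action_set(EIGHT_DIRECTIONS, 3)))  # 24
    print(apply_action(GRID, (0, 0), (0, 1, 3)))       # (0, 3)
    print(apply_action(GRID, (0, 0), (1, 1, 1)))       # None (lands on obstacle)
```

With 8 directions, reaching a cell one row and one column away costs √2 ≈ 1.41 in a single action instead of 2 with two cardinal moves. In a PPO setup of this kind, the policy network would typically output a categorical distribution over the enumerated action list; how the paper actually sizes and indexes its action set is not stated in the abstract.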

Translated title of the contribution	AGV path planning and task scheduling based on improved proximal policy optimization algorithm
Original language	Chinese
Pages (from-to)	955-964
Number of pages	10
Journal	Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS
Volume	31
Issue number	3
DOI	https://doi.org/10.13196/j.cims.2023.0552
Publication status	Published - 31 Mar 2025

Keywords

automated guided vehicle
path planning
proximal policy optimization algorithm
reinforcement learning
task scheduling


Cite this

@article{b0613e586b514756b99a2d55d5bf12fb,

title = "基于改进近端策略优化算法的 AGV 路径规划与任务调度",

abstract = "Automated Guided Vehicles (AGVs) are automated material handling equipment with high flexibility and adaptability. Current research on optimal path planning and scheduling algorithms for AGVs still faces problems such as poor generalization, low convergence efficiency, and long routing times. Therefore, an improved Proximal Policy Optimization (PPO) algorithm was proposed. A multi-step action selection strategy was adopted to increase the step length of AGV movement, and the AGV action set was expanded from the original 4 directions to 8 directions to optimize the path. The reward function was made dynamic, adjusting the reward value in real time based on the current state of the AGV to enhance its learning ability. The reward value curves of the different improvement methods were then compared to validate the convergence efficiency of the algorithm and the length of the optimal path. Finally, a novel continuous task scheduling optimization algorithm for a single AGV was developed to improve transportation efficiency. The results showed that, compared with the original PPO algorithm, the improved algorithm shortened the optimal path by 28.6% and improved convergence efficiency by 78.5%. It also performed better on more complex tasks that require high-level policies and exhibited stronger generalization. Compared with Q-Learning, the Deep Q-Network (DQN) algorithm, and the Soft Actor-Critic (SAC) algorithm, the improved algorithm improved efficiency by 84.4%, 83.7%, and 77.9%, respectively. After continuous task scheduling optimization for a single AGV, the average path length was reduced by 47.6%.",

keywords = "automated guided vehicle, path planning, proximal policy optimization algorithm, reinforcement learning, task scheduling",

author = "Xuan Qi and Tong Zhou and Cunsong Wang and Xiaotian Peng and Hao Peng",

year = "2025",

month = mar,

day = "31",

doi = "10.13196/j.cims.2023.0552",

language = "Chinese",

volume = "31",

pages = "955--964",

journal = "Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS",

issn = "1006-5911",

publisher = "Computer Integrated Manufacturing Systems",

number = "3",

}

TY  - JOUR
T1  - 基于改进近端策略优化算法的 AGV 路径规划与任务调度
AU  - Qi, Xuan
AU  - Zhou, Tong
AU  - Wang, Cunsong
AU  - Peng, Xiaotian
AU  - Peng, Hao
PY  - 2025/3/31
Y1  - 2025/3/31
N2  - Automated Guided Vehicles (AGVs) are automated material handling equipment with high flexibility and adaptability. Current research on optimal path planning and scheduling algorithms for AGVs still faces problems such as poor generalization, low convergence efficiency, and long routing times. Therefore, an improved Proximal Policy Optimization (PPO) algorithm was proposed. A multi-step action selection strategy was adopted to increase the step length of AGV movement, and the AGV action set was expanded from the original 4 directions to 8 directions to optimize the path. The reward function was made dynamic, adjusting the reward value in real time based on the current state of the AGV to enhance its learning ability. The reward value curves of the different improvement methods were then compared to validate the convergence efficiency of the algorithm and the length of the optimal path. Finally, a novel continuous task scheduling optimization algorithm for a single AGV was developed to improve transportation efficiency. The results showed that, compared with the original PPO algorithm, the improved algorithm shortened the optimal path by 28.6% and improved convergence efficiency by 78.5%. It also performed better on more complex tasks that require high-level policies and exhibited stronger generalization. Compared with Q-Learning, the Deep Q-Network (DQN) algorithm, and the Soft Actor-Critic (SAC) algorithm, the improved algorithm improved efficiency by 84.4%, 83.7%, and 77.9%, respectively. After continuous task scheduling optimization for a single AGV, the average path length was reduced by 47.6%.
AB  - Automated Guided Vehicles (AGVs) are automated material handling equipment with high flexibility and adaptability. Current research on optimal path planning and scheduling algorithms for AGVs still faces problems such as poor generalization, low convergence efficiency, and long routing times. Therefore, an improved Proximal Policy Optimization (PPO) algorithm was proposed. A multi-step action selection strategy was adopted to increase the step length of AGV movement, and the AGV action set was expanded from the original 4 directions to 8 directions to optimize the path. The reward function was made dynamic, adjusting the reward value in real time based on the current state of the AGV to enhance its learning ability. The reward value curves of the different improvement methods were then compared to validate the convergence efficiency of the algorithm and the length of the optimal path. Finally, a novel continuous task scheduling optimization algorithm for a single AGV was developed to improve transportation efficiency. The results showed that, compared with the original PPO algorithm, the improved algorithm shortened the optimal path by 28.6% and improved convergence efficiency by 78.5%. It also performed better on more complex tasks that require high-level policies and exhibited stronger generalization. Compared with Q-Learning, the Deep Q-Network (DQN) algorithm, and the Soft Actor-Critic (SAC) algorithm, the improved algorithm improved efficiency by 84.4%, 83.7%, and 77.9%, respectively. After continuous task scheduling optimization for a single AGV, the average path length was reduced by 47.6%.
KW  - automated guided vehicle
KW  - path planning
KW  - proximal policy optimization algorithm
KW  - reinforcement learning
KW  - task scheduling
UR  - http://www.scopus.com/inward/record.url?scp=105002335205&partnerID=8YFLogxK
U2  - 10.13196/j.cims.2023.0552
DO  - 10.13196/j.cims.2023.0552
M3  - Article
AN  - SCOPUS:105002335205
SN  - 1006-5911
VL  - 31
SP  - 955
EP  - 964
JO  - Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS
JF  - Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS
IS  - 3
ER  - 
