This research proposes a hierarchical control structure combined with reinforcement learning (RL) for stable walking of bipedal robots. The hierarchical control structure consists of a walking trajectory planner and a low-level motion controller. The walking trajectory planner generates a center of mass (CoM) trajectory from a predefined zero moment point (ZMP) trajectory. The low-level controller consists of a kinematic controller and a dynamic controller: the kinematic controller anchors the robot dynamics as a dual-mass inverted pendulum (DMIP) system, and the dynamic controller tracks the CoM trajectory by controlling the ankle torque delivered by a series elastic actuator. RL generates the ZMP trajectory, with training implemented in MATLAB Simscape Multibody. By incorporating stability into the reward function, RL optimizes the parameters of the ZMP trajectory to improve walking performance. The control structure is implemented on a bipedal robot built in-house. Simulations and experiments verify the robot’s performance and the effectiveness of RL.
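
To illustrate the coupling between the CoM and ZMP trajectories that the planner relies on, the following minimal sketch uses the single-mass linear inverted pendulum relation p = x - (z_c / g) * x_ddot. This is a simplified stand-in, not the DMIP model used in this work. The function name `zmp_from_com`, the constant CoM height `com_height`, and the one-dimensional finite-difference formulation are assumptions made only for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def zmp_from_com(com, dt, com_height, g=G):
    """Return ZMP positions for the interior samples of a 1-D CoM trajectory.

    Assumes a single-mass linear inverted pendulum with constant CoM height
    (an illustrative simplification, not the DMIP model):
        p = x - (z_c / g) * x_ddot
    The CoM acceleration x_ddot is estimated with central finite differences,
    so the first and last samples have no corresponding ZMP value.
    """
    if len(com) < 3:
        raise ValueError("need at least three CoM samples")
    zmp = []
    for k in range(1, len(com) - 1):
        x_ddot = (com[k + 1] - 2.0 * com[k] + com[k - 1]) / (dt * dt)
        zmp.append(com[k] - (com_height / g) * x_ddot)
    return zmp
```

Under this simplified model, a CoM that moves at constant velocity has a ZMP directly beneath it, while a CoM that follows x(t) = x0 cosh(ωt) with ω = sqrt(g / z_c) keeps the ZMP fixed at the origin. A planner of the kind described here solves the inverse problem, finding a CoM trajectory consistent with a desired ZMP trajectory.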