Simulates the interaction between edge servers and mobile users with a clear graphical interface, and implements continuous control with Deep Deterministic Policy Gradient (DDPG) to determine resource allocation (offloading targets, computational resources, and migration bandwidth) on the edge servers.
Cloud-computing-based mobile applications, such as augmented reality (AR), face recognition, and object recognition, have become popular in recent years. However, cloud computing can introduce high latency and consume extra backhaul bandwidth because tasks are executed remotely. Edge computing addresses these problems by moving storage and computing resources closer to mobile users, improving response times and relieving backhaul pressure.

Considering the computational resources, migration bandwidth, and offloading targets in an edge computing environment, this project uses Deep Deterministic Policy Gradient (DDPG), a reinforcement learning (RL) approach, to allocate resources to mobile users.
Picture originated from: IEEE Innovation at Work
$ python3 src/run_this.py
- `TEXT_RENDER = True / False`
- `SCREEN_RENDER = True / False`
Mobile User
Edge Server
Request Task: VOC SSD300 Object Detection
Graphical Interface
Description
Determining each user's offloading server is a discrete-variable problem, while allocating computational resources and migration bandwidth are continuous-variable problems. Deep Deterministic Policy Gradient (DDPG), a model-free, off-policy actor-critic algorithm, can handle both kinds of decision: the continuous allocations are output directly, and the discrete offloading choice is encoded within the continuous action vector (see the action encoding below). In addition, DDPG updates its model weights at every step, so the model can adapt quickly to a dynamic environment.
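To make the per-step update concrete, below is a minimal sketch of a single DDPG training step, assuming PyTorch. All names (`actor`, `critic`, `TAU`, `GAMMA`, and the batch layout) are illustrative, not taken from this repository.

```python
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 0.001  # discount factor and soft-update rate (assumed values)

def ddpg_update(actor, actor_target, critic, critic_target,
                actor_opt, critic_opt, batch):
    s, a, r, s_next = batch  # tensors sampled from a replay buffer

    # critic: regress Q(s, a) toward the one-step TD target
    with torch.no_grad():
        q_next = critic_target(s_next, actor_target(s_next))
        target = r + GAMMA * q_next
    critic_loss = F.mse_loss(critic(s, a), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # actor: ascend the critic's estimate of Q(s, actor(s))
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # soft-update the target networks toward the online networks
    for net, tgt in ((actor, actor_target), (critic, critic_target)):
        for p, p_tgt in zip(net.parameters(), tgt.parameters()):
            p_tgt.data.mul_(1 - TAU).add_(TAU * p.data)
```

Because this update runs after every environment step (off-policy, from a replay buffer), the policy keeps adapting as users move and load shifts.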
State
```python
import numpy as np

def generate_state(two_table, U, E, x_min, y_min):
    # two_to_one, r_bound, and b_bound are defined elsewhere in the project
    one_table = two_to_one(two_table)
    # state = edge capabilities + bandwidth table
    #         + per-user request locations + per-user (x, y) coordinates
    S = np.zeros((len(E) + one_table.size + len(U) + len(U) * 2))
    count = 0
    # normalized computational capability of each edge server
    for edge in E:
        S[count] = edge.capability / (r_bound * 10)
        count += 1
    # normalized migration bandwidth between edge servers
    for i in range(len(one_table)):
        S[count] = one_table[i] / (b_bound * 10)
        count += 1
    # edge server currently serving each user's request
    for user in U:
        S[count] = user.req.edge_id / 100
        count += 1
    # normalized (x, y) location of each user
    for user in U:
        S[count] = (user.loc[0][0] + abs(x_min)) / 1e5
        S[count + 1] = (user.loc[0][1] + abs(y_min)) / 1e5
        count += 2
    return S
```
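`generate_state` relies on a `two_to_one` helper and on the `r_bound`/`b_bound` normalization constants defined elsewhere in the repository. A minimal sketch of what the helper plausibly does, assuming the input is a 2-D table of inter-server migration bandwidths:

```python
import numpy as np

def two_to_one(two_table):
    # assumed behavior: flatten the 2-D bandwidth table into a 1-D
    # vector so it can be concatenated into the flat state vector
    return np.array(two_table).flatten()
```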
Action
```python
def generate_action(R, B, O):
    # flat action vector: resources + bandwidth + one-hot offload targets
    a = np.zeros(USER_NUM + USER_NUM + EDGE_NUM * USER_NUM)
    # computational resource allocated to each user, normalized
    a[:USER_NUM] = R / r_bound
    # migration bandwidth allocated to each user, normalized
    a[USER_NUM:USER_NUM + USER_NUM] = B / b_bound
    # offload target: one-hot encoding of each user's chosen edge server
    base = USER_NUM + USER_NUM
    for user_id in range(USER_NUM):
        a[base + int(O[user_id])] = 1
        base += EDGE_NUM
    return a
```
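On the environment side, the flat vector has to be decoded back into per-user decisions. Below is a minimal sketch of the inverse mapping; `parse_action` is a hypothetical name, not a function from this repository. Taking the argmax over each user's edge scores is what turns the actor's continuous output back into a discrete offloading choice.

```python
import numpy as np

def parse_action(a, USER_NUM, EDGE_NUM, r_bound, b_bound):
    # hypothetical inverse of generate_action
    R = a[:USER_NUM] * r_bound              # computational resources per user
    B = a[USER_NUM:2 * USER_NUM] * b_bound  # migration bandwidth per user
    # each user's offload target is the edge server with the highest score
    scores = a[2 * USER_NUM:].reshape(USER_NUM, EDGE_NUM)
    O = np.argmax(scores, axis=1)
    return R, B, O
```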
Reward
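A plausible reward signal, consistent with the processed-task metric reported under Result below, is the number of tasks the edge servers finish per step; the sketch below is an assumption, not necessarily the repository's exact formula.

```python
def reward(processed_tasks_this_step):
    # assumption: the agent is rewarded in proportion to the number of
    # tasks processed, matching the metric reported in the Result section
    return float(processed_tasks_this_step)
```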
Model Architecture
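A minimal sketch of a DDPG actor-critic pair consistent with the state and action layouts above, assuming PyTorch; layer sizes and activations are illustrative, not the repository's actual values.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            # sigmoid keeps every action component in [0, 1], matching
            # the normalized layout produced by generate_action
            nn.Linear(hidden, action_dim), nn.Sigmoid(),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q-value estimate
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```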
Simulation Environment
Result
| Number of Clients | Average total processed tasks in the last 10 episodes | Training History |
|---|---|---|
| 10 | 11910 | |
| 20 | 23449 | |
| 30 | 33257 | |
| 40 | 40584 | |
Demo Environment
Demo Video