(Tic-Tac-Toe: a 2 player game (player x and player o) on 15x15 board, where one player who forms a chain of 5 stones first wins)
So I've implemented a Tic-Tac-Toe algorithm that uses simple alpha beta pruning..
This is what I have so far:
def ab_minimax(state):
max_position, max_value, alpha, beta = None, None, float('-inf'), float('inf')
positions = getAvailablePositions()
for position in positions:
temp = next_state(state, position)
value = self.ab_min_move(temp, 3, alpha, beta)
if max_value < value:
max_position = position
max_value = value
return max_position
def ab_max_move(state, depth, alpha, beta):
if game_ended(state) or depth == 0:
return get_score(state)
value = float('-inf')
positions = getAvailablePositions()
for position in positions:
temp = next_state(state, position)
value = max(value, self.ab_min_move(temp, depth-1, alpha, beta))
if value >= beta:
return value
alpha = max(alpha, value)
if alpha >= beta:
break
return value
def ab_min_move(state, depth, alpha, beta):
if game_ended(state) or depth == 0:
return get_score(state)
value = float('inf')
positions = getAvailablePositions()
for position in positions:
temp = next_state(state, position)
value = min(value, ab_max_move(temp, depth-1, alpha, beta))
if value <= alpha:
return value
beta = min(beta, value)
if alpha >= beta:
break
return value
This works fine, but obviously it requires too much time for the agent to return a move..
Then I came across this idea of Threat-Space search, which basically aims at placing "threats."
In this tic-tac-toe, threats are attacking sequences like ".ooo." "xoooo." (if I'm player o)
The problem is, I have no idea where I should place this threat-space search in my
alpha beta function....
I'm pretty sure the idea is to combine the threat space search with the original alpha beta minimax
algorithm,,, but I'm not sure where and how this should be done...
Can someone give me some explanation or really brief pseudocode..?
Thanks!