Very nice explanation of a practical conversational search system.
Is the same dialog system used for both voice- and text-based conversations?
I wonder whether traditional search evaluation metrics like P@1 are really appropriate in the conversational setting. It seems likely that the user will eventually reach their target product over the course of a conversation, assuming it is in your catalog, so I wouldn't expect the system to find the best product after a single dialog turn. How do you even create a ground truth for measuring precision at a single dialog turn, if any product that matches the intent revealed so far counts as a positive? And since the full user intent has not yet been revealed in the dialog, there doesn't seem to be any way to improve accuracy beyond returning some product that matches the intent revealed so far. So P@1 doesn't seem like a good proxy for session-level user satisfaction. Wouldn't we want something more like "number of dialog turns before conversion"?
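To make the contrast concrete, here is a rough Python sketch. It assumes a hypothetical session log in which each turn records whether the top-ranked product matched the intent revealed so far and whether the user converted on that turn; the field names and toy sessions are made up for illustration and are not how any particular system logs data.

```python
from statistics import mean
from typing import Optional

# Hypothetical log format: a session is a list of turns, and each turn is a dict with
#   "top1_matches_intent": did the top result match the intent revealed so far?
#   "converted": did the user convert (e.g. purchase) on this turn?
Session = list[dict]


def precision_at_1(sessions: list[Session]) -> float:
    """Per-turn P@1: fraction of all turns whose top result matched the intent so far."""
    turns = [turn for session in sessions for turn in session]
    return sum(turn["top1_matches_intent"] for turn in turns) / len(turns)


def turns_to_conversion(session: Session) -> Optional[int]:
    """1-based index of the first converting turn, or None if the session never converted."""
    for i, turn in enumerate(session, start=1):
        if turn["converted"]:
            return i
    return None


def mean_turns_to_conversion(sessions: list[Session]) -> Optional[float]:
    """Session-level metric: average turns needed, over sessions that converted."""
    counts = [n for n in map(turns_to_conversion, sessions) if n is not None]
    return mean(counts) if counts else None


# Toy data: every turn's top result matches the intent revealed so far,
# but one user needs three turns to convert and the other needs only one.
long_session = [
    {"top1_matches_intent": True, "converted": False},
    {"top1_matches_intent": True, "converted": False},
    {"top1_matches_intent": True, "converted": True},
]
short_session = [{"top1_matches_intent": True, "converted": True}]
sessions = [long_session, short_session]

print(precision_at_1(sessions))            # 1.0 for both sessions
print(mean_turns_to_conversion(sessions))  # 2
```

Both toy sessions get a perfect per-turn P@1, even though one took three turns and the other took one, which is exactly the difference a session-level metric would expose. In practice you would probably also want to track the share of sessions that never convert, since averaging over converting sessions alone ignores abandonment.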