Statistics for Similarity-Based Patent Selection using Natural Language Processing