Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening.
Recently, much effort has been invested in using convolutional neural network (CNN) models trained on 3D structural images of protein-ligand complexes to distinguish binding from non-binding ligands for virtual screening. However, the dearth of reliable protein-ligand X-ray structures and binding affinity data has required the use of constructed data