Fraudsters are not Model Citizens
In my earlier post, I gave an overview of outlier detection techniques in the context of big data, and Hadoop specifically. As I mentioned there, fraud detection essentially translates to outlier detection in data mining parlance.
In this post, I will go over a distribution model based technique, which has been implemented as two MapReduce jobs in beymani, available on GitHub. I will use credit card transactions as the example.
Although distribution based models do not consider the temporal aspect of the data, they are an essential initial step towards a complete fraud detection process.