Simplex Fraud Detection
Jul 23, 2018
In the fast-moving fintech landscape, Simplex has been a pioneer. As a data scientist at Simplex, I had a front-row seat to that innovation and an active role in driving it. Here is what I worked on.
When I first arrived at Simplex, I was fascinated by the complexity of the core machine learning models at the heart of our fraud detection. These models, built with XGBoost, already performed well. Still, I saw room for improvement, and so my optimization work began.
I dove into machine learning algorithms, studying the strengths and weaknesses of each and how well it suited our dataset. This was not an academic exercise; it was a systematic comparison of state-of-the-art algorithms as candidates to power our models.
One algorithm stood out: CatBoost. It is known for its native handling of categorical features and its robustness to overfitting, which made it a promising candidate. After rigorous testing and validation, I decided to switch from XGBoost to CatBoost, a significant turning point in our modeling approach.
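To give a feel for what "handling of categorical features" can mean, here is a minimal sketch of ordered target statistics, the idea CatBoost's documentation describes for turning a category into a number without letting a row see its own label. It is a simplified illustration in plain Python, not CatBoost's implementation and not the Simplex code; the function name, the `prior` and `prior_weight` parameters, and the example column are assumptions made for the sketch.

```python
import random


def ordered_target_stats(categories, targets, prior=0.5, prior_weight=1.0, seed=0):
    """Encode one categorical column using only rows seen earlier in a random order."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)

    sums = {}    # category -> sum of targets seen so far
    counts = {}  # category -> number of rows seen so far
    encoded = [0.0] * len(categories)
    for i in order:
        cat = categories[i]
        s = sums.get(cat, 0.0)
        n = counts.get(cat, 0)
        # The row's own label is not part of its encoding, which limits target leakage.
        encoded[i] = (s + prior_weight * prior) / (n + prior_weight)
        sums[cat] = s + targets[i]
        counts[cat] = n + 1
    return encoded


# Hypothetical column: card-issuing country per transaction, 1 = fraud.
countries = ["US", "DE", "US", "US", "DE"]
is_fraud = [0, 1, 0, 1, 1]
print(ordered_target_stats(countries, is_fraud))
```

The first row seen for any category falls back to the prior, and later rows of the same category get an estimate based only on earlier labels.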
The algorithm switch was a major step, but it wasn't the only challenge. A key obstacle was class imbalance, a common issue in fraud detection, where legitimate transactions significantly outnumber fraudulent ones. A model trained on such data risks being biased toward the majority class and overlooking the minority class that matters most: the fraudulent transactions.
To tackle this, I tried several strategies: oversampling the minority class, undersampling the majority class, and cost-sensitive learning. I also evaluated models with metrics better suited to imbalanced data, such as the Area Under the Precision-Recall Curve (AUPRC), rather than plain accuracy. Accuracy is misleading here because a model that labels every transaction as legitimate can score high accuracy while catching no fraud at all.
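Two of these ideas are easy to sketch in plain Python: inverse-frequency class weights, one common form of cost-sensitive learning, and average precision, a step-wise estimate of the area under the precision-recall curve. This is an illustration under assumptions, not the production code: the labels and scores are made up, the function names are hypothetical, and the metric assumes distinct scores (tied scores would need extra handling).

```python
def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (n_classes * class_count)."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve for binary labels (1 = fraud)."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the point where recall increases
    return total / positives


# Made-up data: 8 legitimate rows, 2 fraudulent ones.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.60, 0.90, 0.40]

weights = balanced_class_weights(labels)
ap = average_precision(labels, scores)
always_legit_accuracy = labels.count(0) / len(labels)

print(weights)                # fraud rows get the larger weight
print(round(ap, 2))           # about 0.83 for this toy ranking
print(always_legit_accuracy)  # 0.8, despite catching no fraud
```

The weights can be passed to a learner as per-class or per-row costs, so that missing a fraudulent row is penalized more than misclassifying a legitimate one.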
At the same time, I refined our training methods and expanded our feature set. Hyperparameter optimization, regularization, and cross-validation made the models more robust, and feature engineering extracted more predictive power from our data, further improving performance.
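With imbalanced data, cross-validation is usually stratified, so that every validation fold contains some fraudulent rows. The sketch below shows one simple way to build such folds in plain Python. It is an assumption-laden illustration rather than the procedure used at Simplex: the function names, the fold count, and the labels are hypothetical.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Return k lists of row indices, each with roughly the same class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)  # deal rows out round-robin, class by class
    return folds


def train_valid_splits(folds):
    """Yield (train_indices, valid_indices), holding out one fold at a time."""
    for held_out in range(len(folds)):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


# Hypothetical labels: 5 fraudulent rows among 50.
labels = [1] * 5 + [0] * 45
folds = stratified_folds(labels, k=5)
for train, valid in train_valid_splits(folds):
    fraud_in_valid = sum(labels[i] for i in valid)
    # Each validation fold holds exactly one fraud row in this example.
    print(len(train), len(valid), fraud_in_valid)
```

A hyperparameter candidate would then be trained once per split and judged by its average validation metric across the folds.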
These efforts paid off in significantly better fraud detection. For e-commerce, precision increased by 17% and recall by 4%. Even for cryptocurrency transactions, which are notoriously volatile, precision increased by 24% and recall by 18%.
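For reference, precision and recall are computed from confusion counts, and a percentage gain can be read either as a relative change or as a change in percentage points. The sketch below uses made-up counts (not Simplex data) to show the definitions and both readings; all names and numbers are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def relative_change(old, new):
    """Change as a fraction of the old value."""
    return (new - old) / old


def point_change(old, new):
    """Change in percentage points."""
    return (new - old) * 100.0


# Made-up confusion counts for an old and a new model on the same 100 fraud cases.
old_p, old_r = precision_recall(tp=60, fp=40, fn=40)
new_p, new_r = precision_recall(tp=63, fp=27, fn=37)

print(f"precision: {old_p:.2f} -> {new_p:.2f}")
print(f"  relative change: {relative_change(old_p, new_p):.1%}")
print(f"  percentage points: {point_change(old_p, new_p):.1f}")
print(f"recall: {old_r:.2f} -> {new_r:.2f}")
```

The same move in precision, from 0.60 to 0.70, is about a 16.7% relative gain but only 10 percentage points, so it helps to state which reading a figure uses.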
To bring these models into production, I designed and built an end-to-end model serving API from scratch. The API streamlined model deployment and made it easier to integrate the models with applications and services across Simplex.
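The shape of a model serving API can be sketched with nothing but the Python standard library. What follows is a toy under stated assumptions, not the API built at Simplex: the `/score` path, the `amount` field, the port, and the scoring rule that stands in for a trained model are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_transaction(features):
    """Stand-in for a trained model: returns a fraud score in [0, 1]."""
    # Hypothetical rule so the sketch runs without a model file.
    amount = float(features.get("amount", 0.0))
    return max(0.0, min(1.0, amount / 10000.0))


def handle_payload(body):
    """Turn a JSON request body into (status_code, response_dict)."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
        return 200, {"fraud_score": score_transaction(features)}
    except (ValueError, TypeError) as exc:
        return 400, {"error": str(exc)}


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/score":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run(host="127.0.0.1", port=8000):
    """Serve until interrupted; call run() to start the server."""
    HTTPServer((host, port), ScoreHandler).serve_forever()
```

Keeping the request handling in `handle_payload`, separate from the HTTP plumbing, means the scoring path can be unit-tested without starting a server, and the stand-in rule can be swapped for a real model behind the same interface.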
My time at Simplex was marked by discovery, innovation, and persistence in the face of challenges. By combining technical expertise with creative problem-solving, we were able to raise the standards of fintech and fraud detection. Looking back, I take pride in the strides we made; looking forward, I am eager for the possibilities that lie ahead.