Publications

I list my publications and talks here. You can also find them on my Google Scholar page. Some entries also link to the code I wrote as part of the work under the [code] tag, and each paper indicates whether I did it as part of my job or in service to a client.

Peer Reviewed:

Crowson, K., Biderman, S., Kornis, D., Stander, D., Hallahan, E., Castricato, L., & Raff, E. (2022). VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance. ECCV. [arXiv]
DesJardin, M., Raff, E., Baranco, N., & Mastrogiannis, D. (2022). Cross-Sectional Survey of High-Risk Pregnant Women’s Opinions on COVID-19 Vaccination. Women’s Health Reports, 3(1), 608–616. https://doi.org/10.1089/whr.2022.0006
DesJardin, M., Raff, E., Baranco, N., & Mastrogiannis, D. (2022). Pregnant Women’s Opinions on the COVID-19 Vaccination in Pregnancy [A301]. Obstetrics & Gynecology, 139(1), 87S-87S. https://doi.org/10.1097/01.AOG.0000825524.73715.9a
Alam, M. M., Raff, E., Oates, T., & Holt, J. (2022). Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations. International Conference on Machine Learning. [arXiv] [official] [code]
Raff, E. (2022). Does the Market of Citations Reward Reproducible Work? ML Evaluation Standards Workshop at ICLR 2022. [arXiv]
Raff, E., & Farris, A. L. (2022). A Siren Song of Open Source Reproducibility. ML Evaluation Standards Workshop at ICLR 2022. [arXiv] Outstanding Paper Award! 1 of 5
Lu, F., Ferraro, F., & Raff, E. (2022). Continuously Generalized Ordinal Regression for Linear and Deep Models. SIAM International Conference on Data Mining (SDM22). [arXiv]
Nolet, C. J., Gala, D., Raff, E., Eaton, J., Rees, B., Zedlewski, J., & Oates, T. (2022). Semiring Primitives for Sparse Neighborhood Methods on the GPU. MLSys Conference. [arXiv] Outstanding Paper Award! 1 of 5
Nguyen, A. T., Lu, F., Munoz, G. L., Raff, E., Nicholas, C., & Holt, J. (2022). Out of Distribution Data Detection Using Dropout Bayesian Neural Networks. Proceedings of the 36th AAAI Conference on Artificial Intelligence.
Kebe, G. Y., Richards, L. E., Raff, E., Ferraro, F., & Matuszek, C. (2022). Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech. AAAI. [arXiv]
Joyce, R. J., Amlani, D., Nicholas, C., & Raff, E. (2022). MOTIF: A Large Malware Reference Dataset with Ground Truth Family Labels. The AAAI-22 Workshop on Artificial Intelligence for Cyber Security (AICS). [arXiv]
Ganesan, A., Gao, H., Gandhi, S., Raff, E., Oates, T., Holt, J., & McLean, M. (2021). Learning with Holographic Reduced Representations. Advances in Neural Information Processing Systems. [arXiv] [code]
Kebe, G. Y., Higgins, P., Jenkins, P., Darvish, K., Barron, R., Winder, J., Engel, D., Raff, E., Ferraro, F., Matuszek, C., & Hamilton, B. A. (2021). A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning. NeurIPS. [official] [code]
Ordun, C., Cha, A. N., Raff, E., Gaskin, B., Hanson, A., Rule, M., Purushotham, S., & Gulley, J. L. (2021). Intelligent Sight and Sound: A Chronic Cancer Pain Dataset. NeurIPS. [arXiv]
Joyce, R. J., Raff, E., & Nicholas, C. (2021). Rank-1 Similarity Matrix Decomposition For Modeling Changes in Antivirus Consensus Through Time. Proceedings of the Conference on Applied Machine Learning for Information Security. [arXiv] [official]
Nguyen, A. T., Raff, E., Nicholas, C., & Holt, J. (2021). Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints. IJCAI-21 1st International Workshop on Adaptive Cyber Defense. [arXiv]
Richards, L. E., Nguyen, A., Capps, R., Forsythe, S., Matuszek, C., & Raff, E. (2021). Adversarial Transfer Attacks With Unknown Data and Class Overlap. Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security (AISec ’21). https://doi.org/10.1145/3474369.3486862
Joyce, R. J., Raff, E., & Nicholas, C. (2021). A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels. Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security (AISec ’21). https://doi.org/10.1145/3474369.3486867
Ordun, C., Raff, E., & Purushotham, S. (2021). Generating Thermal Human Faces for Physiological Assessment Using Thermal Sensor Auxiliary Labels. ICIP. http://arxiv.org/abs/2106.08091
Raff, E. (2021). Exact Acceleration of K-Means ++ and K-Means. In 30th International Joint Conference on Artificial Intelligence (IJCAI-21). [arXiv] [code]
Bouthillier, X., Delaunay, P., Bronzi, M., Trofimov, A., Nichyporuk, B., Szeto, J., Sepah, N., Raff, E., Madan, K., Voleti, V., Kahou, S., Michalski, V., Serdyuk, D., Arbel, T., Pal, C., Varoquaux, G., Vincent, P. (2021). Accounting for Variance in Machine Learning Benchmarks. In Machine Learning and Systems (MLSys). [arXiv]
Raff, E., Fleshman, W., Zak, R., Anderson, H. S., Filar, B., & McLean, M. (2021). Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection. In The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI). [arXiv] [code]
Raff, E. (2021). Research Reproducibility as a Survival Analysis. In The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI). [arXiv] [code]
Nolet, C. J., Lafargue, V., Raff, E., Nanditale, T., Oates, T., Zedlewski, J., & Patterson, J. (2020). Bringing UMAP Closer to the Speed of Light with GPU Acceleration. In The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI). [arXiv]
Raff, E., & Nicholas, C. (2020). A Survey of Machine Learning Methods and Challenges for Windows Malware Classification. In NeurIPS 2020 Workshop: ML Retrospectives, Surveys & Meta-Analyses (ML-RSA). [arXiv] Best Paper!
Zhang, W., Zhang, M., Zhang, J., Liu, Z., Chen, Z., Wang, J., … Raff, E., Messina, E. (2020). Flexible and Adaptive Fairness-aware Learning in Non-stationary Data Streams. In 32nd International Conference on Tools with Artificial Intelligence (ICTAI). [arXiv]
Pillai, N., Raff, E., Ferraro, F., & Matuszek, C. (2020). Sampling Approach Matters: Active Learning for Robotic Language Acquisition. In 2020 IEEE International Conference on Big Data (Big Data). [arXiv]
Raff, E., Holt, J., Filar, B. (2020). Getting Passive Aggressive About False Positives: Patching Deployed Malware Detectors. To appear in IEEE International Conference on Data Mining Workshop (ICDM) on Deep Learning for Cyber Threat Intelligence (DL-CTI) [arXiv]
Raff, E., Zak, R., Munoz, G. L., Fleming, W., Anderson, H. S., Filar, B., … Holt, J. (2020). Automatic Yara Rule Generation Using Biclustering. In 13th ACM Workshop on Artificial Intelligence and Security (AISec’20). [arXiv] [bibtex] [code] Best Paper!
Ordun, C., Raff, E., & Purushotham, S. (2020) The Use of AI for Thermal Emotion Recognition: A Review of Problems and Limitations in Standard Design and Data. AAAI FSS-20 AI in Government and Public Sector Applications [arXiv]
Eren, M. E., Solovyev, N., Raff, E., Nicholas, C., & Johnson, B. (2020). COVID-19 Kaggle Literature Organization. Proceedings of the ACM Symposium on Document Engineering 2020. [arXiv] [code]
Ordun, C., Purushotham, S., & Raff, E. (2020). Exploratory Analysis of Covid-19 Tweets using Topic Modeling, UMAP, and DiGraphs. epiDAMIK 2020: 3rd EpiDAMIK ACM SIGKDD International Workshop on Epidemiology Meets Data Mining and Knowledge Discovery. [arXiv] [video]
A. Rahnama, A. T. Nguyen, and E. Raff (2020), “Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory,” CVPR. [official] [arXiv]
Raff, E., Nicholas, C. K., & McLean, M. (2020) "A New Burrows Wheeler Transform Markov Distance" in AAAI [arXiv] [code]
Nguyen, A., Raff, E., & Sant-Miller, A. (2019) "Would a File by Any Other Name Seem as Malicious?" IEEE Big Data. [pre-print]
Raff, E. (2019) "A Step Toward Quantifying Independently Reproducible Machine Learning Research" NeurIPS. [pre-print] (Spotlight! 164/6743)
Klein, A., Jauregui, J., Raff, E., Gilotra, M. (2019) "Early Outcomes and Complications of Obese Patients Undergoing Shoulder Arthroplasty: a Meta-Analysis". Journal of Clinical Orthopaedics and Trauma [official]
Raff, E., Aurelio, J., & Nicholas, C. “PyLZJD: An Easy to Use Tool for Machine Learning.” In Proceedings of the 18th Python in Science Conference, 97–102, 2019. [pre-print] [official] [bibtex].
Nguyen, A. T., Lien, J., Raff, E., & Mekaru, S. “Improved Automatic Pharmacovigilance: An Enhancement to the MedWatcher Social System for Monitoring Adverse Events.” Epidemiology Meets Data Mining and Knowledge Discovery Workshop at KDD 2019. [arXiv]
Raff, E., Fleming, W., Zak, R., Anderson, H., Finlayson, B., Nicholas, C., McLean, M. (2019). KiloGrams: Very Large N-Grams for Malware Classification. KDD Workshop on Learning and Mining for Cybersecurity (LEMINCS) [pre-print] [bibtex]
Rahnama, A., Nguyen, A. T., & Raff, E. (2019) Connecting Lyapunov Control Theory to Adversarial Attacks. KDD workshop on Adversarial Learning Methods for Machine Learning and Data Mining. [arxiv]
Nguyen, A. T., & Raff, E. (2019). Heterogeneous Relational Kernel Learning. KDD workshop on Mining and Learning from Time Series. [pre-print]
Raff, E., Sylvester, J., Forsyth, S., & McLean, M. (2019). Barrage of Random Transforms for Adversarially Robust Defense. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 6528-6537). [official] [bibtex] Oral!
Raff, E., Lantzy, S., & Maier, E. (2019). Dr. AI, Where did you get your degree? Artificial Intelligence in Health (pp. 76-83). [pre-print] [bibtex]
- Raff, E., Lantzy, S., & Maier, E. (2018). Dr. AI, Where did you get your degree? Joint Workshop on AI in Health at ICML. [official] [bibtex] This is a shorter version that first appeared in the workshop.
Nguyen, A. T., & Raff, E. “Adversarial Attacks, Regression, and Numerical Stability Regularization.” In The AAAI-19 Workshop on Engineering Dependable and Secure Machine Learning Systems, 2019. [arXiv]
Fleshman, W., Raff, E., Sylvester, J., Forsyth, S., & McLean, M. (2019). Non-Negative Networks Against Adversarial Attacks. In AAAI-2019 Workshop on Artificial Intelligence for Cyber Security [arXiv]
Raff, E. & McLean, M. (2018) Hash-Grams On Many-Cores and Skewed Distributions. In IEEE Big Data [pre-print] [bibtex]
Raff, E. & Sylvester, J. (2018) Linear Models with Many Cores and CPUs: A Stochastic Atomic Update Scheme. In IEEE Big Data [pre-print] [bibtex]
Raff, E. (2018) Neural Fingerprint Enhancement. In ICMLA. [pre-print] [bibtex]
Raff, E. (2018). Growing and Retaining AI Talent for the United States Government. In AAAI FSS-18: Artificial Intelligence in Government and Public Sector. Arlington, Virginia, United States. [arXiv] [bibtex]
Fleshman, W., Raff, E., Zak, R., McLean, M., & Nicholas, C. (2018). Static Malware Detection & Subterfuge: Quantifying the Robustness of Machine Learning and Current Anti-Virus. In The 13th International Conference on Malicious and Unwanted Software (MALWARE) [arXiv] [bibtex] Best Paper!
Raff, E., Sylvester, J., & Nicholas, C. (2018). Engineering a Simplified 0-Bit Consistent Weighted Sampling. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 1203–1212). New York, NY, USA: ACM. https://doi.org/10.1145/3269206.3271690 [arXiv] [bibtex]
Sylvester, J., & Raff, E. (2018). What About Applied Fairness? ML-Debates Workshop at ICML. [arXiv] [bibtex]
Raff, E. & Sylvester, J. (2018). Gradient Reversal Against Discrimination: A Fair Neural Network Learning Approach. In The 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA). [pre-print]
- Raff, E. & Sylvester, J. (2018). Gradient Reversal Against Discrimination. FAT ML Workshop at ICML. [arXiv] This is the shorter version of the paper that previously appeared in a workshop.
Raff, E., & Nicholas, C. K. (2018). Hash-Grams: Faster N-Gram Features for Classification and Malware Detection. Proceedings of the ACM Symposium on Document Engineering. [official] [pre-print]
Raff, E., & Nicholas, C. K. (2018). Lempel-Ziv Jaccard Distance, an effective alternative to ssdeep and sdhash. Digital Investigation. [official] [pre-print]
Raff, E., Sylvester, J., & Mills, S. (2018). Fair Forests: Regularized Tree Induction to Minimize Model Bias. AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society. [arXiv] [bibtex]
Raff, E., Barker, J., Sylvester, J., Brandon, R., Catanzaro, B., & Nicholas, C. (2018). Malware Detection by Eating a Whole EXE. In AAAI-2018 Workshop on Artificial Intelligence for Cyber Security. [arXiv] [bibtex]
Raff, E., & Nicholas, C. (2017). Malware Classification and Class Imbalance via Stochastic Hashed LZJD. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (pp. 111–120). New York, NY, USA: ACM. https://doi.org/10.1145/3128572.3140446 [official] [pre-print] [bibtex]
Raff, E., Sylvester, J., & Nicholas, C. (2017). Learning the PE Header, Malware Detection with Minimal Domain Knowledge. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (pp. 121–132). New York, NY, USA: ACM. https://doi.org/10.1145/3128572.3140442 [official] [arXiv] [bibtex]
Zak, R., Raff, E., & Nicholas, C. K. (2017). What can N-Grams Learn for Malware Detection? In 2017 12th International Conference on Malicious and Unwanted Software (MALWARE) (pp. 109–118). IEEE. [pre-print] [bibtex]
Raff, E., & Nicholas, C. K. (2017). An Alternative to NCD for Large Sequences, Lempel-Ziv Jaccard Distance. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. http://doi.org/10.1145/3097983.3098111 [official-pdf] [pre-print] [bibtex]
Raff, E. (2017). JSAT: Java Statistical Analysis Tool, a Library for Machine Learning. Journal of Machine Learning Research, 18(23), 1–5. Retrieved from http://jmlr.org/papers/v18/16-131.html [official link] [bibtex] [pdf]
Raff, E., Zak, R., Cox, R., Sylvester, J., Yacci, P., Ward, R., … Nicholas, C. (2016). An investigation of byte n-gram features for malware classification. Journal of Computer Virology and Hacking Techniques. doi:10.1007/s11416-016-0283-1 [official link, official-shared] [post-print] [bibtex]

Talks & Panels:

NewInML ICML Workshop 2020. https://slideslive.com/38933742/panel-discussion?ref=speaker-22476-latest
Raff, E., Sylvester, J., & McLean, M. (2016). Fighting Malware with Machine Learning. In GPU Technology Conference. Washington, D.C.: NVIDIA. [webpage] [pdf]

In Press/Other:

Sullivan, J., Elliot, J., Lloyd, K., & Raff, E. (2018). My Fair Data: How the Government Can Limit Bias in Artificial Intelligence. https://www.theatlantic.com/sponsored/booz-allen-hamilton-2018/how-government-can-limit-bias-in-ai/1972/ [bibtex]
DeepMind claims its new code-generating system is competitive with human programmers (2022)
Could AI be used to cheat on programming tests? (2022)
How secure are your AI and machine learning projects? (2020)
Bias in machine learning examples: Policing, banking, COVID-19 (2020)
Booz Allen looks to independent research to differentiate its AI work (2020)

Pre-Prints:

Nguyen, A. T., Richards, L. E., Kebe, G. Y., Raff, E., Darvish, K., Ferraro, F., & Matuszek, C. (2020). Practical Cross-modal Manifold Alignment for Grounded Language. [arXiv]
Jenkins, P., Sachdeva, R., Kebe, G. Y., Higgins, P., Darvish, K., Raff, E., … Matuszek, C. (2020). Presentation and Analysis of a Multimodal Dataset for Grounded Language Learning. ArXiv. [arXiv]
Raff, E. & Nicholas, C. K. (2018). Toward Metric Indexes for Incremental Insertion and Querying. [arXiv]

My thesis can be grabbed here.

Unrelated: I recently learned my Erdős number is 4! Me -> Bryan Catanzaro -> Marc Snir -> Shlomo Moran -> Erdős

An earlier path I had found gave 5: Me -> Charles Nicholas (#4) -> Tim Finin (#3) -> Yaacov Yesha (#2).
