EDUCATION:
Doctor of Philosophy, Georgia State University, 2014
Adrian Caciula CV
BIOGRAPHY:
Dr. Adrian Caciula worked at the Center for Infection and Immunity at Columbia University from 2015 to 2018, where he designed, developed, and implemented bioinformatics tools for Next Generation Sequencing (NGS) data analysis. Before joining Columbia University, he taught undergraduate computer science courses at several universities in the state of Georgia (Georgia Southern University, Georgia State University, and the University of North Georgia). He received his PhD in Computer Science from Georgia State University in 2014. His interests are in algorithms, bioinformatics, and statistical modeling.
RESEARCH INTERESTS:
Algorithms, Bioinformatics, Statistical Modeling, and Data Visualization.
TEACHING:
- Spring 2019: COM-3590-331 -- Data Cleaning and Transformation
  TR 3:00--4:15, Classroom: TBA
In real-world situations, data scientists must be able to use data from many dirty, autonomous, and heterogeneous sources that are far from ready to be analyzed. Preparing the data for analysis (often referred to as “data wrangling”) involves four different tasks: cleaning, sampling, transformation, and integration. For each of these tasks, interactive tools are useful both for preparing small data sets and for investigating the general quality or structure of a large data set. When dealing with large data sets of many thousands or millions of rows, however, programmatic quantitative approaches are an absolute necessity to make data preparation a realistic task. This course covers both interactive tools and quantitative approaches to each of these tasks. Because data preparation is a focus of significant R&D, and small advances may have a major impact on one’s productivity, the course also introduces students to the communities of research and practice that continue to advance the state of the art, enabling students to stay abreast of valuable advances in this area.
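As a rough illustration of the four tasks named above, the sketch below walks a few records through cleaning, sampling, transformation, and integration using only the Python standard library. The records, field names, and helper functions (`clean`, `sample`, `transform`, `integrate`) are hypothetical and chosen only for this example; they are not taken from the course materials, which may well use pandas or other tools instead.

```python
import random

# Hypothetical records from two "dirty" sources (illustrative data only).
source_a = [
    {"id": "1", "name": " Alice ", "height_cm": "170"},
    {"id": "2", "name": "BOB", "height_cm": ""},    # missing value
    {"id": "2", "name": "BOB", "height_cm": ""},    # repeated record
    {"id": "3", "name": "carol", "height_cm": "165"},
]
source_b = [{"id": "1", "city": "New York"}, {"id": "3", "city": "Atlanta"}]


def clean(rows):
    """Cleaning: normalize names, skip rows with a missing height or a repeated id."""
    seen, out = set(), []
    for row in rows:
        if not row["height_cm"].strip() or row["id"] in seen:
            continue
        seen.add(row["id"])
        out.append({"id": row["id"],
                    "name": row["name"].strip().title(),
                    "height_cm": row["height_cm"]})
    return out


def sample(rows, k, seed=0):
    """Sampling: draw a reproducible random subset of at most k rows."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def transform(rows):
    """Transformation: convert height from a string in cm to a float in metres."""
    return [{"id": row["id"], "name": row["name"],
             "height_m": float(row["height_cm"]) / 100} for row in rows]


def integrate(left, right, key="id"):
    """Integration: inner-join two sources on a shared key."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


cleaned = clean(source_a)
subset = sample(cleaned, k=2)          # useful for a quick look at large data
result = integrate(transform(cleaned), source_b)
print(result)
```

On data sets with thousands or millions of rows, the same steps would normally be expressed with a library built for tabular data; the plain-Python version here only makes each step explicit.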
Required Textbooks:
1. Principles of Data Integration. Doan, Halevy, and Ives.
2. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. 2nd Edition. McKinney.
Recommended:
3. Practical Statistics for Data Scientists: 50 Essential Concepts. Bruce, Bruce.
4. Data Wrangling with Python: Tips and Tools to Make Your Life Easier. Kazil, Jarmul.
5. The Visual Display of Quantitative Information. 2nd Edition. Tufte.
NEWS:
- Next Wave: The National Security Review of Emerging Technologies
The Next Wave | Issue 22 | No. 1 | 2018 | Machine Learning
3 AI / Machine Learning enterprise seed startups in NYC to watch in 2019:
- AI.Reverie is a simulation platform that offers data and computer vision APIs to help businesses train and improve their machine learning algorithms. Investors include Resolute Ventures, Compound Ventures, and others.
- Comet.ml lets data-driven organizations track their code, experiments, and results on ML projects while also optimizing model hyperparameters, model architecture, and feature choices. Investors include Trilogy Equity Partners, Fathom Capital, Two Sigma Ventures, and others.
- A data platform as a service, Narrator.ai helps companies take advantage of their existing data by providing a fully managed data pipeline and transformations, along with a business intelligence tool for visualizing metrics, trends, and customer actions. Investors include Flybridge Capital.
Reference: medium.com