Student Data Analysis Seminar
November 13, 2019 @ 4:00 pm - 5:00 pm
The UFII Student Data Analysis Seminar (SDAS) organizers would like to invite you to the next seminar of the Fall 2019 semester, Wednesday, November 13, 2019, 4:00PM-5:15PM, in the UFII Office, E251 CSE Bldg. Students, postdocs, and faculty whose interests intersect with data science are invited to attend Hadi Abdullah’s talk, “Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems.” We are also looking for people who would like to help us organize SDAS events or give a seminar this semester. If interested, please contact Patrick (email@example.com) or Parker (firstname.lastname@example.org).
The Student Data Analysis Seminar provides a venue for students, postdocs and faculty to discuss the applications and theories of data science, machine learning, bioinformatics and other related topics. Presentation topics will focus on data analysis applications, as well as the algorithms, mathematical theory, and statistical inference tools which underpin data science. Examples of potential topics include graphical models, neural networks, topological data analysis, computational geometry, and applications showcasing the effective use of these and other methods.
SDAS is targeted at all students across the University working on data analysis related problems. Attendance is open: graduate students, faculty, and interested undergraduate students are welcome to attend any week. Slots are available for students interested in presenting on a data analysis related topic; to request one, send an email to Patrick (email@example.com) or Parker (firstname.lastname@example.org). We are especially interested in hosting speakers from a diverse set of backgrounds, including bioinformatics, ecology, chemistry, medical imaging, and more.
Schedule of Seminars:
November 13, 2019: Hadi Abdullah (PhD Student, Computer Science)
Title: Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems
Abstract: Voice Processing Systems (VPSes), now widely deployed, have been made significantly more accurate through the application of recent advances in machine learning. However, adversarial machine learning has similarly advanced and has been used to demonstrate that VPSes are vulnerable to the injection of hidden commands – audio obscured by noise that is correctly recognized by a VPS but not by human beings. Such attacks, though, are often highly dependent on white-box knowledge of a specific machine learning model and limited to specific microphones and speakers, making their use across different acoustic hardware platforms (and thus their practicality) limited. In this paper, we break these dependencies and make hidden command attacks more practical through model-agnostic (blackbox) attacks, which exploit knowledge of the signal processing algorithms commonly used by VPSes to generate the data fed into machine learning systems. Specifically, we exploit the fact that multiple source audio samples have similar feature vectors when transformed by acoustic feature extraction algorithms (e.g., FFTs). We develop four classes of perturbations that create unintelligible audio and test them against 12 machine learning models, including 7 proprietary models (e.g., Google Speech API, Bing Speech API, IBM Speech API, Azure Speaker API, etc), and demonstrate successful attacks against all targets. Moreover, we successfully use our maliciously generated audio samples in multiple hardware configurations, demonstrating effectiveness across both models and real systems. In so doing, we demonstrate that domain-specific knowledge of audio signal processing represents a practical means of generating successful hidden voice command attacks.
A typical seminar session lasts 50 minutes (a 40-minute presentation followed by 10 minutes of Q&A).
Please RSVP so that we can make sure that there are plenty of seats and food for everyone.
Twice a semester, SDAS will host a 50-minute social session with refreshments and light snacks to provide a networking opportunity for students interested in topics intersecting the Informatics Institute’s thrust areas.
October 16, 2019: Mahsan Nourani (PhD Student – Computer Science)
Title: The Effects of Meaningful and Meaningless Explanations on Trust and Perceived System Accuracy in Intelligent Systems
Abstract: Machine learning and artificial intelligence algorithms can assist human decision making and analysis tasks. While such technology shows promise, willingness to use and rely on intelligent systems may depend on whether people can trust and understand them. To address this issue, researchers have explored the use of explainable interfaces that attempt to help explain why or how a system produced the output for a given input. However, the effects of meaningful and meaningless explanations (determined by their alignment with human logic) are not properly understood, especially with users who are non-experts in data science. This talk will focus on human-centered methods to understand how users of intelligent systems perceive the accuracy of those systems when encountering such outputs and explanations.
October 3, 2019: Atul Divakarla (PhD Student – Physics and UFII Fellow)
Title: Bayesian Inference and Gravitational-Waves: How we detect and classify gravitational-wave bursts
Abstract: The LIGO and Virgo Scientific Collaborations have observed and catalogued eleven significant gravitational-wave signals in the first two observing runs, each of them originating from the coalescence of binary black holes or neutron stars. Considerable effort is also undertaken to search for gravitational-wave “bursts,” unmodeled (or minimally-modeled) transients. The possibility of an unexpected source is the most promising motivation for gravitational-wave burst searches. In the event of a gravitational-wave burst detection, a key question will be: what is the astrophysical source of this burst? In this work, we consider a power-law model, which can be used to describe a variety of gravitational-wave signals including bursts from cosmic string kinks, cosmic string cusps, and memory. Each of these signals is characterized by a unique spectral index. We demonstrate parameter estimation, signal detection, and model selection for memory and power-law bursts and show how Bayesian inference can be used to measure the power-law spectral index, thereby distinguishing between different astrophysical scenarios.
September 19, 2019: Parker Edwards (PhD Student – Mathematics)
Title: Topological Data Analysis of Actin Networks
Abstract: Actin is a family of proteins which cells assemble into networks of tightly packed fibers with local geometric structure. Regulation of these networks significantly affects cellular and bodily functions like cell motility and muscle contraction. Defects in human actin regulation can contribute to disease, prompting ongoing research to quantify how different regulatory mechanisms affect network structure.
Variances in the density, length, and orientation of actin fibers give rise to localized geometry in the network. We analyze actin networks starting from data that consist of high resolution live-cell microscopy images of cells’ actin fibers. Our methodology detects localized features using image segmentation and tools from topological data analysis: relative persistent homology and persistence landscapes. Subsequent machine learning methods quantify structural differences using persistence summaries of each image as the feature vectors.
April 11, 2019: Tong Shao (PhD student – Department of Electrical and Computer Engineering)
Title: How machine understands images: image captioning and storytelling
Abstract: The presentation will introduce a fundamental problem in artificial intelligence: how to automatically generate descriptions (captions) for an image, i.e., image captioning. Automatically describing the content of an image connects computer vision and natural language processing. It is widely used in many applications and produces language information for further processing. Traditional methods are mostly retrieval-based: they find the best match among word, phrase, or sentence candidates. Currently, the state-of-the-art image captioning methods are machine learning methods that use a generative model based on a recurrent neural network (RNN). The model is trained to maximize the likelihood of the target description sentence given the training image. It uses a convolutional neural network (CNN) to extract image features and feeds them into the RNN to generate captions. Many techniques have been introduced to improve performance, such as the multi-model, semantic decomposition, and scene graph. Experiments on several datasets show the accuracy (BLEU score) and the fluency of the generated captions. However, the accuracy and the diversity of the captions are still not very good. Some methods to improve the diversity and introduce sentiment into the captions will be discussed. The concept of image storytelling will be briefly introduced as well.
March 28, 2019: Dr. Nikolaos Sapountzis (Post-doctoral Fellow, FICS)
Title: Optimal Downlink and Uplink User Association in Backhaul-limited HetNets towards 5G
Abstract: By 2019, it is no secret to the global 5G networking community: there are not adequate bandwidth resources to go around anymore. The growing traffic demand rate is expected to surpass 10 exabytes per month soon, driven not only by the 10 billion user equipment (UE) devices but also by IoT devices, which are anticipated to bring another 50 billion devices online. Such differentiated types of communications, offering a plethora of differentiated services in either the downlink or uplink direction, sharpen the traffic demand heterogeneity, adding tough and complex constraints in terms of latency, capacity, jitter, etc. Operators (e.g., AT&T, Verizon) struggling with such traffic increases bet on aggressively dense deployments, overlaying the conventional macro cell, where low-power small cells dominate. Such heterogeneous network deployments will considerably improve spectral efficiency. Nevertheless, the higher the deployment density, the higher the chance that these networks will suffer from intense spatio-temporal variations. Such fluctuations can create serious performance problems if not well studied. Also, the aggressive small cell densification, followed by a tremendous capacity crunch, threatens the capacities of the corresponding backhaul links that provide connectivity to the core, and makes the backhaul network a key performance bottleneck for 5G systems. All of this makes the problem of associating a user (e.g., tablet, mobile phone, IoT device) with a base station (BS) challenging. In that context, we propose an analytical framework for user association in heterogeneous backhaul-limited 5G networks that jointly considers radio access and backhaul performance. We derive an algorithm that takes into account spectral efficiency, base station load, different degrees of load balancing, backhaul link capacities and topology, and uplink and downlink traffic demand, and we analytically prove that it converges to an optimal solution.
We then use extensive simulations to study the impact of backhaul capacity limitations towards 5G on different key performance metrics.
March 14, 2019: Dhruv Mahajan (PhD Student – Computer Science)
Title: Analyzing Traffic Signal Performance Measures to Automatically Classify Signalized Intersections
Abstract: Traffic signals are installed at road intersections to control the flow of traffic. An optimally operating traffic signal improves the efficiency of traffic flow while maintaining safety. The effectiveness of traffic signals has a significant impact on travel time for vehicular traffic. There are several measures of effectiveness (MOE) for traffic signals. In this study, we develop a work-flow to automatically score and rank the intersections in a region based on their performance, and group the intersections that show similar behavior, thereby highlighting patterns of similarity. In the process, we also detect potential bottlenecks in the region of interest.
February 28, 2019: Parker Edwards (PhD Student – Mathematics)
Title: Topological Data Analysis of Actin Networks
Abstract: Networks of filaments assembled from the protein actin contribute significantly to cells’ ability to move and change shape. These actin networks exhibit distinct local geometric structure. Some networks contain regions of straight and tightly packed fibers, for instance, as well as loops of varying sizes. We analyze actin networks starting from data that consist of high resolution live-cell microscopy images of cells’ actin fibers. Our methodology detects localized features using image segmentation and tools from topological data analysis: relative persistent homology, a novel approach in the field, and persistence landscapes. We are presently experimenting with a number of subsequent machine learning methods using geometric summaries of each image as the feature vectors.
February 14, 2019: Sankalp Gilda (PhD Student – Astronomy)
Title: Machine Learning and Astronomy
Abstract: Modern astrophysical surveys are capable of collecting larger amounts of images and data in a single night of operation than projects of the past did during their entire lifetimes. Astronomy in the 21st century has firmly and surely entered the era of Big Data, which brings with it a fresh set of challenges and opportunities. Scalability and practicality require that practitioners employ efficient machine learning and image analysis algorithms. I discuss here some of the methods and tools used by astronomers and cosmologists to tackle these problems and highlight some exemplary results. We then shift gears, moving from a high-level overview of the field to the specific problem of model over-fitting. Specifically, we refer to the problem of stellar characterization from low-mid resolution spectra. We highlight the deficiencies of the current indices-based method, and discuss a robust K Nearest Neighbors-based feature selection strategy that is able to deliver excellent out-of-sample results, even in the presence of small training data sets.
November 8, 2018: Tyler Richards (DSI President and Industrial and Systems Engineering undergraduate)
Title: Analytic Frameworks for Student Organizations: Exploring 3 Years of Data at DSI
Abstract: DSI is a data science organization hyper-focused on teaching and growing a data science community at UF. For the past three years, it has been hosting workshops on everything from machine learning in R or Python to Natural Language Processing. Over the course of the past semester, DSI has built analytic capabilities around its more than two thousand workshop attendances, for visualization and predictive purposes, to increase the organization’s effectiveness. This talk will cover both the analytic insights (workshop creation, timing, etc.) from the data and also the learnings from practical applications of data science.
September 6, 2018: Patrick Emami (PhD Student – Computer Science)
Title: Machine Learning for Intelligent Transportation Systems
Abstract: The field of intelligent transportation systems (ITS) involves the development of computational methods for improving the quality of transportation; key applications include autonomous vehicles, traffic surveillance, and traffic network optimization. ITS has been revolutionized by advances in machine learning (ML), partly due to the availability of large-scale datasets and scalable algorithms that can extract insights from data. For example, deep learning has significantly improved our ability to perform vehicle and pedestrian detection and tracking. In this seminar, I will broadly survey applications of ML within ITS and provide details on projects within the UF Transportation Institute that are using ML. The talk will also include a discussion on the sensors used for data collection and real-time decision making.