Practical use of machine learning and data science practices in the big-data age of microscopy
The last decade has seen a huge increase in the availability and size of datasets coming from synchrotrons, user facilities and even benchtop instruments (in addition to ever-growing numbers of papers). Handling this increase in data volume, and maximizing the knowledge extracted from these datasets, is a problem that requires considerable application of data-science practices.
In this talk, I will outline how we acquire and analyze large multidimensional datasets at the Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, with a view towards extracting dynamics inaccessible to standard methods and maximizing information gain and knowledge extraction. I will first discuss how acquisition speeds can be increased by orders of magnitude through our General-mode (G-mode) microscopy platform, allowing acquisition of ferroelectric hysteresis loops 3500x faster than the current state of the art, as well as acquisition of I-V curves hundreds of times faster than standard I-V sweeps in atomic force microscopy and scanning tunneling microscopy. These advances are facilitated by the use of data-driven filtering, machine learning and Bayesian methods, for instance in reconstructing the current to nullify capacitance contributions in the case of rapid I-V acquisition.
At the same time, much of the data that can aid in designing better materials is scattered throughout the extant literature, neither collated nor indexed. In the second half of my talk, I will discuss a method that combines text mining, in the form of regular expressions, with crowd-sourcing to yield a database of growth condition-functional property information for select oxides grown via pulsed laser deposition. Open-source tools facilitate the text mining by automatically annotating the relevant information, while user-led crowd-sourcing efforts sift through the annotated data and figures to compile database entries efficiently. We have produced a database with hundreds of entries, which show growth windows, trends and outliers, and which can serve as a template for analyzing the distribution of growth conditions and provide starting points for related compounds. Moreover, the database provides a community-wide resource that is both dynamic and searchable, and can be mined in the same manner as first-principles repositories. Such tools will form an integral part of the materials design schema in the coming decade.
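As a rough illustration of the regular-expression annotation step, the sketch below pulls candidate growth temperatures and pressures out of a sentence. It is a minimal example under stated assumptions, not the tooling used to build the database: the patterns, the `annotate` function and the example sentence are all hypothetical, and real papers phrase growth conditions in many more ways than these two patterns cover.

```python
import re

# Hypothetical patterns for illustration only; they cover just two common phrasings.
TEMPERATURE = re.compile(r"(\d+(?:\.\d+)?)\s*°\s*C")
PRESSURE = re.compile(r"(\d+(?:\.\d+)?)\s*(mTorr|Torr|mbar|Pa)\b")


def annotate(sentence):
    """Return candidate growth temperatures (in °C) and pressures found in a sentence."""
    temps = [float(m.group(1)) for m in TEMPERATURE.finditer(sentence)]
    pressures = [(float(m.group(1)), m.group(2)) for m in PRESSURE.finditer(sentence)]
    return {"temperature_C": temps, "pressure": pressures}


# Made-up sentence in the style of an experimental section.
example = "Films were grown by pulsed laser deposition at 700 °C in 100 mTorr of oxygen."
print(annotate(example))
```

In a workflow like the one described above, matches of this kind would only flag candidate values; deciding which values actually belong to which film and property is left to the human curators.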
This research was sponsored by the Division of Materials Sciences and Engineering, BES, DOE. This research was conducted at the Center for Nanophase Materials Sciences, which is a US DOE Office of Science User Facility.
Dr. Rama K. Vasudevan is a Research and Development Associate at the Center for Nanophase Materials Sciences (CNMS) at Oak Ridge National Laboratory. He completed his dissertation at the University of New South Wales on scanning probe microscopy of ferroelectric materials under the supervision of Prof. V. Nagarajan. His thesis was awarded the best submitted thesis that year in the Faculty of Science at UNSW. He subsequently joined the CNMS as a postdoctoral associate, supervised by Prof. Sergei Kalinin, and conducted research into relaxor ferroelectrics and in-situ atomic-scale electrochemistry experiments by scanning tunneling microscopy, for which he was awarded the ORNL Postdoctoral Researcher Award in 2015. In 2016, Rama became a staff member at CNMS in the scanning probe microscopy group, with current research interests in utilizing machine learning and literature mining for extracting physics from atomic-scale images and understanding epitaxial growth processes.