© Distribution of this video is restricted by its owner
00:02 So welcome to my talk on data systems. So my name is

00:09 So the research in my group involves large data sets, large files,

00:15 documents, tables, et cetera, that are continuously growing, right, and become very

00:24 Right? And that's what we call these days big data, which stands for volume,

00:31 velocity and variety. So how do we do it? We use fast external

00:37 . We use efficient structures that work at two storage levels prime

00:43 disk. In general, we do it in parallel. We develop parallel algorithms

00:48 that can work on a multi-node cluster with distributed storage. As major goals, we

00:58 algorithms that have linear time complexity and linear speedup. Some analytics we

01:06 in the group include machine learning, graphs, and exploration that I will explain in

01:12 little bit more detail. So in , my approach to conducting research is

01:19 to apply CS theory to develop good . So data science involves combining theory

01:29 and programming. Again, from a theory perspective, we analyze the time complexity of algorithms. We perform

01:36 big O analysis as well as I/O analysis. We extend and study many

01:44 algorithms in computer science. We use mathematical tools from linear algebra and as

01:54 analytic applications, we have machine learning and graphs. The programming

02:01 is done mainly in C++, C, and Python. But they are combined with

02:08 written in R, SQL, Scala, and JavaScript on the application. Most of my

02:16 is conducted on UNIX machines. That's where we develop the code, right?

02:20 we also do some development and on Windows. From the system side, we

02:28 multithreaded programming. We perform I/O on text and binary files. We exploit parallel

02:34 systems. We are careful about memory, using main memory in a wise way.

02:40 We generate code, we optimize, et cetera. So we have a lot

02:45 of fun. These are example problems we solve in my group. On the

02:53 upper left, we have a cube in which we explore a multidimensional dataset with

03:02 Trying to find important trends. On the right, we show an interesting summarization

03:10 that works for many machine learning models, including regression, Bayesian classification, principal component

03:19 analysis, and K-means clustering. On the left, we have some of my

03:26 advanced research in machine learning, multivariate statistical models, where we show a sample of several

03:33 models that today represent one of the best approaches competing with neural networks

03:40 to develop predictive models. On the lower hand we have graphs, right?

03:46 We also work a lot on graphs, and those problems include reachability, measuring

03:54 detecting cliques, right? And the is showing a reachability problem in

03:59 a graph with nine vertices, going from to 8. So why should you join

04:05 my group? The research we conduct presents a balance between theory and

04:13 We are proud to say that we understand how a machine learning algorithm works, step by

04:19 step, instead of just calling them. , we relearn classical algorithms that

04:25 saw previously in computer science, but see them working on truly large

04:30 We build open source tools, right? And

04:35 you will be part of that. Our programs have many applications, going

04:40 from data science and big data to databases, and even images. From a job

04:46 I mean, the outlook is right? And any company developing

04:52 software may be interested in you when you finish your PhD.
