| Thursday, March 18 |
| 8:15 AM–9:15 AM |
There is a lot of talk about the so-called NoSQL movement. The big idea is not that SQL is bad, but that building a one-size-fits-all datastore is not only extremely hard but not really possible. There is nothing wrong with SQL in the cloud if it is the right tool for the problem. There are single-node alternatives to SQL databases that also make sense for some use cases. In the era of distributed systems and big data, there are many other solutions targeted at particular big data problems. In this session we explore when data stores designed to be distributed from the ground up make sense vis-a-vis scaling out existing single-node stores, and we examine the tradeoffs between different implementations, architectures, and use cases. We'll also look at distributed file systems and blob stores as a fundamental service of cloud infrastructure: how far you can get with them, patterns for leveraging them, and the economics of distributed stores like S3 vs. stores that require you to run a persistent cluster.
Speaker - Bradford Stephens, Founder, Drawn to Scale
Bradford Stephens likes to do big things and kick ass. He is co-founder of Drawn to Scale, who built the first easy, scalable data platform. Their platform is much more than a database: you can process, store, serve, search, and query *all* your data. To *all* your users. He has a love for public speaking, Hadoop, cloud computing, HBase, Lucene, graph theory, Iron Maiden, and using the right tools for problems. Bradford was formerly the Lead Engineer of Data Platforms at Visible Technologies, a social media analytics company. When not writing software or talking to customers, he’s usually playing guitar and drinking wine. He also hosts the popular software blog Road to Failure (roadtofailure.com). Bradford has spoken at events such as OSCON, Hadoop World, LinkedIn TechTalks, ApacheCon, and many more. You can catch him speaking at CloudConnect, Interop, and GlueCon later this year. He can be contacted at bradfordstephens@gmail.com, and is always happy to give advice or consulting (if you’re really interesting).
Speaker - Florian Leibert, Software Engineer, Research, Twitter
|
| 9:30 AM–10:30 AM |
There is a very broad range of needs for processing big data. These range from simple needs like log-analysis calculations that just need to occur at scale, to middle-of-the-road needs like BI, to complex needs like scalable modern machine learning and retrieval systems. As with data stores, there is a broad range of tools to service specific needs. Again, we see that the pattern for the cloud is moving away from one-size-fits-all stacks and toward building for your needs. That said, there are very generic abstractions like MapReduce that work well for a lot of use cases. We'll look at how people use tools like Hadoop for everything from log analysis to machine learning, as well as other tools such as distributed indexing systems like Lucene + Katta. We'll also talk about why these systems are so hard to get right, and do the world a favor by explaining that people should not attempt to create these systems on their own unless they really know what they're doing. We may also touch on some of the pain of dealing with these systems, configuring and deploying them, etc.
Speaker - Nathan Marz, Lead Engineer, BackType
Nathan has been working extensively with Hadoop and related technologies such as Cascading since 2008. He is the Lead Engineer at BackType, where he is building technology for real-time search and discovery. Previously, Nathan was an engineer at Rapleaf, where he led the development of a scalable architecture for Rapleaf's people search engine. He maintains a blog at http://nathanmarz.com/blog.
Speaker - Chris Wensel, CTO and Founder, Concurrent, Inc.
Chris K Wensel has been a Software and Systems Architect for over 15 years. He is the founder of Concurrent, Inc., and the author of the Cascading data processing and workflow application. Over the last 5 years he has installed large and sophisticated Cascading, Hadoop, and Nutch applications for use by companies providing web content, behavior ad-targeting, and financial data services, both in the traditional enterprise data center and on Amazon Web Services.
|
| 10:45 AM–11:45 AM |
Now that we can store so much data so cheaply, it becomes very attractive to do cool stuff with it. In fact, the cool stuff you want to do with your data should be a big part of what drives your storage and processing choices. As with processing, we'll touch on the simple, middle, and high ends of the spectrum and what you should be looking at for your different needs, from basic analysis and BI to machine learning, recommendation engines, and retrieval systems.
Speaker - Michael Driscoll, Founder, Dataspora
Speaker - Ted Dunning, CTO, DeepDyve
Dr. Dunning joins DeepDyve from Veoh, an Internet-based television service. Prior to Veoh, he was Chief Scientist at MusicMatch (now Yahoo Music), where he architected the company's renowned music management and recommendation system. Prior to MusicMatch, he served as chief scientist at ID Analytics, a leading identity fraud detection company, and at Aptex, an HNC/FairIsaac company, where he researched methods for pattern discovery and analyzed symbolic sequences in language, genetic sequences, web-browsing behavior, musical preferences, purchasing behavior, and financial transactions. Dr. Dunning also performed academic research at the Computing Research Laboratory at New Mexico State University, investigating computational linguistics and information retrieval. He earned a BS degree in electrical engineering from the University of Colorado, an MS degree in computer science from New Mexico State University, and a Ph.D. in computing science from Sheffield University in the United Kingdom.
|
|