"Big data"...
 


[Closed] "Big data" - recommendations for a beginners' training course?

Posts: 14410
Free Member
Topic starter
 

In my organisation a lot of people are talking about "big data", "data lakes", Hadoop etc., but the reality is that they are mostly repeating buzzwords they've heard elsewhere. With that in mind, I'd like to do a beginners' training course in my own time to learn for myself what it's all about, so that I can join the conversation with some knowledge.
Can anyone recommend a decent course that they've done? I know there are loads out there (e.g. Udemy, Coursera etc.), but a good recommendation would help reduce the number of courses to choose from.
thanks!


 
Posted : 29/11/2017 2:22 pm
Posts: 0
Free Member
 

I did one on Coursera which was pretty good


 
Posted : 29/11/2017 2:31 pm
Posts: 91000
Free Member
 

The problem is that it's not one thing or one technology; there are loads of tools for dealing with data that is 'big'.

Basically, if you have stupidly large amounts of the same kind of data, then those tools could be useful. If you don't, then it's meaningless.

Something to bear in mind when looking.


 
Posted : 29/11/2017 2:33 pm
Posts: 145
Free Member
 

I guess it depends on what angle you want to come at it from. Do you want to host the data, administer the data, or analyse the data? For me it was the latter, so I just learned to use SAS and SQL.

When big data came along, I was working at a company that had the biggest customer DB in Europe, so we were like, "What's new?" We already had big data tables and stored the data in non-related tables for analysis. I think all that really changed was where the queries were processed.
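To make the "where the queries were processed" point concrete, here is a toy Python sketch. It is an in-process illustration of the MapReduce idea, not Hadoop's actual API, and the order data, region names and function names are made up. The same per-region total is computed once as a single SQL GROUP BY in SQLite, and once MapReduce-style, where each partition (standing in for a node holding its own slice of the data) computes partial sums that are then shuffled and reduced.

```python
import sqlite3
from collections import defaultdict

# Hypothetical order data: (region, amount)
orders = [("north", 120.0), ("south", 80.0), ("north", 40.0), ("east", 15.0), ("south", 20.0)]

# 1. Traditional approach: a single database engine does all the processing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
sql_totals = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))

# 2. MapReduce style: the data is split into partitions, as if stored on different nodes.
partitions = [orders[:2], orders[2:4], orders[4:]]


def map_partition(rows):
    """Map: runs next to one partition and emits partial (region, subtotal) pairs."""
    partial = defaultdict(float)
    for region, amount in rows:
        partial[region] += amount
    return list(partial.items())


def shuffle(mapped_outputs):
    """Shuffle: group every partial result by its key (the region)."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for region, subtotal in pairs:
            grouped[region].append(subtotal)
    return grouped


def reduce_region(subtotals):
    """Reduce: combine the partial results for one key."""
    return sum(subtotals)


mr_totals = {
    region: reduce_region(subtotals)
    for region, subtotals in shuffle(map(map_partition, partitions)).items()
}

print(sql_totals)
print(mr_totals)
```

Both approaches give the same totals; the difference is that the map step can run wherever each slice of data is stored, so only the small partial results need to move between machines.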


 
Posted : 29/11/2017 2:35 pm
Posts: 0
Free Member
 

Check out the Microsoft Virtual Academy first; it's free and will give you an insight into the subject before you commit to anything financially.


 
Posted : 29/11/2017 2:43 pm
Posts: 14410
Free Member
Topic starter
 

@trailwagger - thanks, I've never heard of that

@djglover - my angle is a dabbler who does bits and pieces for all sorts of tasks but I'm no expert at anything. I'd like to understand the overview so that I can discuss it with a little confidence and I suppose eventually the querying of the data (beyond my current basic SQL skills). I'm not a DBA or infrastructure guy so that side doesn't appeal to me.

@atlaz - was it this one (https://www.coursera.org/learn/big-data-introduction)?


 
Posted : 29/11/2017 3:46 pm
Posts: 11
Free Member
 

I've used MVA, as mentioned above; edX also has good Microsoft-specific training.

IMO big data is a terrible term. As already mentioned, there are relational databases out there that scale to tens of TB, if not larger, and do not fall into the 'big data' category.

When I get involved in a new project and they start talking about big data, I fall back on the three Vs: Volume, Velocity and Variety.

Even then, sometimes the problem is still best solved in a relational database like SQL Server or PostgreSQL, both of which now support designs beyond what most people think relational databases can do. If there really is a lot of data coming in (e.g. from sensors on industrial machines), sometimes the simplest approach is to dump it to an HDFS layer but still use a SQL database to host the data that end users will actually query.
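A minimal Python sketch of that pattern, under loose assumptions: a local temporary directory stands in for the HDFS landing layer, SQLite stands in for SQL Server or PostgreSQL, and the machine IDs, file names and summary table are hypothetical. Raw readings are written to append-only batch files, and only a per-machine summary is loaded into the database that end users query.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def land_raw_readings(landing_dir: Path, batch_id: int, readings) -> Path:
    """Append-only 'landing zone' (stand-in for HDFS): each batch becomes its own CSV file."""
    path = landing_dir / f"batch_{batch_id:05d}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["machine_id", "timestamp", "temperature"])
        writer.writerows(readings)
    return path


def load_summary(landing_dir: Path, conn: sqlite3.Connection) -> None:
    """Scan every landed file and keep only a per-machine summary in the SQL database."""
    totals = {}  # machine_id -> (reading count, running total, peak temperature)
    for path in sorted(landing_dir.glob("batch_*.csv")):
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                count, total, peak = totals.get(row["machine_id"], (0, 0.0, float("-inf")))
                temp = float(row["temperature"])
                totals[row["machine_id"]] = (count + 1, total + temp, max(peak, temp))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS machine_summary "
        "(machine_id TEXT PRIMARY KEY, readings INTEGER, avg_temp REAL, max_temp REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO machine_summary VALUES (?, ?, ?, ?)",
        [(m, c, t / c, p) for m, (c, t, p) in totals.items()],
    )
    conn.commit()


with tempfile.TemporaryDirectory() as tmp:
    landing = Path(tmp)
    land_raw_readings(landing, 1, [("press-01", "2017-11-29T16:00:00", 71.5),
                                   ("press-02", "2017-11-29T16:00:00", 64.0)])
    land_raw_readings(landing, 2, [("press-01", "2017-11-29T16:05:00", 73.5)])
    conn = sqlite3.connect(":memory:")
    load_summary(landing, conn)
    rows = conn.execute(
        "SELECT machine_id, readings, avg_temp, max_temp FROM machine_summary ORDER BY machine_id"
    ).fetchall()

print(rows)
```

The raw files stay untouched in the landing layer for anyone who later needs the full detail, while end users only ever query the small, relational summary table.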

Fascinating subject, and it's keeping me busy...

I sit on the infrastructure/admin side of the fence so don't bother me with questions about your mapreduce, R or Python jobs 😉


 
Posted : 29/11/2017 4:08 pm
Posts: 0
Free Member
 

Dummies Guide (http://eecs.wsu.edu/~yinghui/mat/courses/fall%202015/resources/Big%20data%20for%20dummies.pdf) for general knowledge!


 
Posted : 29/11/2017 4:13 pm
 nerd
Posts: 433
Free Member
 

I work in big data and the only thing we keep in a 'database' is metadata (data about the data, like the EXIF data in camera files).
For that, NoSQL databases are useful; we use Elasticsearch.
The actual data is too big to store in a database (currently 30PB and growing daily!) and is not relational. It's mostly arrays of floating-point numbers in HDF format, image files in GeoTIFF or JPEG2000, or even CSV text files.
I'd learn about metadata. That's the useful stuff.
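To illustrate why the metadata is the useful stuff, here is a small Python sketch of a metadata catalogue. It uses a plain in-memory list rather than Elasticsearch (no Elasticsearch API is shown or implied), and the file paths, field names and bounding boxes are invented for the example. A query touches only the small metadata records and returns the paths of the few large files actually worth opening.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata record describing one large data file."""
    path: str        # where the actual (large) file lives
    fmt: str         # e.g. "HDF5", "GeoTIFF", "JPEG2000", "CSV"
    acquired: date   # when the data was captured
    bbox: tuple      # (min_lon, min_lat, max_lon, max_lat)


catalogue = [
    FileRecord("/archive/2017/11/scene_001.tif", "GeoTIFF", date(2017, 11, 2), (-3.5, 53.0, -2.0, 54.0)),
    FileRecord("/archive/2017/11/scene_002.jp2", "JPEG2000", date(2017, 11, 20), (10.0, 45.0, 11.5, 46.0)),
    FileRecord("/archive/2017/10/model_run.h5", "HDF5", date(2017, 10, 15), (-10.0, 50.0, 2.0, 60.0)),
]


def overlaps(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_files(records, start, end, area):
    """Filter on metadata only; the big files themselves are never opened here."""
    return [r.path for r in records
            if start <= r.acquired <= end and overlaps(r.bbox, area)]


# Which files cover a small area of interest during November 2017?
hits = find_files(catalogue, date(2017, 11, 1), date(2017, 11, 30), (-2.1, 53.2, -1.5, 53.6))
print(hits)
```

A real search index adds scale, full-text search and proper geospatial queries, but the principle is the same: search the metadata, then fetch only the matching files.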


 
Posted : 29/11/2017 4:27 pm
Posts: 0
Free Member
 

I sit on the infrastructure/admin side of the fence so don't bother me with questions about your mapreduce, R or Python jobs

I sit on the other side of the fence - as I type this I'm developing a data lake....

As mentioned above, ensure that what you actually have is data that warrants a big data solution. It's one of the IT buzzwords of the moment: many big corporations have been sold on the idea that they absolutely need it, when in fact what they need is a decent relational architecture. Heads of IT and operations then grab hold of it as the next 'must have' and haemorrhage cash on something that isn't fit for their needs. And while big data infrastructure can be relatively cheap, the skills to develop on it cost a fortune.

It's worth learning about just so that you can prove to them that they don't need it. Pluralsight isn't a bad place to start and has plenty of other topics to keep you interested once the big data fad has died out...


 
Posted : 29/11/2017 4:32 pm
Posts: 7656
Full Member
 

Pluralsight isn't a bad place to start

Sign up for Microsoft Dev Essentials and you get several months' free access, plus SQL Server and other dev tools for personal/small business use.


 
Posted : 29/11/2017 4:42 pm
