Data Science: More Than Mining

Wikibon

"The sexiest job in the next 10 years will be statisticians." - Hal Varian, Chief Economist, Google

While the concept of data science has been around for decades, the data scientist has become a sought-after career, giving rise to a new generation of data scientists. This development reflects the staggering growth of "big data." Technological innovation and the World Wide Web have produced new types of data, such as user-generated content, along with tools that can be used to interpret it. Social media platforms such as Facebook (the largest social network, valued at $52 billion) depend on data science to create innovative, interactive features that encourage users to get interested and stay that way. So we know it is important, but what does the term "data science" really mean?

What is data science?

Data science can be broken down into four essential parts.

Mining data: collecting and formatting the information
Statistics: information analysis
Interpret: representation or visualization in the form of presentations, infographics, graphs or charts
Leverage: implications of the data, application of the data, interaction using the data, and predictions formed from studying it

Defining a data scientist

A good data scientist understands the importance of:

Scouring. Their eyes search for information on the web, using vectorized operations, algorithmic strategizing, and APIs.

Their voice asks questions about what they hope to accomplish at the end of the project, setting information goals.

Extraction. They take the information they want and organize it using formulas in order to form educated, insightful conclusions, using these statistical and mathematical methods: factor analysis, regression analysis, correlation, and time series analysis.

Expansion & Application. The appropriate data flows out of the person in the form of keywords, Facebook "Likes," and other statistics. This means:

Creating new theories and predictions based upon the data.
Asking questions to expound upon the data beyond the reach of hard numbers or facts.
Applying the information in a useful, innovative manner to applications whose success depends on data science.
Immediately processing the terabytes of data that flow in, to prevent pile-up and missed opportunities. For example, statistics on holiday shopping trends are imperative around the holiday season. If the statistics are processed and the conclusions are drawn too late, the season has passed and the information can no longer be used to its full potential.

Required skills for a data scientist

A successful data scientist must have a combination of skills that opens up possibilities both for that individual and for their team.
Visualization processes are often disjointed, since each person is typically assigned to a specific part of the project. The designer depends on the information architect. The information architect depends on stats from the statistician, and so on. A true data scientist should be skilled in multiple areas.

Hacking and Computer Science: knowing how to take advantage of computers and the internet to create data-mining formulas
Expertise in Mathematics, Statistics, Data Mining: pulling important statistics and coherently organizing them using mathematical prowess and computer formulas
Creativity & Insight: knowing what statistics are important and how to leverage them

Dangers of data science

Statistics can be displayed in a misleading manner.

Leading the pollee: What type of question are you more likely to answer "yes" to?

Should taxes support the government's aid to those who are unable to find work?
Should Americans be taxed so others can take advantage of welfare and avoid working?

(The accompanying chart shows the figures 85% and 70%, labeled "No" and "Yes.")

Facts that are left out: Including only the starting and ending points of data makes the change seem more drastic.

A collage of carefully selected information can be combined to induce a certain opinion.

Selection bias occurs when an unrepresentative population is used for a survey or study and the results are then advertised to consumers as if they represented the total population. One example is a toothpaste brand whose cited "studies" show how research can be weighted in a company's favor.

Ironically, facts and stats can be used to paint a very inaccurate, and damaging, picture of a business, organization or general topic.

Facts about data science

1790: The first big data collection project in history was by the U.S. Census, which started in 1790.

5MB: When hard drives were first invented, a 5-megabyte server took up roughly the space of a luxury refrigerator. Today, a 32-gigabyte micro-SD card measures around 5/8 x 3/8 inch and weighs about 0.5 grams.
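To put the two storage figures side by side, the short sketch below works out how many times more data the 32-gigabyte card holds than the 5-megabyte unit. It assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), which the infographic does not specify, and the variable names are illustrative only.

```python
# Rough comparison of the two storage capacities quoted in the text.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).
MB = 10**6
GB = 10**9

early_drive_bytes = 5 * MB   # the 5-megabyte unit described in the text
micro_sd_bytes = 32 * GB     # the 32-gigabyte micro-SD card

# How many of the early units would be needed to match one card.
capacity_ratio = micro_sd_bytes / early_drive_bytes

print(f"The micro-SD card holds about {capacity_ratio:,.0f} times as much data.")
```

Under the decimal-unit assumption the ratio comes to 6,400; with binary units (MiB and GiB) the figure would be slightly higher.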
When collecting mass quantities of data, some human remedial input is needed, which gave birth to crowdsourcing. The best example is Amazon's Mechanical Turk.

Modern collection of big data is possible with cloud computing, or the spreading of data across several physical resources that can be accessed remotely rather than concentrated at one location.

"The computing and processing of data is literally 100 to 1,000 times faster and cheaper than before." - Scott Yara, Greenplum

Infographic World, Wikibon

shared by infographicworld on Aug 12

