houzhaowei

How to Think Like a Data Scientist

 


A lot of the answers here focus on learning the skills and techniques of a data scientist. To complement those, I wrote this post to talk about how non-data scientists can start thinking like data scientists.

This answer also includes seven challenges that help you develop your ability to think like a data scientist and develop the right attitude to become one.

(1) Satiate your curiosity through data


As a data scientist you write your own questions and answers. Data scientists are naturally curious about the data that they're looking at, and are creative with ways to approach and solve whatever problem needs to be solved.

Much of data science is not the analysis itself, but discovering an interesting question and figuring out how to answer it.

Here are three great examples:


Challenge: Think of a problem or topic you're interested in and answer it with data!

 

(2) Read news with a skeptical eye


Much of the contribution of a data scientist (and the reason it's really hard to replace a data scientist with a machine) is that a data scientist will tell you what's important and what's spurious. This persistent skepticism is healthy in all sciences, and is especially necessary in a fast-paced environment where it's too easy to let a spurious result be misinterpreted.

You can adopt this mindset yourself by reading news with a critical eye. Many news articles have inherently flawed main premises. Try these two articles. Sample answers are available in the comments.

Easier: You Love Your iPhone. Literally.
Harder: Who predicted Russia’s military intervention?

Challenge: Do this every day when you encounter a news article. Comment on the article and point out the flaws.

 

(3) See data as a tool to improve consumer products

 

Visit a consumer internet product (preferably one that you know doesn't already do extensive A/B testing), and then think about its main funnel. Does it have a checkout funnel? Does it have a signup funnel? Does it have a virality mechanism? Does it have an engagement funnel?

Go through the funnel multiple times and hypothesize about different ways it could do better to increase a core metric (conversion rate, shares, signups, etc.). Design an experiment to verify if your suggested change can actually change the core metric.
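
One way to check whether a suggested change actually moves a conversion metric is a two-proportion z-test on the results of an A/B experiment. The sketch below is only an illustration: the function name and the visitor and conversion counts are made up, and it assumes users were randomly split between the two variants and that the samples are large enough for the normal approximation.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a / n_a: conversions and visitors in the control group.
    conv_b / n_b: conversions and visitors in the variant group.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200 of 5000 control visitors converted (4.0%),
# 250 of 5000 variant visitors converted (5.0%).
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

With these made-up counts the z statistic is about 2.4 and the p-value about 0.016, so the lift would usually be treated as unlikely to be noise. A real experiment would also fix the sample size and the core metric before looking at the data.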

Challenge: Share your proposed experiment through the feedback email for the consumer internet site!

(4) Think like a Bayesian


To think like a Bayesian, avoid the base rate fallacy. This means that to form new beliefs, you must incorporate both newly observed information AND prior information formed through intuition and experience.

You check your dashboard and see that user engagement numbers are significantly down today. Which of the following is most likely?

1. Users are suddenly less engaged
2. A feature of the site broke
3. The logging feature broke

Even though explanation #1 completely explains the drop, #2 and #3 should be more likely because they have a much higher prior probability.
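
The dashboard reasoning above can be written out with Bayes' rule. This is a sketch with invented numbers: the priors (how often each event happens on a given day) and the likelihoods (how well each event explains the drop) are assumptions for illustration, and the three explanations are treated as the only candidates and as mutually exclusive.

```python
def posterior(priors, likelihoods):
    """Posterior probability of each hypothesis given the observed evidence."""
    # Bayes' rule: posterior is proportional to prior times likelihood.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Assumed chance that each event happens on any given day (made up).
priors = {
    "users suddenly less engaged": 0.001,
    "site feature broke": 0.01,
    "logging feature broke": 0.02,
}

# Assumed chance of seeing today's drop if each event happened (made up).
# Explanation #1 explains the drop completely, so its likelihood is 1.0.
likelihoods = {
    "users suddenly less engaged": 1.0,
    "site feature broke": 0.7,
    "logging feature broke": 0.7,
}

result = posterior(priors, likelihoods)
for hypothesis, probability in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{hypothesis}: {probability:.2f}")
```

Even with a perfect likelihood, explanation #1 ends up with the smallest posterior here because its prior is so much lower than the other two.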

You're in senior management at Tesla, and five of Tesla's Model S's have caught fire in the last five months. Which is more likely?

1. Manufacturing quality has decreased and Teslas should now be deemed unsafe.
2. Safety has not changed and fires in Tesla Model S's are still much rarer than their counterparts in gasoline cars.

While #1 is an easy explanation (and great for media coverage), your prior should be strong on #2 because of your regular quality testing. However, you should still be seeking information that can update your beliefs on #1 versus #2 (and still find ways to improve safety). Question for thought: what information should you seek?

Challenge: Identify the last time you committed the base rate fallacy. Avoid committing the fallacy from now on.

(5) Know the limitations of your tools

 

“Knowledge is knowing that a tomato is a fruit, wisdom is not putting it in a fruit salad.” - Miles Kington


Knowledge is knowing how to perform an ordinary linear regression, wisdom is realizing how rarely it applies cleanly in practice.

Knowledge is knowing five different variations of K-means clustering, wisdom is realizing how rarely actual data can be cleanly clustered, and how poorly K-means clustering can work with too many features.
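
One reason K-means can struggle with too many features is that it assigns points by Euclidean distance, and in high dimensions distances between points tend to look alike. The sketch below is an illustration on uniform random data, not a K-means implementation; the function name and point counts are made up. It compares the nearest and farthest neighbor of one point as the number of features grows.

```python
import math
import random


def nearest_to_farthest_ratio(n_points, n_dims, rng):
    """Ratio of the nearest to the farthest distance from one random point.

    A ratio close to 1 means every other point is about equally far away,
    so "closest centroid" carries little information.
    """
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query, others = points[0], points[1:]
    distances = [math.dist(query, p) for p in others]
    return min(distances) / max(distances)


rng = random.Random(0)  # fixed seed so runs are repeatable
for dims in (2, 10, 100, 1000):
    ratio = nearest_to_farthest_ratio(200, dims, rng)
    print(f"{dims:>4} features: nearest/farthest = {ratio:.2f}")
```

On data like this the ratio climbs toward 1 as features are added. Real datasets have structure that uniform noise lacks, so treat this as a prompt to check your own distances rather than as a rule.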

Knowledge is knowing a vast range of sophisticated techniques, but wisdom is being able to choose the one that will provide the most impact for the company in a reasonable amount of time.

You may develop a vast range of tools while you go through your Coursera or edX courses, but your toolbox is not useful until you know which tools to use.

Challenge: Apply several tools to a real dataset and discover the tradeoffs and limitations of each tool. Which tools worked best, and can you figure out why?

(6) Teach a complicated concept


How does Richard Feynman distinguish which concepts he understands and which concepts he doesn't?

Feynman was a truly great teacher. He prided himself on being able to devise ways to explain even the most profound ideas to beginning students. Once, I said to him, "Dick, explain to me, so that I can understand it, why spin one-half particles obey Fermi-Dirac statistics." Sizing up his audience perfectly, Feynman said, "I'll prepare a freshman lecture on it." But he came back a few days later to say, "I couldn't do it. I couldn't reduce it to the freshman level. That means we don't really understand it." - David L. Goodstein, Feynman's Lost Lecture: The Motion of Planets Around the Sun


What distinguished Richard Feynman was his ability to distill complex concepts into comprehensible ideas. Similarly, what distinguishes top data scientists is their ability to cogently share their ideas and explain their analyses.

Check out Edwin Chen's answers to these questions for examples of cogently-explained technical concepts:


Challenge: Teach a technical concept to a friend or on a public forum, like Quora or YouTube.

(7) Convince others about what's important


Perhaps even more important than a data scientist's ability to explain their analysis is their ability to communicate the value and potential impact of the actionable insights.

Certain tasks of data science will be commoditized as data science tools become better and better. New tools will make obsolete certain tasks such as writing dashboards, unnecessary data wrangling, and even specific kinds of predictive modeling.

However, the need for a data scientist to extract and communicate what's important will never be made obsolete. With increasing amounts of data and potential insights, companies will always need data scientists (or people in data science-like roles) to triage all that can be done and prioritize tasks based on impact.

The data scientist's role in the company is to serve as the ambassador between the data and the company. The success of a data scientist is measured by how well he/she can tell a story and make an impact. Every other skill is amplified by this ability.

Challenge: Tell a story with statistics. Communicate the important findings in a dataset. Make a convincing presentation that your audience cares about.
