Monday, June 3, 2019

‘Big’ Data Science and Scientists

If you could take a trip back in time with a time machine and tell people that today a child can interact with others from anywhere and query trillions of pieces of data all over the globe with a simple click on his or her computer, they would have said that it is science fiction. Today more than 2.9 million emails are sent across the internet every second. 375 megabytes of data is consumed by households each day. Google processes 24 petabytes of data per day. Now that's a lot of data! With each click, like and share, the world's data pool is expanding faster than we comprehend. Data is being created every minute of every day without us even noticing it. Businesses today are paying attention to scores of data sources to make crucial decisions about the future. The rise of digital and mobile communication has made the world more connected, networked and traceable, which has typically resulted in the availability of such large-scale data sets.

So what is this buzzword "Big Data" all about? Big data may be defined as data sets whose size is beyond the ability of typical database software tools to capture, create, manage and process. The definition can differ by sector, depending on what kinds of software tools are commonly available and what sizes of data sets are common in a particular industry.

The explosion in digital data, bandwidth and processing power, combined with new tools for analyzing the data, has sparked massive interest in the emerging field of data science. Big data has now reached every sector of the global economy and has become an integral part of solving the world's problems. It allows companies to know more about their customers, their products and their own infrastructure.
More recently, people have become extensively focused on the monetization of that data.

According to a 2011 McKinsey Global Institute report [1], simply making big data more easily accessible to relevant stakeholders in a timely manner can create enormous value. For example, in the public sector, making relevant data more easily accessible across otherwise separated departments can sharply cut search and processing time. Big data also allows organizations to create highly specific customer segmentations and to tailor products and services precisely to meet those segments' needs. This approach is widely known in marketing and risk management but can be revolutionary elsewhere.

Big data is improving transportation and power consumption in cities, making our favorite websites and social networks more efficient, and even preventing suicides. Businesses are collecting more data than they know what to do with. Big data is everywhere; the volume of data produced, saved and mined is startling. Today, companies use data collection and analysis to formulate more cogent business strategies. Manufacturers use data obtained from the use of real products to improve and develop new products and to create innovative after-sale service offerings. This will continue to be an emerging area for all industries. Data has become a competitive advantage and a necessary part of product development.

Companies succeed in the big data era not simply because they have more or better data, but because they have good teams that set clear objectives and define what success looks like by asking the right questions. Big data is also creating new growth opportunities and entirely new categories of companies, such as those that collect and analyze industrial data.

One of the most impressive areas where the concept of big data is taking hold is machine learning. Machine learning can be defined as the study of computer algorithms that improve automatically through experience.
Machine learning is a branch of artificial intelligence, which itself is a branch of computer science. Applications range from data mining programs that discover general rules in large data sets to information filtering systems that automatically learn their users' interests.

Rising alongside the relatively new technology of big data is the new job title "data scientist". An article by Thomas H. Davenport and D.J. Patil in Harvard Business Review [2] describes data scientist as "the Sexiest Job of the 21st Century". You have to buy the logic that what makes a career sexy is demand for your skills exceeding supply, allowing you to command a sizable paycheck and options. The Harvard Business Review article actually compares these data scientists to the quants of the 1980s and 1990s on Wall Street, who pioneered financial engineering and algorithmic trading. The need for data experts is growing, and demand is on track to hit new levels in the next five years.

Who are Data Scientists?

Data scientists are people who know how to ask the right questions to get the most value out of massive volumes of data. In other words, a data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.

Good data scientists will not just address business problems; they will choose the right problems, the ones that have the most value to the organization. They combine the analytical capabilities of a scientist or an engineer with the business acumen of an enterprise executive.

Data scientists have changed and keep changing the way things work. They integrate big data technology into both IT departments and business functions. Data scientists must also understand the business applications of big data and how it will affect the organization, and they must be able to communicate with both IT and business management.
The best data scientists are comfortable speaking the language of business and helping companies reformulate their challenges.

Data science, due to its interdisciplinary nature, requires an intersection of three abilities: hacking skills, math and statistics knowledge, and substantive expertise in a field of science. Hacking skills are necessary for working with the massive amounts of electronic data that must be acquired, cleaned and manipulated. Math and statistics knowledge allows a data scientist to choose appropriate methods and tools in order to extract insight from data. Substantive expertise in a scientific field is crucial for generating motivating questions and hypotheses and for interpreting results. Traditional research lies at the intersection of math and statistics knowledge with substantive expertise in a scientific field. Machine learning stems from combining hacking skills with math and statistics knowledge, but does not require scientific motivation. Science is about discovery and building knowledge, which requires motivating questions about the world and hypotheses that can be brought to data and tested with statistical methods. Hacking skills combined with substantive scientific expertise but without rigorous methods can beget incorrect analysis.

A good scientist can understand the current state of a field, pick challenging questions where success will actually lead to useful new knowledge, and push that field further through their work.

How to become a Data Scientist?

No university programs in India have yet been designed to develop data scientists, so recruiting them requires creativity. You cannot become a big data scientist overnight, and learning data science can be really hard. Data scientists need to know how to code and should be comfortable with mathematics and statistics. They need to know machine learning and software engineering.
They also need to know how to organize large data sets and use visualization tools and techniques.

Data scientists need to know how to code in SAS, SPSS, Python or R. Statistical Package for the Social Sciences (SPSS) is a software package, currently developed by IBM, that is widely used for statistical analysis in social science. Statistical Analysis System (SAS) is a software suite developed by SAS Institute that is mainly used in advanced analytics. SAS is the largest market-share holder for advanced analytics. Python is a high-level programming language and the one most commonly used in the data science community. Finally, R is a free software programming language for statistical computing and graphics. R has become a de facto standard among statisticians for developing statistical software and is widely used for data analysis. R is a part of the GNU Project, a collaboration that supports open source projects.

A few online courses can help you learn some of the main coding languages. One such course is currently available through the popular MOOC website coursera.org. A specialization offered by Johns Hopkins University through Coursera helps you learn R programming, working with data, machine learning and how to develop data products. There are a few more courses available through Coursera that help you learn data science. Udacity is another popular MOOC website that offers courses on data science, machine learning and statistics. Codecademy also offers similar courses for learning data science and coding in Python.

When you start operating with data at the scale of the web, the fundamental approach and process of analysis must and will change. Most data scientists are working on problems that can't be run on a single machine. They have large data sets that require distributed processing.
Hadoop is an open-source software framework for storing and processing large data sets on clusters of commodity hardware. MapReduce is the programming paradigm that allows for massive scalability across the servers in a Hadoop cluster. Apache Spark is Hadoop's speedy Swiss Army knife: a fast-running data analysis system that provides real-time data processing functions to Hadoop. It is important that a data scientist be able to work with unstructured data, whether it comes from social media, videos or even audio.

KDnuggets is a popular website among data scientists that mainly focuses on the latest updates and news in the fields of business analytics, data mining and data science. KDnuggets also offers a free data mining course: the teaching modules for a one-semester introductory course on data mining, suitable for advanced undergraduates or first-year graduate students.

Kaggle is a platform for predictive modeling and analytics competitions on which companies and researchers post their data, and statisticians and data miners from all over the world compete to produce the best models. Kaggle hosts many data science competitions where you can practice, test your skills with complex, real-world data and tackle actual business problems. Many employers do take Kaggle rankings seriously, as they can be seen as pertinent, hands-on project work. Kaggle aims at making data science a sport.

Finally, to be a data scientist you'll need a good understanding of the industry you're working in and of the business problems your company is trying to solve.
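
To make the MapReduce paradigm mentioned above a little more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop's API and involves no cluster; it only imitates the map, shuffle and reduce steps with a word count. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are made up for this illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on one machine (illustration only)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input: each string stands in for a document stored on a different node.
    docs = ["big data is big", "data science uses big data"]
    print(word_count(docs))
```

In a real cluster the map calls would run in parallel on the machines holding each piece of data, and the framework would handle the grouping and distribute the reduce work. The point of the sketch is only that each step is independent enough to be spread across many servers.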
In terms of data science, being able to find out which problems are crucial for the business to solve is critical, in addition to identifying new ways the business should be leveraging its data.

A study by Burtch Works [3] in April 2014 finds that data scientists earn a median salary that can be up to 40% higher than that of other Big Data professionals at the same job level. Data scientists have a median of nine years of experience, compared to other Big Data professionals, who have a median of 11 years. More than one-third of data scientists are currently in the first five years of their careers. The gaming and technology industries pay higher salaries to data scientists than all other industries.

LinkedIn, a popular business-oriented social networking website, voted statistical analysis and data mining the top skill that got people hired in 2014. Data science has a bright future ahead: there will only be more data and more of a need for people who can find meaning and value in that data. Despite the growing opportunity, demand for data scientists has outpaced the supply of talent and will continue to do so for the next five years.

[1] McKinsey Global Institute, Big data: The next frontier for innovation, competition, and productivity, June 2011
[2] Thomas H. Davenport, D.J. Patil, Data Scientist: The Sexiest Job of the 21st Century, Harvard Business Review, October 2012
[3] Burtch Works, Big Data Career Tips, http://www.burtchworks.com/big-data-analyst-salary/big-data-career-tips/, accessed December 2014
