Analysis of Trends in Predictions for Data Professions
Raymond Okolie-Alfred
Introduction
What do trends in predictions for data professions look like? This report examines how data roles vary by designation and by unit. Demand for skilled professionals is rising in a world driven by data, and roles from data analyst to machine learning engineer are expected to grow. The analysis breaks salary and age figures down by designation and unit to reveal trends that shape career trajectories. Whether you are an aspiring data professional or a seasoned expert, these insights can help you navigate a data-driven job market and the opportunities and challenges it offers.
2 Some assumptions
2.1 What are some trends in the different designations, units, and salaries over the period under review in the data profession?
2.2 How could these trends be used by the data profession?
2.3 Which designation has the highest salary, and which has the lowest?
2.4 How could these trends help to predict the most lucrative designation in the data profession?
3 We shall produce a report with the following deliverables
3.1 A clear summary of the business task
3.2 A description of all data sources used
3.3 Documentation of any cleaning or manipulation of data
3.4 A summary of your analysis
3.5 Supporting visualizations and key findings
3.6 Top high-level recommendations based on the analysis
To answer the key business questions, I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.
4.1 Phase 1: Ask
#What is the problem you are trying to solve?
We are trying to find trends in the designations in the data profession.
#How can your insights drive business decisions?
My insights can drive business decisions if we follow the data analytics process properly. The analysis will show how the data profession is trending over time, and this will inform recommendations for acting on those trends, including a strategy that favors the data analysis units that earn more.
#Identify the business task
To analyze the available data on the data profession and its differences across designations, so as to gain insight into how the profession is trending. The insights discovered will then help guide marketing strategy for the different designations.
#Consider key stakeholders
1. The managing director
2. Board of directors
3. The brand manager
4. Marketing analytics team
#The product: All designations in the data profession
A data profession encompasses roles that involve the collection, analysis, interpretation, and presentation of data to help organizations make informed decisions. Professionals in this field use statistical techniques, programming skills, and domain knowledge to extract meaningful insights from large and complex datasets. Common designations within the data profession include Analyst, Senior Analyst, Associate, and Manager. These roles are designed to offer professional data analysis, using data to make insightful decisions that lead to business growth.
4.2 Phase 2: Prepare
#Where is your data stored?
The dataset is saved as a CSV file that can also be opened in Excel. The data is in the public domain and was made available through Mobius on Kaggle.
#How is the data organized? Is it in long or wide format?
The data set contains a single CSV file organized in long format.
#Are there issues with bias or credibility in this data? Does your data ROCCC?
Taking the data through the ROCCC check, I can say that there are limitations and bias in the data, as it consists of only a single file of about 1.99 MB.
#ROCCC analysis
- Reliable: LOW. Not reliable, as it only has a single file.
- Original: LOW. Third party provider (Amazon Mechanical Turk).
- Comprehensive: MED. The recorded fields do not cover most of the parameters needed for the analysis.
- Current: LOW. Data is 5 years old and may not be relevant now.
- Cited: LOW. Data collected from a third party, hence unknown.
Overall, the dataset is considered low quality because it does not meet the ROCCC requirements; hence it is not recommended to base business recommendations on this data.
#How are you addressing licensing, privacy, security, and accessibility?
Since it is public data, there is no issue with licensing, privacy, security, or accessibility of the dataset.
#How did you verify the data’s integrity?
Because the data was collected in a survey, we cannot ascertain its integrity or accuracy.
#How does it help you answer your question?
It helps answer the question because the data explores predictions for the data profession within a certain region and covers: first name, last name, sex, date of joining, current date, designation, age, salary, unit, leaves used, leaves remaining, rating, and level of experience.
#Are there any problems with the data?
The data is not current: it was collected in 2013, so market trends have likely changed and the market may have grown or shrunk since then.
4.3 Phase 3: Process
##We process the data by cleaning it and ensuring that it is correct, relevant, complete, and free of errors and outliers.
4.3.1 Install packages
#Install the needed packages
install.packages("tidyverse")
install.packages("lubridate")
install.packages("ggplot2")
install.packages("skimr")
install.packages("janitor")
install.packages("readr")
install.packages("plyr")
install.packages("dplyr")
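Running every `install.packages()` call on each render reinstalls packages that are already present. The sketch below is one possible way to install only what is missing. The helper name `find_missing_packages` and the shortened package list are assumptions made for illustration, not part of the original report; the list is shortened because the attach message printed by `library(tidyverse)` in this report already lists dplyr, readr, ggplot2, and lubridate as core tidyverse packages.

```r
# Packages to check; this shortened list is an assumption for illustration.
needed <- c("tidyverse", "skimr", "janitor")

# Hypothetical helper: returns the entries of `needed` that are not in `installed`.
find_missing_packages <- function(needed,
                                  installed = rownames(installed.packages())) {
  setdiff(needed, installed)
}

missing_pkgs <- find_missing_packages(needed)

# Install only in an interactive session, and only when something is missing.
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

The `interactive()` guard keeps a non-interactive render from triggering downloads; remove it if installation during rendering is intended.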
4.3.2 Load the needed libraries
library(tidyverse)
## Warning: package ‘tidyverse’ was built under R version 4.3.3
## Warning: package ‘ggplot2’ was built under R version 4.3.3
## Warning: package ‘tidyr’ was built under R version 4.3.3
## Warning: package ‘readr’ was built under R version 4.3.3
## Warning: package ‘purrr’ was built under R version 4.3.3
## Warning: package ‘dplyr’ was built under R version 4.3.3
## Warning: package ‘stringr’ was built under R version 4.3.3
## Warning: package ‘lubridate’ was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(lubridate)
library(dplyr)
getwd()
## [1] "C:/Users/ME/Desktop/Factresort material/factreslrt 4 May/factresort for june"
4.3.3 Import/load the dataset
Salary_Prediction_of_Data_Professions <- read_csv("C:/Users/ME/Desktop/Factresort material/factreslrt 4 May/factresort for june/Salary Prediction of Data Professions.csv")
## Rows: 2639 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): FIRST NAME, LAST NAME, SEX, DOJ, CURRENT DATE, DESIGNATION, UNIT
## dbl (6): AGE, SALARY, LEAVES USED, LEAVES REMAINING, RATINGS, PAST EXP
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
4.3.4 Create a data frame for the dataset
Data_profession <- data.frame(Salary_Prediction_of_Data_Professions)
#Step 2: Wrangling of data
4.3.5 Check the column names
colnames(Data_profession)
##  [1] "FIRST.NAME"       "LAST.NAME"        "SEX"              "DOJ"
##  [5] "CURRENT.DATE"     "DESIGNATION"      "AGE"              "SALARY"
##  [9] "UNIT"             "LEAVES.USED"      "LEAVES.REMAINING" "RATINGS"
## [13] "PAST.EXP"
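The CSV header uses names with spaces (`FIRST NAME`), while the data frame shows dots (`FIRST.NAME`). A likely explanation is that `data.frame()` makes names syntactic by default (`check.names = TRUE`), which is what `make.names()` does. The sketch below reproduces that conversion on the header names reported by `read_csv()`; the snake_case step at the end is an optional convention added here for illustration, not something the report does.

```r
# Column names as they appear in the CSV header (from the read_csv message).
raw_names <- c("FIRST NAME", "LAST NAME", "SEX", "DOJ", "CURRENT DATE",
               "DESIGNATION", "AGE", "SALARY", "UNIT", "LEAVES USED",
               "LEAVES REMAINING", "RATINGS", "PAST EXP")

# make.names() replaces characters that are invalid in R names (here, spaces)
# with dots, matching the names seen in the data frame.
syntactic_names <- make.names(raw_names)
print(syntactic_names)

# Optional (assumption): lower-case snake_case names are often easier to type.
snake_names <- tolower(gsub(".", "_", syntactic_names, fixed = TRUE))
print(snake_names)
```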
4.3.6 Check for consistency with str()
str(Data_profession)
## 'data.frame':    2639 obs. of  13 variables:
##  $ FIRST.NAME      : chr  "TOMASA" "ANNIE" "OLIVE" "CHERRY" ...
##  $ LAST.NAME       : chr  "ARMEN" NA "ANCY" "AQUILAR" ...
##  $ SEX             : chr  "F" "F" "F" "F" ...
##  $ DOJ             : chr  "5/18/2014" NA "7/28/2014" "4/3/2013" ...
##  $ CURRENT.DATE    : chr  "1/7/2016" "1/7/2016" "1/7/2016" "1/7/2016" ...
##  $ DESIGNATION     : chr  "Analyst" "Associate" "Analyst" "Analyst" ...
##  $ AGE             : num  21 NA 21 22 NA 22 22 NA 28 22 ...
##  $ SALARY          : num  44570 89207 40955 45550 43161 ...
##  $ UNIT            : chr  "Finance" "Web" "Finance" "IT" ...
##  $ LEAVES.USED     : num  24 NA 23 22 27 20 19 29 20 15 ...
##  $ LEAVES.REMAINING: num  6 13 7 8 3 10 11 1 10 15 ...
##  $ RATINGS         : num  2 NA 3 3 NA 4 5 2 3 3 ...
##  $ PAST.EXP        : num  0 7 0 0 3 0 0 2 1 0 ...
#Clean the dataset
tibble::as_tibble(Data_profession)
## # A tibble: 2,639 × 13
##    FIRST.NAME LAST.NAME  SEX   DOJ   CURRENT.DATE DESIGNATION   AGE SALARY UNIT
##    <chr>      <chr>      <chr> <chr> <chr>        <chr>       <dbl>  <dbl> <chr>
##  1 TOMASA     ARMEN      F     5/18… 1/7/2016     Analyst        21  44570 Fina…
##  2 ANNIE      <NA>       F     <NA>  1/7/2016     Associate      NA  89207 Web
##  3 OLIVE      ANCY       F     7/28… 1/7/2016     Analyst        21  40955 Fina…
##  4 CHERRY     AQUILAR    F     4/3/… 1/7/2016     Analyst        22  45550 IT
##  5 LEON       ABOULAHOUD M     11/2… 1/7/2016     Analyst        NA  43161 Oper…
##  6 VICTORIA   <NA>       F     2/19… 1/7/2016     Analyst        22  48736 Mark…
##  7 ELLIOT     AGULAR     M     9/2/… 1/7/2016     Analyst        22  40339 Mark…
##  8 JACQUES    AKMAL      M     12/5… 1/7/2016     Analyst        NA  40058 Mark…
##  9 KATHY      ALSOP      F     6/29… 1/7/2016     Senior Ana…    28  63478 Oper…
## 10 LILIAN     APELA      F     11/1… 1/7/2016     Analyst        22  43110 Fina…
## # ℹ 2,629 more rows
## # ℹ 4 more variables: LEAVES.USED <dbl>, LEAVES.REMAINING <dbl>, RATINGS <dbl>,
## #   PAST.EXP <dbl>
#Remove rows with NA
Data_profession2 <- Data_profession %>% filter(complete.cases(.))
#Rearrange rows by salary, lowest first
Data_prof1 <- arrange(Data_profession2, SALARY)
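Before dropping incomplete rows, it can help to see which columns hold the missing values and how many rows the filter removes. The sketch below does this in base R on a four-row sample copied from the first rows of the `str()` output; the object names `toy`, `na_per_column`, and `rows_dropped` are illustrative and not part of the original analysis.

```r
# Four sample rows taken from the first rows shown by str() above.
toy <- data.frame(
  LAST.NAME = c("ARMEN", NA, "ANCY", "AQUILAR"),
  AGE       = c(21, NA, 21, 22),
  SALARY    = c(44570, 89207, 40955, 45550),
  stringsAsFactors = FALSE
)

# Count missing values per column to see where the gaps are.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Keep only complete rows (same effect as filter(complete.cases(.))).
toy_complete <- toy[complete.cases(toy), ]

# Report how many rows the filter removed.
rows_dropped <- nrow(toy) - nrow(toy_complete)
print(rows_dropped)
```

Applied to the full dataset, the same check would show how the row count moves from the 2,639 observations reported by `str()` to the 2,631 rows reported by `summary()`.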
4.4 Phase 4: Analyze
4.4.1 Descriptive analysis on the cleaned data (Data_prof1)
mean(Data_prof1$SALARY) #straight average (salary)
## [1] 58117.64
max(Data_prof1$SALARY) #highest salary
## [1] 388112
min(Data_prof1$SALARY) #lowest salary
## [1] 40001
#Summarize the dataset
summary(Data_prof1)
##   FIRST.NAME         LAST.NAME             SEX                DOJ
##  Length:2631        Length:2631        Length:2631        Length:2631
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##
##
##
##  CURRENT.DATE       DESIGNATION             AGE            SALARY
##  Length:2631        Length:2631        Min.   :21.00   Min.   : 40001
##  Class :character   Class :character   1st Qu.:22.00   1st Qu.: 43418
##  Mode  :character   Mode  :character   Median :24.00   Median : 46783
##                                        Mean   :24.75   Mean   : 58118
##                                        3rd Qu.:25.00   3rd Qu.: 51402
##                                        Max.   :45.00   Max.   :388112
##      UNIT            LEAVES.USED   LEAVES.REMAINING    RATINGS
##  Length:2631        Min.   :15.0   Min.   : 0.000   Min.   :2.000
##  Class :character   1st Qu.:19.0   1st Qu.: 4.000   1st Qu.:2.000
##  Mode  :character   Median :22.0   Median : 8.000   Median :3.000
##                     Mean   :22.5   Mean   : 7.501   Mean   :3.487
##                     3rd Qu.:26.0   3rd Qu.:11.000   3rd Qu.:4.000
##                     Max.   :30.0   Max.   :15.000   Max.   :5.000
##     PAST.EXP
##  Min.   : 0.000
##  1st Qu.: 0.000
##  Median : 1.000
##  Mean   : 1.563
##  3rd Qu.: 2.000
##  Max.   :23.000
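The summary shows a mean salary (58,118) well above the median (46,783), which suggests a right-skewed distribution pulled up by a few very high salaries. As a rough check, the sketch below applies the common 1.5 × IQR rule to the quartiles reported above. The rule itself is an assumption introduced here for illustration; the report does not state how outliers were assessed.

```r
# SALARY figures as reported by summary(Data_prof1) above.
q1         <- 43418
med        <- 46783
avg        <- 58118
q3         <- 51402
max_salary <- 388112

# Interquartile range and the upper fence of the 1.5 * IQR rule.
iqr         <- q3 - q1
upper_fence <- q3 + 1.5 * iqr

# How far the mean sits above the median, as a ratio.
mean_to_median <- avg / med

# TRUE when the largest salary lies beyond the upper fence.
max_is_outlier <- max_salary > upper_fence

print(c(iqr = iqr, upper_fence = upper_fence))
print(max_is_outlier)
```

Because the mean is sensitive to such extreme values, the median may be the more representative "typical" salary when comparing designations.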
#Changing the date format
Data_prof2 <- Data_prof1 %>% mutate(N_DOJ = parse_date_time(DOJ, orders = c("mdy", "dmy", "ymd")))
#Splitting the date into columns of day, month, and year
Data_prof2$date <- as.Date(Data_prof2$N_DOJ)
Data_prof2$month <- format(Data_prof2$date, "%m")
Data_prof2$day <- format(Data_prof2$date, "%d")
Data_prof2$year <- format(Data_prof2$date, "%Y")
Data_prof2$day_of_week <- format(Data_prof2$date, "%A")
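A value such as `4/3/2013` is ambiguous in principle: it reads as 3 April under month/day/year and as 4 March under day/month/year. The sample values in the `str()` output (`5/18/2014`, `7/28/2014`) look like month/day/year, so one alternative, sketched below in base R, is to parse with that single explicit format and check that nothing failed. Treating the whole column as month/day/year is an assumption that should be verified against the full data.

```r
# Sample DOJ values copied from the str() output above.
doj <- c("5/18/2014", "7/28/2014", "4/3/2013")

# Parse with one explicit format (assumed month/day/year).
parsed <- as.Date(doj, format = "%m/%d/%Y")

# Any value that does not match the format becomes NA, so this flags problems.
n_failed <- sum(is.na(parsed))

# Split the parsed date into year, month, and day columns.
parts <- data.frame(
  date  = parsed,
  year  = format(parsed, "%Y"),
  month = format(parsed, "%m"),
  day   = format(parsed, "%d"),
  stringsAsFactors = FALSE
)
print(parts)
```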
I then moved the cleaned data to Tableau, where I built visuals to explore the trends.
This graph illustrates the average salary for different designations in the data profession. Here are some key points from the analysis:
From the chart, the Director designation has the highest average salary, at $286,971. This indicates that directors, who likely carry significant responsibility and oversight in data-related roles, are compensated the most. Analysts have the lowest average salary, at $45,023; this entry-level role likely involves data collection, processing, and basic analysis tasks.
Insights:
- Salary Distribution: There is a significant increase in average salary as you move up the hierarchy from analyst to director. The steepest increase is seen between the roles of senior manager and director.
- Career Progression: The graph suggests a clear financial incentive for career progression in the data profession, with substantial salary increases associated with higher levels of responsibility and expertise.
- Value of Experience and Responsibility: Higher salaries for senior managers and directors reflect the value placed on experience, leadership, and the ability to manage larger teams and projects.
This information is useful for understanding the financial potential in the data profession and can help in career planning and progression strategies.
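The charts were built in Tableau, but the same averages can be cross-checked in R. The sketch below is a minimal base R version of "average salary by designation", run on a small toy data frame whose values are hypothetical and only chosen to resemble the pattern described above. On the real data, `toy` would be replaced by the cleaned data frame.

```r
# Toy data (hypothetical values), not taken from the real dataset.
toy <- data.frame(
  DESIGNATION = c("Analyst", "Analyst", "Associate", "Manager",
                  "Director", "Director"),
  SALARY      = c(44000, 46000, 88000, 125000, 280000, 290000),
  stringsAsFactors = FALSE
)

# Mean salary per designation, sorted from lowest to highest.
avg_salary <- aggregate(SALARY ~ DESIGNATION, data = toy, FUN = mean)
avg_salary <- avg_salary[order(avg_salary$SALARY), ]

# Designations with the lowest and highest average salary.
lowest  <- as.character(avg_salary$DESIGNATION[1])
highest <- as.character(avg_salary$DESIGNATION[nrow(avg_salary)])

# Increase between consecutive rungs, and where the largest jump occurs.
steps        <- diff(avg_salary$SALARY)
biggest_jump <- which.max(steps)
jump_from    <- as.character(avg_salary$DESIGNATION[biggest_jump])
jump_to      <- as.character(avg_salary$DESIGNATION[biggest_jump + 1])

print(avg_salary)
```

Sorting by average salary and taking `diff()` gives a direct way to check a claim such as "the steepest increase is between two adjacent roles" without reading it off a chart.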
This graph provides a detailed breakdown of the average salary for each designation within different units. The designations include Analyst, Associate, Director, Manager, Senior Analyst, and Senior Manager, and the units are Finance, IT, Management, Marketing, Operations, and Web. Here are the key insights from this graph:
Directors:
- Directors have significantly higher average salaries compared to other designations.
- Directors in Finance have the highest average salary, followed closely by those in IT, Management, Marketing, Operations, and Web.
- This indicates that the role of a director in Finance is particularly well-compensated.
General Insights:
- Highest Salaries: Directors in Finance have the highest average salary among all designations and units, indicating the high value placed on this role within the Finance unit.
- Unit Comparison: Overall, Finance and IT units tend to offer higher average salaries across most designations, suggesting that these units might be more lucrative for data professionals.
- Designation Comparison: The trend shows that higher designations such as Director and Senior Manager consistently have higher average salaries across all units, reflecting the increased responsibilities and expertise required for these roles.
This graph is useful for understanding how salaries in the data profession vary by both designation and unit, providing a clear picture of the financial potential and how it differs across various sectors.
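A two-way breakdown like this one can be reproduced as a designation-by-unit table of means. The sketch below uses `tapply()` on a toy data frame with hypothetical values; the object names are illustrative and the real analysis was done in Tableau.

```r
# Toy data (hypothetical values), not taken from the real dataset.
toy <- data.frame(
  DESIGNATION = c("Analyst", "Analyst", "Analyst",
                  "Director", "Director", "Director"),
  UNIT        = c("Finance", "IT", "Finance", "Finance", "IT", "IT"),
  SALARY      = c(44000, 45000, 46000, 300000, 280000, 290000),
  stringsAsFactors = FALSE
)

# Matrix of mean salaries: one row per designation, one column per unit.
avg_by_cell <- tapply(toy$SALARY, list(toy$DESIGNATION, toy$UNIT), mean)
print(avg_by_cell)

# Locate the designation/unit combination with the highest average.
best <- which(avg_by_cell == max(avg_by_cell, na.rm = TRUE), arr.ind = TRUE)
best_designation <- rownames(avg_by_cell)[best[1, "row"]]
best_unit        <- colnames(avg_by_cell)[best[1, "col"]]
```

Combinations that do not occur in the data appear as `NA` in the matrix, which is why the maximum is taken with `na.rm = TRUE`.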
This graph illustrates the average age for different designations in the data profession. Here are some key points from the analysis:
The average age for analysts is 23.01 years. This suggests that the analyst position is often an entry-level role, typically filled by recent graduates or those early in their careers. The Director designation has the highest average age, at 41.69 years, which aligns with the significant experience and expertise required for this senior-level role.
Insights:
- Career Progression: There is a clear progression in average age as you move up the hierarchy from analyst to director. This progression suggests that advancing to higher designations typically requires accumulating years of experience.
- Experience and Seniority: Higher designations such as director and senior manager have higher average ages, reflecting the increased responsibility and expertise required for these roles.
- Early Career Roles: Roles like analyst and associate are typically filled by younger professionals early in their careers, often serving as stepping stones to more senior positions.
This information is valuable for understanding the typical career trajectory in the data profession, highlighting the relationship between age, experience, and professional advancement.
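To test whether average age really rises with seniority, the designations need an explicit order; sorted alphabetically, "Director" would come before "Manager". The sketch below encodes a hierarchy as an ordered factor and checks that mean age increases along it. Both the hierarchy order and the ages are assumptions made for illustration, not values from the dataset.

```r
# Assumed hierarchy from most junior to most senior (not confirmed by the data).
hierarchy <- c("Analyst", "Senior Analyst", "Associate",
               "Manager", "Senior Manager", "Director")

# Toy data (hypothetical ages).
toy <- data.frame(
  DESIGNATION = c("Director", "Analyst", "Manager",
                  "Analyst", "Senior Manager", "Director"),
  AGE         = c(41, 23, 30, 22, 36, 43),
  stringsAsFactors = FALSE
)

# Ordered factor so results follow the hierarchy rather than the alphabet;
# droplevels() removes designations that do not occur in the toy data.
toy$DESIGNATION <- droplevels(
  factor(toy$DESIGNATION, levels = hierarchy, ordered = TRUE)
)

# Mean age per designation, in hierarchy order.
avg_age <- tapply(toy$AGE, toy$DESIGNATION, mean)
print(avg_age)

# TRUE when every step up the hierarchy has a higher mean age.
is_increasing <- all(diff(avg_age) > 0)
```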
The provided chart showcases the average age of employees across different designations within various units. The designations include Analyst, Associate, Director, Manager, Senior Analyst, and Senior Manager, and the units encompass Finance, IT, Management, Marketing, Operations, and Web.
Key Observations:
Analyst:
- Average age across units is relatively young, around 23 years.
- This suggests that the Analyst role is typically an entry-level position, attracting younger employees.
Director:
- Significant increase in average age, ranging from 41 to 43 years.
- This role likely demands extensive experience and expertise, reflected in the older average age.
Unit-Specific Observations:
- Finance:
  - Consistent progression in average age from Analyst (23) to Senior Manager (37.25).
  - Reflects a clear career progression within the finance unit.
- IT:
  - Similar trend as Finance, with a steady increase in age from Analyst (23.03) to Senior Manager (36.50).
  - Indicates structured career growth within the IT unit.
- Management:
  - Follows a similar pattern, with an increase from Analyst (23.02) to Senior Manager (36.59).
  - Emphasizes the importance of experience in management roles.
- Marketing:
  - Progression from Analyst (23.07) to Senior Manager (36.21).
  - Suggests that marketing roles also follow a trajectory of increasing age and experience.
- Operations:
  - The trend is consistent with an increase from Analyst (22.96) to Senior Manager (36.36).
  - Highlights the need for experience in operations roles.
- Web:
  - Similar progression from Associate (29.85) to Senior Manager (36.80).
  - Indicates that even web-based roles follow a structured career path with increasing age and experience.
Insights:
- Career Progression:
  - There is a clear progression in average age as one moves up the career ladder from Analyst to Senior Manager.
  - This suggests that the organization values experience and typically promotes employees internally as they gain more experience.
- Talent Management:
  - The organization needs to ensure proper talent management practices to support employees at different career stages.
  - Mentorship programs can be beneficial, where senior employees guide younger ones.
- Training and Development:
  - Continuous training is essential to keep employees updated with industry trends and skills, especially for younger employees in Analyst and Associate roles.
  - Senior employees might benefit from leadership and advanced strategic training.
- Retention Strategies:
  - Retention strategies should be tailored to different age groups and career stages.
  - Younger employees might be motivated by learning opportunities and career growth, while older employees might value stability and leadership roles.
- Succession Planning:
  - Succession planning is crucial to ensure a smooth transition as older employees retire or move to different roles.
  - Identifying and grooming potential leaders from younger age groups is important.
By understanding these trends and implementing appropriate strategies, the organization can ensure a balanced and effective workforce, driving better sales performance and overall organizational success.
Recommendations
By following these recommendations, organizations can effectively utilize insights from the data profession trends to enhance workforce development, salary competitiveness, and overall strategic alignment with market needs.
- Career Pathway Development
  - Insight: There’s a clear financial and professional progression from entry-level to senior-level positions within data professions.
  - Action: Establish structured career pathways with clear milestones for promotion. Implement mentorship programs where senior employees guide junior staff, enhancing skills and preparing them for advanced roles.
- Competitive Salary Structures
  - Insight: Directors, especially in Finance and IT, command the highest salaries. Analysts have the lowest.
  - Action: Review and adjust salary structures to remain competitive, particularly for high-value roles such as directors and senior managers. Ensure that salary increases reflect the added responsibilities and expertise at higher designations.
- Targeted Training and Development
  - Insight: Different designations and units show varying average ages and experience levels.
  - Action: Develop targeted training programs. Offer leadership and strategic training for senior roles while focusing on technical skills and innovation for junior roles. Continuous learning opportunities can help retain talent and keep skills current.
- Unit-Specific Strategies
  - Insight: Finance and IT units offer higher average salaries and exhibit clear career progression.
  - Action: Implement unit-specific strategies that cater to the unique needs and potential of each unit. For instance, focus on advanced financial modeling for the Finance unit and cutting-edge technology training for the IT unit. Align these strategies with the overall business goals to maximize impact.