Ideas / Development of the Data Analysis task
1) First I did some exploratory data analysis in R; this is in the file called Prosper_loan.Rmd. Just some basic histograms and scatter plots. These identified some relationships, e.g. the borrower rate varied by occupation.
2) Clearly there was going to be a relationship between the credit score and the borrower rate, and potentially also between the amount loaned and the rate, but neither of these seemed particularly interesting to me. Of more interest was whether there was any discrimination between occupations.
3) The theme would therefore be borrower rate vs occupation, while trying to keep other factors constant. In particular, the income level could be held constant (e.g. for the $25-50k subset, can we see major differences in rates?).
4) This could then be animated, i.e. cycling through the income ranges. Interaction could then be allowed: choose your income level to highlight which profession you 'should' be in for the lowest rates.
5) Could also consider other variables as the driving factors, e.g. loan/income level or similar.
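A minimal sketch of the comparison in idea 3, assuming a data frame with IncomeRange, Occupation and BorrowerAPR columns. The toy data frame and its values are invented for illustration and are not Prosper data.

```r
# Toy stand-in for loans.df; values are invented for illustration only.
toy <- data.frame(
  IncomeRange = c("$25-50k", "$25-50k", "$25-50k", "$25-50k", "$75-100k", "$75-100k"),
  Occupation  = c("Analyst", "Analyst", "Teacher", "Teacher", "Analyst", "Teacher"),
  BorrowerAPR = c(0.20, 0.22, 0.15, 0.17, 0.12, 0.11),
  stringsAsFactors = FALSE
)

# Hold income constant by keeping a single income range.
one_band <- subset(toy, IncomeRange == "$25-50k")

# Mean APR per occupation within that band, lowest first.
apr_by_occ <- aggregate(BorrowerAPR ~ Occupation, data = one_band, FUN = mean)
apr_by_occ <- apr_by_occ[order(apr_by_occ$BorrowerAPR), ]
print(apr_by_occ)
```

Repeating this for each income range gives the per-range comparison that the animation would cycle through.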
Task 1) Work out the code for subsetting the data in R for the $25-50k set. Done, e.g.:
data <- subset(loans.df, IncomeRange == "$25-50k")
Task 2) Work out how to amalgamate categories, e.g. tradesmen - plumber, tradesmen - mechanic, tradesmen - electrician, tradesmen - carpenter. Might be easier with one group: tradesmen. Done, by creating a new field called OccGroup, then manually amalgamating levels, from the bottom up. E.g. to amalgamate all the tradesmen levels:
levels(loans.df$OccGroup)[63:66] <- "Tradesman"
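A possible alternative to indexing levels by position is to match them by name, so the merge does not depend on where the tradesmen levels sit in the level list. This is only a sketch: the toy factor and its level names are assumptions, not the actual Prosper levels.

```r
# Toy factor standing in for loans.df$OccGroup; the level names are assumptions.
occ <- factor(c("Analyst", "Tradesman - Plumber", "Tradesman - Mechanic",
                "Teacher", "Tradesman - Electrician"))

# Find every level that starts with "Tradesman" and give them one shared name.
# Assigning the same name to several levels merges them into one level.
is_trade <- grepl("^Tradesman", levels(occ))
levels(occ)[is_trade] <- "Tradesman"

print(table(occ))
```

The position-based version above (63:66) is shorter, but it silently merges the wrong levels if the level order ever changes.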
Task 3) Work out how to perform the sorting within a graph, as we go, i.e. rather than creating a new field with new levels set by sorting a table, set the levels inside the graph code. E.g.:
ggplot(aes(x = factor(OccGroup,
                      levels = names(sort(table(OccGroup))))),
       data = subset(loans.df, IncomeRange == "$75-100k"))
# a geom layer (not shown in these notes) is then added with +
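The level-sorting step from Task 3 can be checked on its own in base R, without ggplot. The vector below is invented for illustration.

```r
# Toy occupation vector; values are invented for illustration only.
occ <- c("Teacher", "Analyst", "Teacher", "Clerk", "Teacher", "Analyst")

# table() counts each occupation, sort() orders the counts ascending,
# and names() turns that ordering into the factor's level order.
by_count <- factor(occ, levels = names(sort(table(occ))))

print(levels(by_count))
```

Passing decreasing = TRUE to sort() would put the most common occupation first instead.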
Task 4) Create a line chart of the average borrower APR for a profession (say "analyst") across the different income ranges. Done. Grouped lines by occupation. Used the summarySE function from the Rmisc package, e.g.:
AprMean <- summarySE(loans.df, measurevar = "BorrowerAPR",
                     groupvars = c("IncomeRange", "OccGroup"))
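As a rough idea of the kind of per-group summary used in Task 4, here is a base-R sketch that computes a count, mean, standard deviation and standard error per group. It is not the Rmisc implementation, it groups by a single variable for brevity, and the toy values are invented.

```r
# Toy stand-in for loans.df; values are invented for illustration only.
toy <- data.frame(
  OccGroup    = c("Analyst", "Analyst", "Teacher", "Teacher"),
  BorrowerAPR = c(0.20, 0.22, 0.15, 0.17)
)

# Per-group count, mean and standard deviation.
n    <- tapply(toy$BorrowerAPR, toy$OccGroup, length)
avg  <- tapply(toy$BorrowerAPR, toy$OccGroup, mean)
sdev <- tapply(toy$BorrowerAPR, toy$OccGroup, sd)

# Collect the results; se is the standard error of the mean, sd / sqrt(N).
apr_summary <- data.frame(
  OccGroup    = names(avg),
  N           = as.vector(n),
  BorrowerAPR = as.vector(avg),
  sd          = as.vector(sdev),
  se          = as.vector(sdev / sqrt(n))
)
print(apr_summary)
```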
Task 5) Write the data to CSV, so it can be used in d3.
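A minimal sketch of the CSV export in Task 5. The file name apr_mean.csv and the toy data frame are hypothetical; the real export would use the summary table from Task 4.

```r
# Toy stand-in for the summary table; values are invented for illustration only.
apr_mean <- data.frame(
  IncomeRange = c("$25-50k", "$25-50k"),
  OccGroup    = c("Analyst", "Teacher"),
  BorrowerAPR = c(0.21, 0.16)
)

# Hypothetical output path; written to a temp directory to keep the sketch harmless.
out_path <- file.path(tempdir(), "apr_mean.csv")

# row.names = FALSE avoids an extra unnamed index column in the CSV.
write.csv(apr_mean, out_path, row.names = FALSE)

# Read it back to confirm the round trip.
check <- read.csv(out_path, stringsAsFactors = FALSE)
print(check)
```

Dropping the row names keeps the header to exactly the named columns, which makes the file simpler to load by column name in d3.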
Task 6) Use d3 to animate through the occupations, present the average, and then allow the user to select occupations to compare with the average.