Scraping and Transforming Public Data

The Connecticut Coalition for Achievement Now([url removed, login to view]) is in the process of developing its annual ranking ofConnecticut schools and districts. We are looking for a programmer to developan automated system for scraping public data on student performance andtransforming it into a spreadsheet to be used in uploading the information toour website.

Reponses to this RFP are encouraged to be shortform and to the point. An appropriate response will summarize the processes tobe used in achieving the stated outcomes, including any intermediate milestonesto be generated, as well as required resources. Proposed costing shouldcorrespond to the processes proposed. In addition, approximate timeframes forthe engagement and near-term consultant availability should be described. Prompt replies are appreciated. We are glad to answeradditional questions about the scope of this engagement.

## Deliverables

**Developer characteristics**

· Demonstration of ability in similar projects.

· Three recommendations that show capability and work ethic.

· Milestones of how we will see that you’re completing your work.

· Timeline of one month or less (Nov. 1).

· The ability to start work as soon as possible and complete work quickly.

· Location near New Haven to come for meetings/interview (optional).

· Use of Basecamp (either ConnCAN’s own basecamp account or yours) or similar online project management tool to track progress (optional).

**Outputs and Milestones**

** **· Review the specifications in the RFP and the architecture of the public websites and develop a detailed plan for delivering the required products.

· Organize and deliver to ConnCAN the downloaded information in raw form as soon as they are completed.

· Transform data to meet specifications described in RFP and deliver to ConnCAN as a Excel spreadsheet.

· Provide ConnCAN with a detailed description of the process by which the scrapping and transforming were performed and any work product (macros etc.) that will be of use in future years.

**Project Overview**

The Connecticut State Department of Education publishes annual data on the performance of students in approximately 1,000 public schools and 166 school districts for grades 3-8 and 10 in the subjects of reading, writing, math, and science.

For the past three years, we have downloaded this public data by hand and used a combination of hand manipulation and a macro to transform it into a single spreadsheet that our previous website firm used to create a database to drive our online school report cards ([[url removed, login to view]][1]). This process is time-consuming and prone to human error. We are thus interested in developing a much quicker and cleaner way to get to the same result.


[url removed, login to view] holds Connecticut’s student test score information for elementary, middle and high school. The elementary and middle school test is called the Connecticut Mastery Test (CMT) and the high school test is called the Connecticut Academic Performance Test (CAPT).

· This is the link to access the CMT data: <[url removed, login to view]>

· This is the link to access the CAPT data:

<[url removed, login to view]>

**Downloading the Data**

For both the CMT and CAPT sites, we want to download data from the two previous years (2008 and 2009) for every school and district available as well as the state.

To do so, you first click on "State by District/School Report," then check off the year boxes at the top of the page, select either school or district from the drop down, then highlight the school or district names and click the arrow to move them to the right hand box, and then select “more selections.??

On the next screen, click on the boxes underneath each of the four subject areas labeled “% At / Above Goal.?? Under the section “disaggregate?? select “Ethnicity?? and “F/R Meals.??

Then select “Get Excel CSV??

**Transforming the Data**

The various spreadsheets created through the above downloading process must be combined with descriptive information about the schools and districts that the Department of Education makes available through a single spreadsheet (will be provided on request) to result in something that looks like what we have produced in the past (will be provided on request).

There are a few things we do with the data to produce the numbers that appear on the website:

**Selecting Exit Years and Averaging Scores.** We are interested in the scores for the latest grade in each school. For most middle schools this is 8th grade, for most elementary schools this is 5th grade. If a school doesn’t have one of these grades, use the previous grade (7th or 4th grade). If an elementary school only has only 3rd grade scores use those, but if a middle school only has 6th grade scores do not use any score for that school. Please flag any school with only one year of scores. You should track which grade you end up using in a column labeled “comparison grade.??

For these exit grades, create columns for the scores in each subject available (some grades do not have a science score) for the overall percentage of students at goal: reading, writing, math and science. Then create a column for the average (mean) between these subjects labeled “students at goal.?? Then do the same thing for the following subgroups created through the disaggregation function: Black, Hispanic, White, F/R Meals, Full Price.

Then using the mean scores for each of these subgroups, create new columns that calculate that subtract Black from White scores, Hispanic from White Scores, and F/R Meals from Full Price scores. Then create a column that is the mean of these three gap scores and label it overall gap.

**Improvement and Performance Gains.** After calculating the averages for both years for all grades, “improvement?? can be calculated by subtracting the 2008 mean score across subjects from the 2009 mean score across subjects for all of the grades within a school (e.g. 2009 8th grade ??" 2008 8th grade). If the number of subjects tested between years is different, flag the record for review. Then create a column with the mean improvement across all of the grades.

“Performance gains?? is a cohort analysis that attempts to track the progress of the same class of students between years. For elementary schools, the performance gains score is the mean change between the 2008 3rd grade and the 2009 4th grade, and the 2008 4th grade and the 2009 5th grade. For middle schools, the performance gains score is the mean change between the 2008 5th grade and the 2009 6th grade, the 2008 6th grade and the 2009 7th grade, and the 2008 7th grade and the 2009 8th grade. High schools don’t get performance gains because there is only one data point (10th grade).

**Adding Grades to the Scores.** In addition to the numbers, we can letter grades to the scores, which will have to be represented in columns.

Both the mean score for overall students and the subgroup categories should receive a grade based on the following scale.

Grade / Score

A 90-100

A- 84-89

B+ 78-83

B 72-77

B- 66-71

C+ 60-65

C 54-59

C- 48-53

D+ 42-47

D 36-41

D- 30-35

F <30

The gaps between subgroups should receive a grade based on the following scale:

Grade / Score

A 0-2

A- 3-5

B+ 6-8

B 9-11

B- 12-14

C+ 15-17

C 18-20

C- 21-23

D+ 24-26

D 27-29

D- 29-31

F >31

Performance Gains

If a school's score increased, the difference between these scores is divided by the percentage of students not at goal in 2008. Then the following grade scale is applied:

A .24 or more

A- .20 to .23

B+ .16 to .19

B .12 to .15

B- .08 to .11

C+ .04 to .07

C .00 to .03

If a school's score decreased, the difference between these scores is divided by the percentage of students at goal in 2008. Then the following grade scale is applied:

C- - 0.1 to [url removed, login to view]

D+ [url removed, login to view] to [url removed, login to view]

D .09 to [url removed, login to view]

D- [url removed, login to view] to [url removed, login to view]

F [url removed, login to view] or more

Schools with an overall average percentage of students within goal range in 2008 of 85 or greater receive an N/A since the grade scale begins to approach the ceiling of 100 above this level, which diminishes its meaningfulness as a measure of improvement.

**Combining scores with the descriptive information.** The attached spreadsheet from our 2009 report cards project contains information about how to categorize schools that can be helpful in creating consistent records for 2009. In particular, we had to make judgment about whether a school is an elementary or middle school and what type of school it is and how to describe that school type (“public charter school?? etc). You can start by using this list of schools, districts and descriptive information. It should then be combined with the attached spreadsheet of updated information to repopulate the columns with on the number of students (size), student demographic information, etc.

Once that is completed, you can combine this information with the updated scores to create a master spreadsheet.

Taidot: Amazon Web Services, tiedonsyöttö, tekniikka, MySQL, PHP, projektinhallinta, tietojärjestelmäarkkitehtuuri, Ohjelmistojen testaus, Web hosting, Verkkosivun hallinta, Verkkosivujen testaus

Näytä lisää: yours-24, www report letter writing, www academic writing org, writing students want to have future, writing sites for students, writing schools, writing records for students, writing numbers 1 20, writing numbers 1 10, writing macros in excel, writing macros, writing in math, writing excel macros, writing a performance review, writing an rfp, what is descriptive writing, what is a writing prompt, website scraping projects, website developer student, using numbers in writing

Tietoa työnantajasta:
( 0 arvostelua ) United States

Projektin tunnus: #2892241

7 freelanceria on tarjonnut keskimäärin %project_bid_stats_avg_sub_26% %project_currencyDetails_sign_sub_27% tähän työhön


See private message.

$1530 USD 14 päivässä
(31 arvostelua)

See private message.

$425 USD 14 päivässä
(93 arvostelua)

See private message.

$1700 USD 14 päivässä
(59 arvostelua)

See private message.

$637.5 USD 14 päivässä
(1 arvostelu)

See private message.

$1700 USD 14 päivässä
(4 arvostelua)

See private message.

$1020 USD 14 päivässä
(1 arvostelu)

See private message.

$1700 USD 14 päivässä
(3 arvostelua)