Excel Data Restructurer with Java

My mother is currently writing her dissertation for her Ph.D. in Accounting. She had some data in an Excel file that she wanted to turn into a “stack area graph” in Excel. However, the structure of her data had too many variables for the graph to be created through the raw data.

She informed me of the problem and asked if I would be willing to help. I enjoy taking up these types of challenges, especially since I’ve worked with Microsoft products with Java multiple times.

The basic problem was that she had a huge list of papers, their topic area, their type, and how many of them were written in a given year. She needed to compile the information so that she could graph the topic area frequency for a given year using a stack area graph. However, the topic area and type didn’t always correlate, so the data could not easily be combined.

For example, the data looked something like this:

Screen Shot 2017-05-13 at 1.08.11 PM

She needed to get the cumulative number of papers written for each topic area (e.g. “AIS”) for each year. But you can see that there are three different types of papers within “AIS” in 1990: “Archival”, “Experimental”, and “Mother”. Thus, these types needed to be ignored and the “COUNT” column needed to be combined for that year (FYI, there was data like above for years from 1990 to 2016).

This needed to be done for each topic area, for each year. Then, it needed to be done again, but this time combining all the different types of paper and it would ignore the topic area. Thus, she was looking for two stacked area graphs.

I quickly realized that doing this manually would take a long time, so I decided to utilize my friend Java. Using simple logic that would assign papers to a dictionary according to the topic area, and then to a sub-dictionary according to the year, I was able to store all of the data from the bad-formatted template shown above.

I then created a new excel file that was able to store the data in a format that I would be able to use to create the stack area graph. The data ended up looking like this for seven different topic areas:

Screen Shot 2017-05-13 at 1.14.02 PM.png

I was then able to create a stack area graph in Excel with the new formatted data. The graph for the various topic areas looked like this:

Screen Shot 2017-05-13 at 1.15.40 PM

I was then able to tweak my Java program to be more versatile which would allow me to also re-format the data for the type as well as the topic area. So then I was also able to create a graph for type as well:

Screen Shot 2017-05-13 at 1.17.01 PM

I think it is really cool that I have been able to take my knowledge of coding an apply it to a real-life scenario where I was able to help my mom with her research. I think doing these types of projects (including the ones I did for ESF CoCo, which you can find here and here) have really helped increase my confidence with approaching real-life problems and developing creative solutions for them.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s