Hacker News story: Ask HN: Need help creating a database from 5 sets of CSVs

I have 5 sets of CSVs (multiple CSVs in each set, but the columns within a set are the same). Only 2 of the sets need more than 2-3 of their columns; the rest can be dropped. There are relationships between the sets. This is an export from a previous CRM. 3 of the sets are too big to open in Excel, so I couldn't do a lookup there to create the proper values and clean up the data.

There are a few columns that I'd like to merge into one column based on an if statement: if the legacy user created_by exists, use that as created_by; if it doesn't, use the created_by user. Same with the created time.

I've messed around with bash, Python, and Postgres, but it's been a hassle. Whenever I pull new data I end up deleting and purging everything and rerunning a bunch of segmented scripts. I also spun up a Rails app, connected it to my new db, and got it displaying the data with basic search and pagination, but I realized I'm wasting my time building that out. It was pulling in all the data, which has a lot of excess junk, and the ID structure is terrible. I want to keep the associations but use a new primary key based on another column. I'd rather start with a reasonable, simple database that has the data properly set up so I can plug it into anything (or do simple SQL queries/exports!).

Not sure if anyone has had to deal with something similar and could share what they did, but I'd appreciate any guidance. I looked into pgfutter, http://easymorph.com/, http://tadviewer.com/, and csvkit. I'm open to paying someone for their time if they could do this, since I'm at my wits' end. It didn't take super long to do things in bash/Python, but I'm not an expert, so I was hacking my way through it. Truth be told, the data isn't complex, but doing the few associations and setting up new IDs is foreign to me. No pun intended.

6 comments on Hacker News.
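For anyone wanting something concrete to react to, here is a minimal sketch of the kind of single re-runnable import described above. Everything specific in it is an assumption, not the real export: the two table names (`contacts`, `notes`), the column names (`id`, `email`, `legacy_created_by`, `created_by`, `legacy_created_at`, `created_at`, `contact_id`, `body`), and the choice of `email` as the new primary key are all hypothetical. It also uses SQLite from the Python standard library rather than Postgres, purely to keep the example self-contained.

```python
import csv
import glob
import sqlite3


def pick(row, preferred, fallback):
    """Return the preferred column's value, or the fallback when it is blank."""
    value = (row.get(preferred) or "").strip()
    return value if value else (row.get(fallback) or "").strip()


def rebuild(db_path, contact_glob, note_glob):
    """Drop and recreate both tables, then reload them from the CSV sets."""
    con = sqlite3.connect(db_path)
    con.executescript("""
        DROP TABLE IF EXISTS notes;
        DROP TABLE IF EXISTS contacts;
        CREATE TABLE contacts (
            email      TEXT PRIMARY KEY,  -- new key taken from another column
            legacy_id  TEXT,              -- old CRM id, kept only for reference
            created_by TEXT,
            created_at TEXT
        );
        CREATE TABLE notes (
            id            INTEGER PRIMARY KEY,
            contact_email TEXT REFERENCES contacts(email),
            body          TEXT
        );
    """)

    old_to_new = {}  # old CRM id -> new primary key
    for path in sorted(glob.glob(contact_glob)):
        with open(path, newline="", encoding="utf-8") as f:
            # DictReader streams one row at a time, so file size is not an issue
            for row in csv.DictReader(f):
                email = row["email"].strip().lower()
                old_to_new[row["id"]] = email
                con.execute(
                    "INSERT OR REPLACE INTO contacts VALUES (?, ?, ?, ?)",
                    (
                        email,
                        row["id"],
                        pick(row, "legacy_created_by", "created_by"),
                        pick(row, "legacy_created_at", "created_at"),
                    ),
                )

    for path in sorted(glob.glob(note_glob)):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                new_key = old_to_new.get(row["contact_id"])
                if new_key is None:
                    continue  # parent contact is not in the export; skip the row
                con.execute(
                    "INSERT INTO notes (contact_email, body) VALUES (?, ?)",
                    (new_key, row["body"]),
                )

    con.commit()
    return con
```

Calling something like `rebuild("crm.db", "contacts/*.csv", "notes/*.csv")` after each new export would replace the purge-and-rerun routine with one step, since the script drops and recreates its own tables. The in-memory `old_to_new` dictionary is the simplest way to carry the associations over to the new key; for very large sets, the same remapping could probably be done with a SQL join on `legacy_id` instead.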

Reviewed by Tha Kur on October 20, 2017 Rating: 5
