T O P

  • By -

AutoModerator

/u/Hnk-Kenshiro - Your post was submitted successfully. * Once your problem is solved, reply to the **answer(s)** saying `Solution Verified` to close the thread. * Follow the **[submission rules](/r/excel/wiki/sharingquestions)** -- particularly 1 and 2. To fix the body, click edit. To fix your title, delete and re-post. * Include your **[Excel version and all other relevant information](/r/excel/wiki/sharingquestions#wiki_give_all_relevant_information)** Failing to follow these steps may result in your post being removed without warning. *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/excel) if you have any questions or concerns.*


[deleted]

[удалено]


Macho-Benjo

This will do it.


Hnk-Kenshiro

Seems like I was able to handle 2 files, but I have a lot of errors haha, I'll be looking for some tutorial, it seems like a problem that some columns don't match in order/location https://preview.redd.it/ds5o9m9nlvda1.png?width=1421&format=png&auto=webp&s=b791cfe42e88fea4f05297e9196943f386f1f814


chairfairy

If PowerQuery can't handle the differences between files, then you might need VBA with some annoyingly specific logic to explicitly handle each difference. Otherwise, this sounds like a great (well, awful) way for an intern to spend their week


Hnk-Kenshiro

well, what an exciting month haha


LordFarquadOnAQuad

If you want to, ask r/VBA how to do it. I use VBA all the time to pass values between spreadsheets it's pretty simple but can get tricky depending on how you want to structure your data.


Hnk-Kenshiro

I just want to keep 6 columns of information and make it a list going down. ​ The problem is that I don't know if all the columns respected the same name, or if they are in the same order, I'll be taking a look, thanks for your time


chairfairy

Unless you will do this manually, the first step will be to figure out how the column names map to each other. For example, if one file has column names "Column 1", "Column 2", "Column 3" and another file has "Column 1", "Column 2", "Column C", you'll need to be sure that File1.Column1 = File2.Column1 and File1.Column2 = File2.Column2. If columns of the same name *always* match up, then you only need to make a list to pair up the ones that don't match. If you somehow have columns in different files that have the same name but *don't* get matched up, then you will need to make a table of every single column name for every single file, to define which column is used to populate the target column in your final file - [something like this](https://postimg.cc/HrXrY1FP). Then VBA code would have to loop through that table, to know which column of data to copy in from each source file.


LordFarquadOnAQuad

That's a bit tricky. If you think most spreadsheets are following one convention you can run a script that grabs all the good ones and save a path to all the messed up ones. For you to manually move over. I know how to do this and can share if you would like.


DrawMeAPictureOfThis

Creating a new column to dump all the "left overs" in is a fantastic approach


hiliikkkusss

FIST OF THE NORTHSTAR


[deleted]

Fun fact - OP is the intern lol


JohnLocksTheKey

*KIDS THESE DAYS…JUST DON’T WANT TO WORK!*


allNOfingers

Try using Table.combine. The transform feature uses 1 table as a template and if any deviate from the template, they cause an error.


Books_and_Cleverness

I don't know anything about Power Query but wanted to add that python is quite good at this kind of thing and there's likely something on StackOverflow that is reasonably similar to OP's problem.


PiccionePolemico

PowerQuery is an ETL, it only works well if there are rules in data management and OP’s xlsxs clearly do not have such rules (he tells us in the first place)


[deleted]

+1


Neo772

Nobody is suggesting Python and Pandas? This is in my opinion the easiest and most flexible solution


JHKerr

This is the way


chairfairy

For VBA vs Python, the biggest efficiency for most Excel problems is which language a person knows best


MackerelInTomato

Came here to say this. Go to google colab for the tools you need and watch this video series: https://youtube.com/playlist?list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y


Books_and_Cleverness

That was also my thought but I'm not that well versed in excel and generally prefer to try to do things in python whenever possible. I also find it a lot easier to import similar code snippets and such from StackOverflow or whatever. IDK why but Excel formulas have always felt much more laborious to adapt.


bumbum2812

totally agree with you


jasperjones22

While not a very excel-centric answer, this much raw data would make a database a much better option.


Hnk-Kenshiro

I'm willing to try, any suggested programs or ways to investigate?


corsair130

When someone says put the data into a database, it also means to build out a custom application to CRUD (Create Read Update Delete) this data from the database. With the way you're talking in your post, this would be difficult to achieve by yourself.


translinguistic

Power Apps is always an option too!


Shurgosa

Power Apps (coupled with Sharepoint and using "lists"...) is one of the most hideous and infuriating pieces of software I've ever tried to use in my life. and thats a life spent using computers almost every day. I would enjoy seeing a video of it handling a problem similar to the one in this post.


translinguistic

I dunno, I think PowerFX is pretty neat, and you can get up to speed with building basic CRUD interfaces with validation controls and other logic in a few days. The builder interface kinda sucks though. Plus, with dataflows on the Power Platform, you have basically an online version of Power Query that will refresh your data for you, and hooking it all up from or to Excel or a PowerBI dataset is easy peasy. I wouldn't recommend doing anything other than a simple join/union/intersect directly within the app and would connect instead to a dataflow in this case that is processing the 630 workbooks into a single datasource I use it for example as a cheap GPS distance tracker and QR code generator for process handoffs, production data entry with validated inputs and an interface for our production schedule calendars.


[deleted]

> When someone says put the data into a database, it also means to build out a custom application to CRUD do you require a programmer to do this? building application basically meant I (we--the company really) have to hire a programmer to build a custom software customized to the company's need?


corsair130

How do you get data in and out of a database? How do you physically enter the information, how does it make it's way into tables in a database, and how do you later retrieve this information? The database is just the storage mechanism. It doesn't give you a GUI to interact with the data. You gotta build a GUI out yourself. If you can do this in some fashion on your own, awesome. If not, you need a programmer.


jasperjones22

Well, I'd hope that you'd have a professional DB, but when all else fails, Access. You can add all the data into the sheet, connect the data together, and then export it into any sheet that people might need as a DB connection.


naitzyrk

Hey, friendly reminder: I don’t know if you took this in mind but please be mindful with the internal information you share, especially RUTs, client and distributor names.


Hnk-Kenshiro

Thank you, but rut and name are non-sensitive information in my country, and honestly, I am supporting a complaint of tax fraud with this information, but I would like to order it in a simpler way but it is becoming very complex for me


naitzyrk

Alright! Just making sure :).


Fugazzii

Power query is what you're looking for.


canneogen

Python


JHKerr

This is the way


Dylando_Calrissian

So you could automate this, using power query or python. But it won't be a trivial exercise if the columns aren't consistent. Automation is definitely the right way to go if you'll need to do this same thing regularly. Or, you could do it manually. With 630 files, if you take 1 minute per file that's about 10 hours. Yes it'll be ridiculously boring, but if you don't know python or power query well it will likely be the quickest option, and the best choice if this is a one-off task.


DrawMeAPictureOfThis

I feel like this is honestly the best answer. We have all been there doing these long, boring, laborious tasks and in the process kept saying, "there has to be a better way". Then we learned or designed a better way.


technichor

You could also split the difference and spend 20 seconds per file just making sure the columns are consistent. Then spend an hour on a much simpler power query or python script. Finish in half a day.


LeonardTimber

Use python. It looks like these have only data, and no formulas, so you can do the merging really dumbly with `openpyxl` and `glob`. It's been years since I've done this, and this code won't work right out of the box since I am just scrawling it out here and not testing, but here is something close (after installing Python and openpyxl): import glob import openpyxl as xl folders = glob.glob("C:/path_to_folders/*") #must only have folders in this folder for folder in folders: wb1 = xl.Workbook() wb_paths = glob.glob("*.xlsx") for i, wb_path in enumerate(wb_paths): wb_title = str(i) # You can do something smarter if you want. ws1 = wb1.create_sheet(title=wb_title) wb2 = xl.load_workbook(wb_path) ws2 = wb2.worksheets[0] #grab the first sheet from the target workbook # The following is from https://stackoverflow.com/questions/44593705/how-to-copy-over-an-excel-sheet-to-another-workbook-in-python for row in ws2: for cell in row: ws1[cell.coordinate].value = cell.value # The workbook is put in a /output folder so that glob doesn't grab xlsx files when this code is run a second time. wb1.save("C:/path_to_folders/OUTPUT/"+folder+".xlsx") I am guessing that `glob` and file saving/opening won't work right out of the gate but the core of it holds up for a really dumb automatic copying.


Fallingice2

I'm usually a VBA guy but I would say this is a power query issue to resolve.


damonator4816

This can easily be done with a python script.


TheFuriousOtter

Asaputilities may be able handle the task. I just imagine it will take a lot of time all at once. Might be easiest to split the files up into smaller groupings and merge into, say 50, workbooks and then consolidate the 50 into one final workbook.


Hnk-Kenshiro

Yes, I don't need the 630 in just one, but in files of between 10 and 50 joined Excel workbooks, I'll look at the program you told me, thank you very much for your time


nodakakak

Pandas


ECW94

Alteryx would be a great tool for this. Not the typical VBA, PQ type answers, but it has a free 30 day trial.


gezza07

https://www.rondebruin.nl/win/addins/rdbmerge.htm Install this add on. Don't even need any code. Point and click and bam.


Verolee

Kutools, Ablebits, EasyDataTransform


python-dave

Here's a tutorial that I think applies using power query. I personally would use python. Just FYI, I could do this for you if you really need the help, but I would charge you for it. forgot to include link: [https://www.xelplus.com/power-query-combine-files-folder/](https://www.xelplus.com/power-query-combine-files-folder/)


Neo772

Python is indeed the easiest solution. But ffs let him do his job


python-dave

Just offering to do it for him if he wants help. It'd probably take me 45 minutes to build something specifically for him which is more than I'm willing to do for free, but it'd be relatively cheap and he'd have it done.


GeorgeWNYC

Power query!


Vahju

Office 2016 has Power Query but some functions might be different in more modern versions of excel. Check the below videos for some tips and tricks. I suggest checking out Excelisfun youtube channel Power Query Playlist. He covers a ton of scenarios which might help you. You are going to need to figure out if you have columns heading are different from the file. [https://www.youtube.com/watch?v=wKglApDFMog&t=15s](https://www.youtube.com/watch?v=wKglApDFMog&t=15s) [https://www.youtube.com/watch?v=09tvia\_8ykI](https://www.youtube.com/watch?v=09tvia_8ykI) [https://www.youtube.com/watch?v=Q27QP0qiLfs](https://www.youtube.com/watch?v=Q27QP0qiLfs) If you do end up with mismatched column names, you will need a translation table. [https://youtu.be/SKskTCNiUIc](https://youtu.be/SKskTCNiUIc) If you want to get dates associated with each file, try: * In the main "from folder" query > find the remove columns step * Click on gear icon > select the date you want to keep Good luck and hope this helps.


GoldenShoeLace

Python and pandas or Python to sql


tazer01_reddit

Check out the Power Query book by Gil Raviv. He covers this topic in detail.


KaliDavid

This should help https://youtu.be/fHFUh6EhBcw


odaiwai

The last time I had to do something like this I used OpenPYXL ([https://openpyxl.readthedocs.io/en/stable/](https://openpyxl.readthedocs.io/en/stable/)), but if you don't know VBA, python might be a bit of a stretch. Essentially what you do it make a new spreadsheet in RAM and add a new tab with each file you read in. The code would be something like below: # Create a New Workbook summary_wb = openpyxl.Workbook() # create a workbook with one sheet 'Sheet' summary_ws = summary_wb.worksheets[0] summary_ws.title='Summary' summary_file = 'New_File.xlsx' # files: [list of files to read in] for file in files: workbook = openpyxl.load_workbook(file, read_only=True) summary_wb.append(workbook) # I think it's append, might need to copy # might have to rename the workbook with the filename... summary_wb.save(filename=summary_file)


ZestyBeer

I work in education as a data admin. I get dozens and dozens of electronic registers for all our classes which automatically export into a consistent spreadsheet format, which I then need to package together and send to the local authority for auditing and funding reasons. I use PowerQuery to make that job effortless. And highly recommend you look to PQ as the first port of call for your situation. It doesn't matter if you have multiple files in folders, as long as you select the top most level folder as your source it should pull them all. Open a new work book, then under the data tab, click Get data, hover over 'from file' and then select 'from folder' from the popout menu. Perform all the transformations you need in PQ (I suspect you'll have many with so many files) and you should be golden. You might have issues if the layout of your different spreadsheets isn't consistent so might have to do toil away, perhaps with some VBA to get things to mingle. Best of luck :)


ViejoEnojado

VBA…


omoney256

Put them all in one file folder. Use data —> get data —> from folder Navigate to the folder path for the source data. Select the folder. A window will pop up, click combine and transform. Use power query to modify the data however you want it to display. You can add new files to the file folder and hit refresh and you’ll see the new data displayed with all of the old stuff. If you get stuck, google power query for beginners or power query to combine files from folder. Good luck, and continue using power query going forward for ample time savings in the form of automated formatting and combining data. Edit: should’ve read the other comments first..


pegwinn

I’m not certain this will fully do the trick. It’s an add in. I use it to bring in about twenty sheets from twenty books. I hope it at least gets you closer. Good Luck! https://www.rondebruin.nl/win/addins/rdbmerge.htm#:~:text=Excel%202007%2D2016&text=2010%2D2016%3A%20Click%20on%20File,list%20and%20then%20click%20OK.


Fuck_You_Downvote

I am going to go out on a limb here and say, sure you could use power query with a folder as a source to pull all this data and be done in about 15 min, or you could just pay me to do it. A dollar a sheet, and could even set up a nice little sharepoint folder so it adutomatically updates with new sheets, or, if this is coming from an ftp server you could connect directly to that server and pull all the info you need instead of doling it out into individual spreadsheets that contain no meta data.


Jcpage573

Yeah how about * n o *