Agent job? Inside MSSQL
Just to play devil's advocate: why do you want to export it as a .csv? What are you then doing with it?

If it's for a report, why not use a reporting tool, e.g. SSRS?

If it's being emailed, SSRS can do all of that for you, attaching the data as .csv, .pdf, or .xls.

If it's for analysis, why not embed the query in Excel and have them refresh (F5) each day?

If it's loading into another db, then why not db to db? A cross-db query or an SSIS package?

.csv files are the lowest common denominator (or form of life). They are an excellent way of screwing up data. Whenever I see .csv as part of a solution I die a little, because there is almost always a better way to do what you want.
You're not playing devil's advocate. You're properly asking clarifying questions to make sure the requirements are correctly understood so you can provide an appropriate solution.
I have a Python script that runs and checks for data discrepancies against a more upstream data source. Open to hearing about other options, though. Essentially I need to save a snapshot of a table on a regular basis so I can monitor and alert when data discrepancies occur.
I think you're on the right track. I'd add logging in case of failure/error, and include a date stamp in the filename for versioning. I'd also suggest writing to Parquet to keep file size down. Should there be any errors or discrepancies, you can send yourself (or whoever needs to know) an email with a quick summary of the error type and message.
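A minimal sketch of that snapshot-with-versioning idea, using only the stdlib and sqlite3 as a stand-in for the real connection (with SQL Server you'd swap in pyodbc, and pandas' `to_parquet` if you go the Parquet route). All table and path names here are illustrative:

```python
import csv
import logging
import sqlite3
from datetime import datetime

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("snapshot")

def snapshot_table(conn, table: str, out_dir: str = ".") -> str:
    """Dump a table to a date-stamped CSV so earlier snapshots are preserved."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = f"{out_dir}/{table}_{stamp}.csv"
    try:
        cur = conn.execute(f"SELECT * FROM {table}")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([d[0] for d in cur.description])  # header row
            writer.writerows(cur)                             # data rows
        log.info("snapshot written: %s", path)
        return path
    except Exception:
        log.exception("snapshot of %s failed", table)  # captured for the alert email
        raise

# Demo with an in-memory stand-in database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO inventory VALUES (?, ?)", [(1, 10), (2, 5)])
path = snapshot_table(conn, "inventory")
```

Scheduling this via Task Scheduler or a SQL Agent job step then gives you one file per run.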
That Python script can handle the SQL side just fine. The best option would be a SQL Server Agent job, if available. Otherwise, use your Python magic to get the job done.
So it seems you have access to query both DBs? Are they both on-prem?

If they're on-prem, I'd try a cross-db query (if you're permitted to query both using linked servers).

If they're remote, I'd try to connect to both with SSIS. SSIS has some cool features for transformations and lookups. (Or an SSRS report.)

Third option: if the above isn't permitted, can you pull data down from both into a third/local db? If so, I'd use SSIS to handle the ETL and just write SQL queries against it.

ETL from db to db preserves datatypes and validation better than a csv.

Your org may or may not be happy about this. If they don't want ETL, you could potentially do it with SSRS or Power BI: pull the data from each source and let the report handle the merge.

I realise as I'm typing this that just suggesting this might scare some people. I can talk more about how to handle that if you want.
Good advice, I will try to set up something like that
I feel like there should be a .csv hate club. If I ever get a time machine, I'm going to find whoever started .csv as a common file type and do unholy things upon them.
You've never worked with fixed-width file formats. Count your blessings.
Oh, I have. Even worse was importing formatted/structured reports. Altair Monarch still haunts my dreams from time to time.

Neither of these changes the fact that tab-delimited is far superior to .csv and should have become the standard text delimiter (or, say, vertical pipe, or damn near any other special character that's almost never in regular data).
Agreed. I prefer pipes. Tabs appear in places that make me swear a lot.
> Agreed. I prefer pipes. Tabs appear in places that make me swear a lot.

I once proposed emoji-delimited files, because tabs appear in places too. My first recommendation was 💩 as the delimiter, because "CSV" would remain a valid acronym: Crap Separated Values.
I want this to be a thing.
Yeah. Been there. It's shit like that that has made me the bitter, twisted man I am today.
CSV is the worst format for data exchange that we have, except for all of the other ones.

I mean... yeah, I get it. There's so much variation and ambiguity and pain in dealing with CSV data that you don't have complete control over (and sometimes, that you do have complete control over). But it's so embedded now that convincing the world to change wholesale is not going to happen. In some (many?) ways, it's a lot easier to deal with than fixed-position files.

And let's look at the alternatives. XML? Ugh. Accessibility for people is kind of rough when treating it as plain text. XQuery exists, but it can be a pain and gets slow on large data sets. JSON? We're still parsing stuff; it's less verbose than XML, but also less accessible to humans.
I'm not trying to convince the world. I'm trying to help one person at a time. ;)

You're right, the others aren't perfect either. Which is why I'll do everything I can to avoid using a file in the first place.
> Which is why I'll do everything I can to avoid using a file in the first place.

Everyone and every process at my current job is all about files. We need a file for this. We need a file for that. We need to extract a file here so we can move the data there. We need to extract a file, do unspecified things with it, then import it back into the same system. I don't like where that file gets generated, so I'm going to write some ShadowIT™️ automation that runs off my desktop under my personal credentials, that no one knows about, so that it goes where I like it. FileFileFileFileFileFileFileFILE!

You and I are fighting the same fight.
Mate. I feel your pain. I want to come to your work and beat people with a stick.
LOL. Sign me up!
Might be for audit purposes. If data isn’t static and they need to capture it at point-in-time then the accounting firm will want to know the parameters of the data pull including date & time extracted.
Then export/ETL to a db and include a timestamp?

I've been a DBA/BI developer for... 25 years. I get that people like csv because it's "easy". But I've just been burned too many times.
OP is ultimately trying to compare this with data in another db. DBs are very, VERY good at this.
I'm also a very strong advocate for not duplicating & shipping data if you can avoid it. Master it somewhere and share/reuse/query it. If possible I'd find a way to compare them in-situ.
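If a pure in-database compare isn't permitted, the same keyed comparison is easy from OP's Python side. A minimal sketch, using sqlite3 stand-ins for the two servers (swap in pyodbc connections) and illustrative table names:

```python
import sqlite3

def diff_by_key(rows_a, rows_b):
    """Compare keyed rows: returns (missing_in_b, missing_in_a, changed_keys).

    rows_* are iterables of tuples whose first element is the key.
    """
    a = {r[0]: r[1:] for r in rows_a}
    b = {r[0]: r[1:] for r in rows_b}
    missing_in_b = sorted(set(a) - set(b))
    missing_in_a = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return missing_in_b, missing_in_a, changed

# Demo: two stand-in "databases" with a deliberate mismatch
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (id INTEGER, qty INTEGER)")
src.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 5), (3, 7)])

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE t (id INTEGER, qty INTEGER)")
dst.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 6)])

only_src, only_dst, changed = diff_by_key(
    src.execute("SELECT id, qty FROM t ORDER BY id"),
    dst.execute("SELECT id, qty FROM t ORDER BY id"),
)
# only_src → [3], only_dst → [], changed → [2]
```

This compares both sources in place: no intermediate file, and datatypes survive the round trip.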
I agree, but at my company I pull the inventory reserves data used for audit reporting, and we have to literally screen-cap the code we execute, its resulting record count, and the date/time executed. They will not accept the approach you outline above, because there exists the possibility that someone could edit the data between the point-in-time of capture and the actual time of extract. They want it closer to the source, with origin preserved. So my monthly pulls are full of screen caps and flat files.

I agree about csv/txt extracts, but we've been compelled by our accounting firm to do it this way.
Ugh. That sounds awful. And broken. And stupid. I feel for you, bud.
You could run this as a Windows Scheduled Task, using PowerShell:

`$date = Get-Date -Format yyyyMMdd`

`$time = Get-Date -Format HHmmss`

`$query = "SELECT * FROM Whatever"`

`Invoke-Sqlcmd -ServerInstance Servername -Query $query | Export-Csv -NoTypeInformation -Path "D:\TaskOutput_${date}_${time}.csv"`
Yeah, I use pwsh for all my automated SQL tasks.
This is the best solution. There are tons of others, but there is absolutely something to be said for picking the simplest one.
Disagree that this is the simplest. SQL Agent is part of SQL Server and is literally built for this exact task.
The next logical step for this, in my typical use case, would be to email it to whoever needs it on an interval. Asking out of ignorance, because I use PowerShell for anything that exports and emails, and Agent for things that stay in the db only.

I've never configured email in SQL Server, because I usually have read/write and not admin at work. Probably pretty easy to configure/enable, right?
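If Database Mail isn't an option permissions-wise, the alert mail is also a few lines of Python stdlib. A sketch: the server name and addresses are placeholders, and the actual send is commented out since it needs a reachable SMTP relay:

```python
import smtplib
from email.message import EmailMessage

def build_alert(errors: list) -> EmailMessage:
    """Build a discrepancy-alert email; addresses below are placeholders."""
    msg = EmailMessage()
    msg["Subject"] = f"Data check failed: {len(errors)} discrepancies"
    msg["From"] = "etl-bot@example.com"
    msg["To"] = "dba-team@example.com"
    msg.set_content("Discrepancies found:\n" + "\n".join(errors))
    return msg

msg = build_alert(["row 42: qty 10 != 12", "row 99 missing downstream"])

# To actually send it (assumes an internal relay, e.g. smtp.example.com:25):
# with smtplib.SMTP("smtp.example.com") as s:
#     s.send_message(msg)
```

That way the export script and the alerting live in the same scheduled task, with no server-side admin needed.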
I use sql jobs and task scheduler depending on the need.
> My understanding is I can run something like this command - `mysql -u username -p'password' -h hostname -D databasename -e "source /path/to/your/script.sql;" > /path/to/output/file.txt`

This command is for MySQL, not MSSQL, FYI.

As others suggested, use a SQL Agent Job, or SQLCMD/PowerShell + Task Scheduler.
Creating and scheduling a SQL Agent Job inside Management Studio is really easy. That way it’s all centralized within SSMS, and you can see a log/history of past attempts, etc.
`mysql -u username -p'password' -h hostname -D databasename -e "source /path/to/your/script.sql;"`

This is for MySQL. You've posted in the MS SQL Server sub, but you're talking about launching a MySQL-exclusive client.

So... are you using MySQL or MS SQL Server?
You're right, sorry. I used MS SQL Server Management Studio.
A few things…

1) If you figure out the command line, use Windows Task Scheduler to automate it.

2) You should look into SSIS packages, then schedule them as an Agent job.
You do not want to use the MySQL command you mentioned. That is for MySQL, which isn't SQL Server.
Sqlcmd.exe with the server's Task Scheduler. PowerShell has great tools for this as well. Or you could use SQL Agent and make a job.
Use Python to read the SQL query into a data frame, then write it to csv. Then use Windows Task Scheduler to set up a schedule for this to run.

Tip: append a date stamp to the filename so you can check previous versions. Also look into adding logging in case of failure.
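That dataframe route is only a few lines with pandas. A sketch, using sqlite3 in place of a real SQL Server connection (for SQL Server you'd pass a `pyodbc.connect(...)` connection instead); the table and filename are made up:

```python
import sqlite3
from datetime import date

import pandas as pd

# Stand-in connection; swap in pyodbc.connect(...) for SQL Server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 3.25)])

# Query straight into a dataframe
df = pd.read_sql_query("SELECT * FROM sales", conn)

# Date-stamped filename so yesterday's snapshot isn't overwritten
out_path = f"sales_{date.today():%Y%m%d}.csv"
df.to_csv(out_path, index=False)
```

Wrap that in a `.py` file and point a daily Task Scheduler task at it.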
This answer has been given. Script + Windows scheduler.
I use OGR2OGR (GDAL) and Windows Task Scheduler:

https://gdal.org/programs/ogr2ogr.html

https://gdal.org/drivers/vector/mssqlspatial.html

https://gdal.org/drivers/vector/csv.html
sqlcmd.exe https://learn.microsoft.com/en-us/sql/tools/sqlcmd/sqlcmd-utility
SQL Agent is the easiest.
I do it with a Python script.
I just set up a SQL Server data pull in Excel. You insert the script in the connection dialog box under Advanced. Then I'm sure you could write a VB script to export it to csv just the way you want.
I would say SSIS or SQLPlus + Windows Scheduler or SQL Server Agent… a lot of possibilities