Video: Spludlow Framework - Data Case study - Companies House
Introduction
Working with the UK Companies House dataset titled “Free
Company Data”.
I am documenting my experience with this dataset as it’s a
pretty useful thing even if you only use it for test data. I will go through
the whole process of getting this raw data into a database table using the
Spludlow Framework to help where it can.
All the code used here is in the class
“Spludlow.Data.BasicCompanyData” this is included in the core system. I
recommend you look at the code as you run each bit.
Running the Import
See video.
Follow these step to create the database:
·
Ensure “SpludlowGroup” has the “sysadmin” role on the database,
so it can do things like create databases:
SSMS->Security->Logins->SpludlowGroup Properties->Server
Roles->Tick sysadmin
·
Create a directory “C:\DatabaseShared” and ensure “SpludlowGroup”
has full control (This directory must be accessible by the Spludlow Framework
and the database server, if the database is remote then create a share on the
remote host)
·
Download the BasicCompanyDataSupportFiles.7z archive from the
downloads directory on the Spludlow Web
·
Extract the archive rename the directory and put here “C:\ProgramData\SpludlowV1\BasicCompany”
(If you put anywhere else ensure that “SpludlowGroup” has full permissions on
the directory
·
On the Intranet Call page search for “company” and click on the Spludlow.Data.BasicCompanyData.Run()
method the parameter text boxes will appear.
·
workingDirectory: “C:\ProgramData\SpludlowV1\BasicCompany”
·
connectionString: “WSYS-HOME-PC”
·
databaseName: “BasicCompany”
·
databaseTempBackupDirectory: “C:\DatabaseShared”
·
Run on message queue and check the status.
·
Fix the error by editing the file “DataURLs.txt” in notepad.
·
Go to http://download.companieshouse.gov.uk/en_output.html
in a web browser
·
You should see a list of 5 links to the 5 parts of the data.
·
Right click on the first part 1 link and select “Copy Link” (Or
whatever you do in your browser)
·
Open the DataURLs.txt file in notepad
·
Paste the full URL to part 1 of the data from the previous step 5
times, pressing return after each, so you have 5 identical lines
·
Edit lines 2, 3, 4, and 5 changing “part1_5” to “part2_5”, “part3_5”,
“part4_5”, and “part5_5”.
·
The file should now contain 5 lines with URLs to the 5 parts of
the data, save and close notepad, like so:
o
http://download.companieshouse.gov.uk/BasicCompanyData-2017-02-03-part1_5.zip
o
http://download.companieshouse.gov.uk/BasicCompanyData-2017-02-03-part2_5.zip
o
http://download.companieshouse.gov.uk/BasicCompanyData-2017-02-03-part3_5.zip
o
http://download.companieshouse.gov.uk/BasicCompanyData-2017-02-03-part4_5.zip
o
http://download.companieshouse.gov.uk/BasicCompanyData-2017-02-03-part5_5.zip
·
Restart the stopped message queue to start processing again
·
Keep an eye on the Status page, Logs page, the working directory,
and task manager. It may take a while.
·
When its finished you will get a log informing you
Examining the data
Have a look at some table data.
Create a database diagram in SSMS:
Create an Access front end and perform as query.