I need to retrieve millions of records from Elasticsearch.
I am a beginner with Elasticsearch and do not yet know how to use it efficiently.
I am indexing the Author model shown below, and I use the NEST client to access Elasticsearch from a .NET application.
My models are as follows.
Author
--------------------------------
AuthorKey string
List<Study> Nested
Study
---------------------------------
PMID int
PublicationDate date
PublicationType string
MeshTerms string
Content string
We have almost 10 million authors, and each author has completed at least 3 studies.
So there are approximately 30 million study records in the Elasticsearch index.
Now I would like to get each author's data along with their total study count.
Below is sample JSON data:
{
"Authors": [
{
"AuthorKey": "Author1",
"AuthorName": "karan",
"AuthorLastName": "shah",
"Study": [
{
"PMId": 1000,
"PublicationDate": "2019-01-17T06:35:52.178Z",
"content": "this is dummy content.how can i solve this",
"MeshTerms": "karan,dharan,nilesh,manan,mehul sir,manoj",
"PublicationType": [
"ClinicalTrial",
"Medical"
]
},
{
"PMId": 1001,
"PublicationDate": "2019-01-16T05:55:14.947Z",
"content": "this is dummy content.how can i solve this",
"MeshTerms": "karan1,dharan1,nilesh1,manan1,mehul1 sir,manoj1",
"PublicationType": [
"ClinicalTrial",
"Medical"
]
},
{
"PMId": 1002,
"PublicationDate": "2019-01-15T05:55:14.947Z",
"content": "this is dummy content for record2.how can i solve this",
"MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul2 sir,manoj2",
"PublicationType": [
"ClinicalTrial1",
"Medical2"
]
},
{
"PMId": 1003,
"PublicationDate": "2011-01-15T05:55:14.947Z",
"content": "this is dummy content for record3.how can i solve this",
"MeshTerms": "karan3,dharan3,nilesh3,manan3,mehul3 sir,manoj3",
"PublicationType": [
"ClinicalTrial1",
"Medical3"
]
}
]
},
{
"AuthorKey": "Author2",
"AuthorName": "dharan",
"AuthorLastName": "shah",
"Study": [
{
"PMId": 2001,
"PublicationDate": "2011-01-16T05:55:14.947Z",
"content": "this is dummy content for author 2.how can i solve this",
"MeshTerms": "karan1,dharan1,nilesh1,manan1,mehul1 sir,manoj1",
"PublicationType": [
"ClinicalTrial",
"Medical"
]
},
{
"PMId": 2002,
"PublicationDate": "2019-01-15T05:55:14.947Z",
"content": "this is dummy content for author 2.how can i solve this",
"MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul2 sir,manoj2",
"PublicationType": [
"ClinicalTrial1",
"Medical2"
]
},
{
"PMId": 2003,
"PublicationDate": "2015-01-15T05:55:14.947Z",
"content": "this is dummy content for record2.how can i solve this",
"MeshTerms": "karan3,dharan3,nilesh3,manan3,mehul3 sir,manoj3",
"PublicationType": [
"ClinicalTrial1",
"Medical3"
]
}
]
},
{
"AuthorKey": "Author3",
"AuthorName": "Nilesh",
"AuthorLastName": "Mistrey",
"Study": [
{
"PMId": 3000,
"PublicationDate": "2012-01-16T05:55:14.947Z",
"content": "this is dummy content for author 2 .how can i solve this",
"MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul sir2,manoj2",
"PublicationType": [
"ClinicalTrial",
"Medical"
]
}
]
}
]
}
How can I retrieve all authors along with their total study count, sorted by study count in descending order?
Expected output:
{
"Authors": [
{
"AuthorKey": "Author1",
"AuthorName": "karan",
"AuthorLastName": "shah",
"StudyCount": 4
},
{
"AuthorKey": "Author2",
"AuthorName": "dharan",
"AuthorLastName": "shah",
"StudyCount": 3
},
{
"AuthorKey": "Author3",
"AuthorName": "Nilesh",
"AuthorLastName": "Mistrey",
"StudyCount": 1
}
]
}
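To make the expected result concrete, here is a minimal in-memory sketch in Python of the transformation I am after. It is not an Elasticsearch or NEST query, only an illustration of the desired output; the function name `study_counts` is hypothetical, and the sample studies are abbreviated to their `PMId` values.

```python
import json
from typing import Any, Dict


def study_counts(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return each author with a StudyCount, sorted by count descending."""
    summaries = [
        {
            "AuthorKey": author["AuthorKey"],
            "AuthorName": author["AuthorName"],
            "AuthorLastName": author["AuthorLastName"],
            # The count is simply the number of nested Study objects.
            "StudyCount": len(author.get("Study", [])),
        }
        for author in payload["Authors"]
    ]
    # Highest study count first.
    summaries.sort(key=lambda a: a["StudyCount"], reverse=True)
    return {"Authors": summaries}


# Abbreviated copy of the sample data above (only PMId kept per study).
sample = {
    "Authors": [
        {"AuthorKey": "Author1", "AuthorName": "karan", "AuthorLastName": "shah",
         "Study": [{"PMId": 1000}, {"PMId": 1001}, {"PMId": 1002}, {"PMId": 1003}]},
        {"AuthorKey": "Author2", "AuthorName": "dharan", "AuthorLastName": "shah",
         "Study": [{"PMId": 2001}, {"PMId": 2002}, {"PMId": 2003}]},
        {"AuthorKey": "Author3", "AuthorName": "Nilesh", "AuthorLastName": "Mistrey",
         "Study": [{"PMId": 3000}]},
    ]
}

result = study_counts(sample)
print(json.dumps(result, indent=2))
```

With roughly 10 million authors this obviously cannot be done client-side like this, which is why I am looking for a way to get the same result from Elasticsearch itself.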
Below is the mapping of the index:
{
"authorindex": {
"mappings": {
"_doc": {
"properties": {
"AuthorKey": {
"type": "keyword"
},
"AuthorLastName": {
"type": "keyword"
},
"AuthorName": {
"type": "keyword"
},
"Study": {
"type": "nested",
"properties": {
"MeshTerms": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"PMId": {
"type": "long"
},
"PublicationDate": {
"type": "date"
},
"PublicationType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"content": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}